Other AI/ML
Managed Services

AWS offers a suite of fully managed AI services that let you add intelligence (vision, language, speech) to your applications without building or training ML models. Each service is API-driven, pay-per-use, and scales automatically.

01
Chapter One · AI/ML

Amazon Rekognition: Image & Video Analysis

Amazon Rekognition is a fully managed computer vision service that analyses images and videos using deep learning. No ML expertise is required: call the API with an image or video, and Rekognition returns structured results such as detected objects, faces, text, and activities.

What Rekognition Can Do (Introductory)
👤

Facial Analysis

  • Detect faces in images/video
  • Age range, gender, emotions
  • Face comparison (1:1 match)
  • Face search against a collection
🏷️

Object & Scene Detection

  • Label detection (car, dog, tree…)
  • Scene context (beach, office, city)
  • Custom labels (train your own)
  • Bounding boxes with confidence
📝

Text & Content

  • Text-in-image detection (signs, plates)
  • Celebrity recognition
  • Content moderation (unsafe images)
  • PPE detection (helmets, masks)
How It Works (Core)
Amazon Rekognition: Request Flow
Image / Video (S3 or bytes) → Amazon Rekognition (DetectLabels, DetectFaces, DetectText) → JSON Response (labels + confidence, bounding boxes, face attributes) → Your App
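The JSON response step of the flow above can be sketched as follows. This is a minimal illustration, assuming a response shaped like the output of DetectLabels; the sample dictionary is hand-written with made-up values, and `confident_labels` is a hypothetical helper, not part of any AWS SDK.

```python
from typing import Any

# Hand-written sample shaped like a DetectLabels response (illustrative values only).
SAMPLE_RESPONSE: dict[str, Any] = {
    "Labels": [
        {
            "Name": "Car",
            "Confidence": 98.7,
            "Instances": [
                {
                    "BoundingBox": {"Width": 0.4, "Height": 0.3, "Left": 0.1, "Top": 0.5},
                    "Confidence": 98.7,
                }
            ],
        },
        {"Name": "Tree", "Confidence": 71.2, "Instances": []},
        {"Name": "Road", "Confidence": 55.0, "Instances": []},
    ]
}


def confident_labels(response, min_confidence=70.0):
    """Return (name, confidence, box_count) for labels at or above the threshold."""
    results = []
    for label in response.get("Labels", []):
        if label["Confidence"] >= min_confidence:
            box_count = len(label.get("Instances", []))
            results.append((label["Name"], label["Confidence"], box_count))
    return results


if __name__ == "__main__":
    for name, confidence, boxes in confident_labels(SAMPLE_RESPONSE):
        print(f"{name}: {confidence:.1f}% ({boxes} bounding boxes)")
```

With boto3, a response of this shape would typically come from the Rekognition client's `detect_labels` call, passing either image bytes or an S3 object reference. Filtering can also be done server-side through the `MinConfidence` parameter.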
Real-World Applications (Core)
🏒

Enterprise & Security

  • Identity verification: compare selfie to ID photo
  • Building access: face-based door entry
  • Content moderation: auto-flag inappropriate user uploads
  • Workplace safety: detect PPE compliance on construction sites
🎬

Media & Retail

  • Media tagging: auto-tag scenes, celebrities in video archives
  • Visual search: find similar products from a photo
  • Smart cameras: real-time people counting, path tracking
  • License plate reading: parking management, toll systems
💡 Rekognition Image vs Video

Rekognition Image: synchronous, single-image analysis (returns in seconds). Rekognition Video: asynchronous, processes stored videos (S3) or live streams (Kinesis Video Streams). For stored videos, Rekognition publishes a completion notification to an SNS topic when results are ready.

Chapter 01: Key Takeaway

Rekognition is a fully managed computer vision service. It detects objects, faces, text, celebrities, and unsafe content in images/video with a simple API call. Common uses: identity verification, content moderation, PPE detection, and media tagging. No ML expertise needed: the deep learning models are pre-trained and continuously improved by AWS.

02
Chapter Two · AI/ML

Amazon Textract: Document OCR & Data Extraction

Amazon Textract goes beyond simple OCR. It uses ML to automatically extract text, tables, and forms (key-value pairs) from scanned documents, including PDFs, images, and handwritten text, without manual template configuration or rules.

Textract vs Traditional OCR (Introductory)
Feature | Traditional OCR | Amazon Textract
Text extraction | ✅ Plain text only | ✅ Text + structure
Table extraction | ❌ No | ✅ Rows, columns, cells preserved
Form extraction | ❌ No | ✅ Key-value pairs (Name: John)
Handwriting | Limited | ✅ Supported
Templates required | Yes, per document type | No, ML understands layout
New document formats | New rules for each | Works out of the box
How It Works (Core)
Amazon Textract: Document Processing Pipeline
Document (PDF / image, scanned / photo, handwritten) → Amazon Textract (DetectDocumentText, AnalyzeDocument for forms, AnalyzeDocument for tables) → Extracted Data (raw text lines, key-value pairs, table structures, bounding boxes) → Downstream (database, search index, workflow)
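To make the key-value output of the pipeline above more concrete, here is a minimal sketch of pairing keys with values. It assumes a block list shaped like an AnalyzeDocument (forms) response, where KEY blocks point to VALUE blocks and both point to their WORD children; the sample blocks and the helper names are hypothetical and hand-written.

```python
# Hand-written blocks shaped like an AnalyzeDocument (forms) response.
SAMPLE_BLOCKS = [
    {"Id": "w1", "BlockType": "WORD", "Text": "Name:"},
    {"Id": "w2", "BlockType": "WORD", "Text": "John"},
    {"Id": "w3", "BlockType": "WORD", "Text": "City:"},
    {"Id": "w4", "BlockType": "WORD", "Text": "New"},
    {"Id": "w5", "BlockType": "WORD", "Text": "York"},
    {"Id": "k1", "BlockType": "KEY_VALUE_SET", "EntityTypes": ["KEY"],
     "Relationships": [{"Type": "VALUE", "Ids": ["v1"]}, {"Type": "CHILD", "Ids": ["w1"]}]},
    {"Id": "v1", "BlockType": "KEY_VALUE_SET", "EntityTypes": ["VALUE"],
     "Relationships": [{"Type": "CHILD", "Ids": ["w2"]}]},
    {"Id": "k2", "BlockType": "KEY_VALUE_SET", "EntityTypes": ["KEY"],
     "Relationships": [{"Type": "VALUE", "Ids": ["v2"]}, {"Type": "CHILD", "Ids": ["w3"]}]},
    {"Id": "v2", "BlockType": "KEY_VALUE_SET", "EntityTypes": ["VALUE"],
     "Relationships": [{"Type": "CHILD", "Ids": ["w4", "w5"]}]},
]


def _child_text(block, by_id):
    """Join the text of all WORD children of a block."""
    words = []
    for rel in block.get("Relationships", []):
        if rel["Type"] == "CHILD":
            for child_id in rel["Ids"]:
                child = by_id[child_id]
                if child["BlockType"] == "WORD":
                    words.append(child["Text"])
    return " ".join(words)


def extract_key_values(blocks):
    """Map each KEY block's text to the text of the VALUE block(s) it links to."""
    by_id = {block["Id"]: block for block in blocks}
    pairs = {}
    for block in blocks:
        is_key = block["BlockType"] == "KEY_VALUE_SET" and "KEY" in block.get("EntityTypes", [])
        if not is_key:
            continue
        values = []
        for rel in block.get("Relationships", []):
            if rel["Type"] == "VALUE":
                values.extend(_child_text(by_id[value_id], by_id) for value_id in rel["Ids"])
        pairs[_child_text(block, by_id)] = " ".join(values)
    return pairs


if __name__ == "__main__":
    print(extract_key_values(SAMPLE_BLOCKS))
```

With boto3, blocks like these would typically be found under the `Blocks` key of the Textract client's `analyze_document` response when `FeatureTypes` includes `"FORMS"`.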
Real-World Applications (Core)
🏦

Financial Services

  • Invoice processing: extract vendor, amount, line items automatically
  • Mortgage applications: pull data from tax forms, pay stubs
  • Insurance claims: digitise and route handwritten claim forms
  • Receipt processing: expense management automation
🏥

Healthcare & Government

  • Patient intake forms: extract name, DOB, insurance info
  • Government records: digitise legacy paper archives
  • ID document processing: passports, driver's licences
  • Compliance archival: searchable digital document stores
ℹ️ Textract Specialised APIs

AnalyzeExpense is purpose-built for invoices and receipts. AnalyzeID is purpose-built for identity documents (passports, driver's licences). These specialised APIs return richer, domain-specific fields than the generic AnalyzeDocument API.

Chapter 02: Key Takeaway

Textract extracts text, tables, and key-value pairs from documents using ML, with no templates or rules needed. It handles PDFs, images, and handwriting. Specialised APIs exist for invoices (AnalyzeExpense) and identity documents (AnalyzeID). Common uses: invoice processing, mortgage automation, form digitisation, and compliance archival.

03
Chapter Three · AI/ML

Amazon Comprehend: NLP & Sentiment Analysis

Amazon Comprehend is a Natural Language Processing (NLP) service that uses ML to find insights and relationships in text. It can detect the language, extract key phrases, identify entities (people, places, organisations), determine sentiment, and classify documents, all via API.

Comprehend Capabilities (Introductory)
😊

Sentiment Analysis

  • Positive, Negative, Neutral, Mixed
  • Confidence scores per sentiment
  • Batch analysis for large datasets
  • Real-time or async processing
🏷️

Entity Recognition

  • People, locations, organisations
  • Dates, quantities, events
  • Custom entity types (train your own)
  • PII detection (SSN, email, phone)
📂

Classification & Topics

  • Key phrase extraction
  • Language detection (100+ languages)
  • Topic modelling (group documents)
  • Custom classifiers (your categories)
How It Works (Core)
Amazon Comprehend: NLP Pipeline
Text Input (reviews, tickets, emails, articles) → Amazon Comprehend (sentiment detection, entity extraction, key phrases, language & PII) → Insights (Sentiment: POSITIVE 0.92; Entity: "AWS" → ORG; PII: email@test.com) → Action (route, tag, alert, redact)
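The insights and action stages above could look roughly like this in code. It is a sketch under stated assumptions: the sample dictionaries are hand-written to resemble DetectSentiment and DetectPiiEntities responses, the offsets are treated as character offsets into plain ASCII text, and `top_sentiment` and `redact_pii` are hypothetical helpers.

```python
TEXT = "Contact me at email@test.com or 555-0100."


def _span(snippet, pii_type, score):
    """Build a sample PII entity whose offsets are computed from TEXT."""
    start = TEXT.index(snippet)
    return {"Type": pii_type, "Score": score,
            "BeginOffset": start, "EndOffset": start + len(snippet)}


# Hand-written samples shaped like DetectSentiment / DetectPiiEntities responses.
SAMPLE_SENTIMENT = {
    "Sentiment": "POSITIVE",
    "SentimentScore": {"Positive": 0.92, "Negative": 0.01, "Neutral": 0.06, "Mixed": 0.01},
}
SAMPLE_PII = {
    "Entities": [_span("email@test.com", "EMAIL", 0.99), _span("555-0100", "PHONE", 0.97)]
}


def top_sentiment(response):
    """Return the winning sentiment label together with its confidence score."""
    label = response["Sentiment"]
    return label, response["SentimentScore"][label.capitalize()]


def redact_pii(text, entities, min_score=0.5):
    """Replace each detected span with [TYPE], right to left so offsets stay valid."""
    redacted = text
    for entity in sorted(entities, key=lambda e: e["BeginOffset"], reverse=True):
        if entity["Score"] >= min_score:
            begin, end = entity["BeginOffset"], entity["EndOffset"]
            redacted = redacted[:begin] + f"[{entity['Type']}]" + redacted[end:]
    return redacted


if __name__ == "__main__":
    print(top_sentiment(SAMPLE_SENTIMENT))
    print(redact_pii(TEXT, SAMPLE_PII["Entities"]))
```

Working from the last span to the first is a deliberate choice: replacing text changes its length, so earlier offsets remain correct only if later spans are handled first.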
Real-World Applications (Core)
🎧

Customer Experience

  • Support ticket routing: classify tickets by topic, urgency, sentiment
  • Product reviews: aggregate sentiment across thousands of reviews
  • Social media monitoring: track brand sentiment in real time
  • Voice of customer: extract themes from survey responses
🔒

Compliance & Security

  • PII detection & redaction: find SSNs, emails, phone numbers in documents
  • Comprehend Medical: extract medical entities (conditions, medications, dosages)
  • Document classification: auto-categorise legal or regulatory documents
  • Risk assessment: analyse tone in financial filings
✅ Comprehend Medical

A HIPAA-eligible variant that extracts medical information: conditions, medications, dosages, tests, procedures, and protected health information (PHI). Purpose-built for healthcare workflows.

Chapter 03: Key Takeaway

Comprehend provides NLP as a service: sentiment analysis, entity recognition, key phrase extraction, language detection, PII detection, and custom classification. Use it to route support tickets, monitor brand sentiment, redact PII, and classify documents. Comprehend Medical is a specialised HIPAA-eligible variant for healthcare text.

04
Chapter Four · AI/ML

Amazon Lex: Conversational AI & Chatbots

Amazon Lex is built on the same conversational technology that powers Alexa. It provides automatic speech recognition (ASR) to convert speech to text, and natural language understanding (NLU) to recognise the intent of the text, enabling you to build conversational chatbots and voice bots.

Core Concepts (Introductory)
Concept | What It Is | Example
Intent | The action the user wants to perform | BookHotel, OrderPizza, CheckBalance
Utterance | Sample phrases that trigger an intent | "I want to book a hotel", "Reserve a room"
Slot | Parameters needed to fulfil the intent | City, CheckInDate, NumberOfNights
Fulfilment | The backend action that executes | Lambda function, API call, return a message
How It Works (Core)
Amazon Lex: Conversation Flow
User (text or voice) → Amazon Lex (1. ASR, speech→text; 2. NLU, intent + slots; 3. Slot elicitation; 4. Confirmation; 5. Fulfilment) → AWS Lambda (validate slots, call backend APIs, return response) → Response to User (text or synthesised voice)
Channels: Website · Mobile app · Slack · Facebook Messenger · Amazon Connect · Twilio
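The Lambda step in the flow above might be sketched like this. It assumes the Lex V2 event and response format; the BookHotel intent and its slot names come from the concepts table, while the prompts, the confirmation wording, and the sample values are hypothetical.

```python
REQUIRED_SLOTS = ["City", "CheckInDate", "NumberOfNights"]


def _slot_value(slots, name):
    """Return the interpreted value of a slot, or None if it is not filled yet."""
    slot = slots.get(name)
    if not slot or not slot.get("value"):
        return None
    return slot["value"].get("interpretedValue")


def lambda_handler(event, context):
    """Elicit the first missing slot, or close the dialog once all slots are filled."""
    intent = event["sessionState"]["intent"]
    slots = intent.get("slots") or {}

    for name in REQUIRED_SLOTS:
        if _slot_value(slots, name) is None:
            # Ask Lex to prompt the user for the missing slot.
            return {
                "sessionState": {
                    "dialogAction": {"type": "ElicitSlot", "slotToElicit": name},
                    "intent": intent,
                },
                "messages": [{"contentType": "PlainText", "content": f"Please provide {name}."}],
            }

    city = _slot_value(slots, "City")
    check_in = _slot_value(slots, "CheckInDate")
    nights = _slot_value(slots, "NumberOfNights")
    # A real handler would call a booking backend here (not shown).
    return {
        "sessionState": {
            "dialogAction": {"type": "Close"},
            "intent": {**intent, "state": "Fulfilled"},
        },
        "messages": [{
            "contentType": "PlainText",
            "content": f"Booked {nights} night(s) in {city} from {check_in}.",
        }],
    }
```

Slot elicitation can also be configured entirely in the bot definition; handling it in Lambda, as sketched here, is mainly useful when slot values need custom validation.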
Real-World Applications (Core)
🏦

Banking

Check account balance, transfer funds, or report lost cards via voice or chat, integrated with Amazon Connect contact centres.

🛒

E-Commerce

Order tracking, product recommendations, return requests. Handles natural language: "Where's my order?" or "I want to return the shoes."

🏒

IT Helpdesk

Password resets, ticket creation, FAQ answers. Deflects common queries from human agents, potentially reducing support costs by 30-50%.

ℹ️ Lex + Amazon Connect

Lex is commonly paired with Amazon Connect (cloud contact centre) to build intelligent IVR systems. Customers speak naturally instead of pressing "1 for billing, 2 for support…". Lex understands intent from speech and routes accordingly.

Chapter 04: Key Takeaway

Lex is built on the same conversational technology that powers Alexa. It combines ASR (speech to text) with NLU (intent recognition) to build chatbots and voice bots. Key concepts: intents, utterances, slots, fulfilment (usually Lambda). Integrates with Amazon Connect for voice, and Slack/Messenger for chat. Common uses: banking bots, e-commerce support, IT helpdesks.

05
Chapter Five · AI/ML

Amazon Polly: Text to Speech

Amazon Polly turns text into lifelike speech. It supports dozens of languages and voices, including Neural TTS voices that sound remarkably human. You send text, and Polly returns an audio stream (MP3, OGG, PCM) that you can play or store.

Voice Types (Introductory)
Voice Type | Quality | Use Case
Standard | Good quality, concatenative TTS | IVR prompts, basic narration, low-cost batch
Neural | Near-human quality, deep learning | Audiobooks, podcasts, customer-facing apps
Long-Form | Optimised for long passages (neural) | Full articles, e-books, news reading
Generative | Most expressive, conversational tone | Interactive dialogue, virtual assistants
How It Works (Core)
Amazon Polly: Text-to-Speech Pipeline
Input Text (plain text or SSML, Speech Synthesis Markup Language) → Amazon Polly (select voice + language; Neural / Standard engine) → Audio Output (MP3 / OGG / PCM; stream or S3 file) → Playback (app / IVR / IoT)
Real-World Applications (Core)
📚

Content & Accessibility

  • Audiobook generation: convert e-books to spoken audio automatically
  • News reading: Washington Post uses Polly to narrate articles
  • Accessibility: voice output for visually impaired users
  • E-learning: narrate training modules and courses
📞

Communications & IoT

  • IVR systems: dynamic voice prompts in call centres (via Connect)
  • IoT announcements: smart devices speaking status updates
  • Gaming: dynamic NPC dialogue without recording voice actors
  • Multilingual apps: same content spoken in different languages
💡 SSML: Fine-Grained Control

Polly supports SSML (Speech Synthesis Markup Language) for precise control: add pauses (<break>), whisper (<amazon:effect name="whispered">), emphasise words, control speed/pitch, and switch languages mid-sentence. Tag support varies by voice engine, so check that a given tag (the whisper effect, for example) is available for the voice you choose. Neural voices + SSML = near-human narration.
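As a small illustration of the SSML controls described above, the sketch below assembles an SSML document with pauses and a speaking rate. `build_ssml` is a hypothetical helper; it only builds the markup string and does not call Polly.

```python
from xml.sax.saxutils import escape


def build_ssml(sentences, pause_ms=500, rate="medium"):
    """Wrap sentences in <speak>, separated by <break> pauses, at a given prosody rate."""
    pause = f'<break time="{pause_ms}ms"/>'
    # Escape &, < and > so user text cannot break the SSML document.
    body = pause.join(escape(sentence) for sentence in sentences)
    return f'<speak><prosody rate="{rate}">{body}</prosody></speak>'


if __name__ == "__main__":
    print(build_ssml(["Hello & welcome.", "Your order has shipped."], pause_ms=300))
```

With boto3, a string like this would typically be passed to the Polly client's `synthesize_speech` call as `Text`, with `TextType="ssml"` plus an `OutputFormat` and `VoiceId` of your choice.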

Chapter 05: Key Takeaway

Polly converts text to lifelike speech in dozens of languages. Neural voices deliver near-human quality; Standard voices are cost-effective for simpler uses. SSML provides fine-grained control over pronunciation, pauses, and expression. Common uses: audiobooks, e-learning narration, IVR prompts, accessibility features, and IoT voice output.

06
Chapter Six · AI/ML

Amazon Transcribe: Speech to Text

Amazon Transcribe is an automatic speech recognition (ASR) service that converts audio to text. It supports batch processing (files in S3) and real-time streaming transcription. It handles multiple speakers, custom vocabularies, and automatic punctuation.

Key Features (Introductory)
🎙️

Core Transcription

  • Batch (S3 audio files)
  • Real-time streaming
  • Auto punctuation & casing
  • Timestamps per word
👥

Speaker Features

  • Speaker diarisation (who said what)
  • Channel identification (multi-channel)
  • Custom vocabulary (jargon, names)
  • Vocabulary filters (redact words)
🔒

Compliance

  • PII redaction in transcripts
  • Content redaction (audio)
  • Language identification
  • Subtitles output (SRT/VTT)
How It Works (Core)
Amazon Transcribe: Speech-to-Text Pipeline
Audio (S3 file or live stream) → Amazon Transcribe (ASR + punctuation, speaker diarisation, PII redaction) → Transcript (JSON with timestamps, speaker labels, SRT/VTT subtitles) → Downstream (Comprehend, Translate, search index)
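To show how the timestamped JSON in the pipeline above relates to subtitles, here is a minimal sketch that turns word-level items into SRT cues. It assumes items shaped like the `results.items` list of a batch transcript (string timestamps, a `type` of `pronunciation` or `punctuation`); the sample items and both helper functions are hypothetical.

```python
# Hand-written items shaped like the "results" -> "items" list of a transcript.
SAMPLE_ITEMS = [
    {"type": "pronunciation", "start_time": "0.0", "end_time": "0.4",
     "alternatives": [{"confidence": "0.99", "content": "Hello"}]},
    {"type": "punctuation", "alternatives": [{"confidence": "0.0", "content": ","}]},
    {"type": "pronunciation", "start_time": "0.5", "end_time": "0.9",
     "alternatives": [{"confidence": "0.98", "content": "world"}]},
    {"type": "punctuation", "alternatives": [{"confidence": "0.0", "content": "."}]},
]


def srt_timestamp(seconds):
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    millis = round(seconds * 1000)
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def items_to_srt(items, max_words=7):
    """Group words into cues of at most max_words and render them as SRT text."""
    cues = []  # each cue is [start_seconds, end_seconds, text]
    words_in_cue = 0
    for item in items:
        content = item["alternatives"][0]["content"]
        if item["type"] == "punctuation":
            if cues:
                cues[-1][2] += content  # punctuation has no timestamps; attach it
            continue
        start, end = float(item["start_time"]), float(item["end_time"])
        if not cues or words_in_cue >= max_words:
            cues.append([start, end, content])
            words_in_cue = 1
        else:
            cues[-1][1] = end
            cues[-1][2] += " " + content
            words_in_cue += 1
    blocks = [
        f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n"
        for index, (start, end, text) in enumerate(cues, 1)
    ]
    return "\n".join(blocks)


if __name__ == "__main__":
    print(items_to_srt(SAMPLE_ITEMS))
```

Transcribe can produce SRT/VTT output itself, so a conversion like this is mainly useful when you want custom cue lengths or need to post-process the JSON first.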
Real-World Applications (Core)
📞

Contact Centres & Meetings

  • Call analytics: transcribe + analyse customer calls for compliance and quality
  • Meeting transcription: auto-generate meeting notes with speaker labels
  • Live captions: real-time subtitles for webinars, broadcasting
  • Agent assist: real-time transcript for AI-powered suggestions
🎥

Media & Content

  • Subtitle generation: auto-generate SRT/VTT for video content
  • Podcast indexing: make audio content searchable by text
  • Lecture transcription: accessibility for educational content
  • Content moderation: detect spoken profanity or PII
ℹ️ Transcribe Call Analytics

A specialised API built for contact centres: transcribes calls, identifies sentiment per turn, detects issues, flags call categories (complaints, cancellations), and provides a turn-by-turn sentiment graph. Pairs well with Amazon Connect.

Chapter 06: Key Takeaway

Transcribe converts speech to text via batch or real-time streaming. It supports speaker diarisation, custom vocabularies, PII redaction, and subtitle generation (SRT/VTT). Call Analytics is a specialised variant for contact centres. Common uses: meeting notes, call transcription, subtitle generation, podcast indexing, and live captions. Feeds naturally into Comprehend or Translate for further processing.

07
Chapter Seven · AI/ML

Amazon Translate: Language Translation

Amazon Translate is a neural machine translation service that delivers fast, high-quality, and affordable language translation. It supports 75+ languages and allows you to localise content, translate user-generated text in real time, or batch-translate entire document repositories.

Key Features (Introductory)
🌍

Translation Modes

  • Real-time (synchronous API)
  • Batch translation (async, S3)
  • 75+ languages
  • Auto source language detection
🎛️

Customisation

  • Custom terminology (brand names, jargon)
  • Parallel data (custom training data)
  • Formality control (formal/informal)
  • Profanity masking
📄

Document Support

  • Plain text, HTML, DOCX
  • Preserves formatting/tags
  • Batch translate entire S3 folders
  • Integrates with other AWS AI services
How It Works (Core)
Amazon Translate: Translation Flow
Source Text ("Bonjour le monde", auto-detected as French) → Amazon Translate (neural MT engine + custom terminology) → Target Text ("Hello world", target: English) → Your App (website / chat)
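The request side of the flow above can be sketched as a small parameter builder. The keys follow the parameters of the TranslateText API (including `"auto"` for source-language detection), but `build_translate_request` itself and the terminology name used in the demo are hypothetical, and nothing here calls AWS.

```python
def build_translate_request(text, target, source="auto", terminology=None,
                            formality=None, mask_profanity=False):
    """Assemble keyword arguments for a real-time translation request."""
    request = {
        "Text": text,
        "SourceLanguageCode": source,  # "auto" asks the service to detect the language
        "TargetLanguageCode": target,
    }
    if terminology:
        request["TerminologyNames"] = list(terminology)
    settings = {}
    if formality:
        settings["Formality"] = formality  # "FORMAL" or "INFORMAL"
    if mask_profanity:
        settings["Profanity"] = "MASK"
    if settings:
        request["Settings"] = settings
    return request


if __name__ == "__main__":
    # "brand-terms" is a made-up terminology name used only for illustration.
    print(build_translate_request("Bonjour le monde", "en", terminology=["brand-terms"]))
```

With boto3, the resulting dictionary would typically be unpacked into the Translate client's `translate_text` call, and the translated string read from the `TranslatedText` field of the response.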
Real-World Applications (Core)
🌐

Localisation & Communication

  • Website localisation: translate web content to reach global audiences
  • Chat translation: real-time multilingual customer support
  • Email translation: auto-translate incoming support emails
  • User-generated content: translate reviews, comments, posts
🔗

AI Pipeline Integration

  • Transcribe → Translate: transcribe a call, then translate the transcript
  • Comprehend → Translate: analyse sentiment, then translate the summary
  • Translate → Polly: translate text, then speak it in the target language
  • Document migration: batch-translate legacy content archives
💡 Custom Terminology

Use custom terminology files (CSV/TMX) to ensure brand names, product names, and domain jargon are translated consistently. Example: "EC2" should remain "EC2" in every language, not be translated literally. Custom terminology overrides the neural model's default translation for specific terms.
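A terminology file of the kind described above could be generated as follows. This is a sketch assuming the CSV layout in which the header row lists language codes (source first) and each later row gives one term per language; `build_terminology_csv` is a hypothetical helper, and "Amazon S3" is an extra illustrative term.

```python
import csv
import io


def build_terminology_csv(source_lang, target_langs, terms):
    """Render terms as CSV: a header of language codes, then one row per term.

    `terms` maps each source term to a dict of {language_code: translation}.
    A missing translation falls back to the source term, keeping it unchanged.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow([source_lang, *target_langs])
    for source_term, translations in terms.items():
        row = [source_term]
        row.extend(translations.get(lang, source_term) for lang in target_langs)
        writer.writerow(row)
    return buffer.getvalue()


if __name__ == "__main__":
    # "EC2" has no per-language entry, so it stays "EC2" in every column.
    print(build_terminology_csv("en", ["fr", "de"], {"EC2": {}, "Amazon S3": {}}))
```

With boto3, the encoded CSV would typically be uploaded through the Translate client's `import_terminology` call and then referenced by name in translation requests.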

Chapter 07: Key Takeaway

Translate provides neural machine translation across 75+ languages, in real time or in batch. Custom terminology ensures brand/jargon consistency. Formality control adapts tone. It integrates naturally into AI pipelines: Transcribe → Translate → Polly for end-to-end multilingual voice workflows. Common uses: website localisation, multilingual chat, content migration, and global customer support.

AI/ML Managed Services: Complete Domain Summary

  • Rekognition: image & video analysis, covering face detection, object labelling, text-in-image, content moderation, and PPE detection. No ML expertise needed.
  • Textract: document OCR that extracts text, tables, and key-value pairs. Specialised APIs for invoices (AnalyzeExpense) and IDs (AnalyzeID). No templates required.
  • Comprehend: NLP service for sentiment analysis, entity recognition, key phrases, PII detection, and custom classification. Comprehend Medical for healthcare text.
  • Lex: conversational AI (same technology as Alexa). ASR + NLU for chatbots and voice bots. Concepts: intents, utterances, slots. Pairs with Amazon Connect for voice.
  • Polly: text-to-speech with Neural and Standard voices in dozens of languages. SSML for fine-grained control. Uses: audiobooks, IVR, accessibility, e-learning.
  • Transcribe: speech-to-text via batch or streaming. Speaker diarisation, PII redaction, subtitle output. Call Analytics for contact centres.
  • Translate: neural machine translation for 75+ languages. Custom terminology for brand consistency. Real-time or batch. Integrates with Transcribe, Comprehend, Polly.
How These Services Work Together
  • 📷 Image/video input → Rekognition → labels, faces, text
  • 📄 Documents (PDF, scans) → Textract → text, tables, forms
  • 🎙️ Speech (audio, calls) → Transcribe → speech → text
  • Comprehend: NLP, sentiment, entities
  • Translate: 75+ languages
  • Polly: text → 🔊 speech
  • Lex: chatbots & voice bots
All services are API-driven, fully managed, pay-per-use, and scale automatically.