
Other AI/ML Managed Services

AWS offers a suite of fully managed AI services that let you add intelligence — vision, language, speech — to your applications without building or training ML models. Each service is API-driven, pay-per-use, and scales automatically.

Chapter One · AI/ML

Amazon Rekognition — Image & Video Analysis

Amazon Rekognition is a fully managed computer vision service that analyses images and videos using deep learning. No ML expertise required — just call the API with an image or video, and Rekognition returns structured results: detected objects, faces, text, activities, and more.

What Rekognition Can Do Introductory

👤

Facial Analysis

Detect faces in images/video
Age range, gender, emotions
Face comparison (1:1 match)
Face search against a collection

🏷️

Object & Scene Detection

Label detection (car, dog, tree…)
Scene context (beach, office, city)
Custom labels (train your own)
Bounding boxes with confidence

📝

Text & Content

Text-in-image detection (signs, plates)
Celebrity recognition
Content moderation (unsafe images)
PPE detection (helmets, masks)

How It Works Core

Amazon Rekognition — Request Flow
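
As a rough illustration of the request flow, the sketch below builds a DetectLabels request for an image in S3 and flattens the response. It assumes the boto3 SDK; the bucket, key, thresholds, and helper names are hypothetical, and the sample response is hand-written in the documented shape rather than real service output.

```python
def build_label_request(bucket, key, min_confidence=80.0, max_labels=10):
    """Keyword arguments for Rekognition's DetectLabels operation."""
    return {
        "Image": {"S3Object": {"Bucket": bucket, "Name": key}},
        "MinConfidence": min_confidence,
        "MaxLabels": max_labels,
    }


def summarise_labels(response):
    """Flatten a DetectLabels response into (name, confidence, box_count) tuples."""
    return [
        (label["Name"], round(label["Confidence"], 1), len(label.get("Instances", [])))
        for label in response.get("Labels", [])
    ]


def detect_labels(bucket, key):
    """Call the service. Needs boto3 and AWS credentials, so it is not run here."""
    import boto3  # imported lazily so the helpers above work without the SDK

    client = boto3.client("rekognition")
    return summarise_labels(client.detect_labels(**build_label_request(bucket, key)))


# Hand-written sample in the response shape (an assumption, not real output).
sample_response = {
    "Labels": [
        {"Name": "Car", "Confidence": 98.74,
         "Instances": [{"BoundingBox": {}, "Confidence": 98.7}]},
        {"Name": "Street", "Confidence": 91.2, "Instances": []},
    ]
}
print(summarise_labels(sample_response))
```

Labels with an empty `Instances` list describe the scene as a whole; labels with instances carry one bounding box per detected object.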

Real-World Applications Core

🏢

Enterprise & Security

Identity verification — compare selfie to ID photo
Building access — face-based door entry
Content moderation — auto-flag inappropriate user uploads
Workplace safety — detect PPE compliance on construction sites

🎬

Media & Retail

Media tagging — auto-tag scenes, celebrities in video archives
Visual search — find similar products from a photo
Smart cameras — real-time people counting, path tracking
License plate reading — parking management, toll systems

💡 Rekognition Image vs Video

Rekognition Image — synchronous, single-image analysis (returns in seconds). Rekognition Video — asynchronous, processes stored videos (S3) or live streams (Kinesis Video Streams). Video analysis uses SNS to notify when results are ready.
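
The asynchronous pattern for stored video might look like the sketch below: start a job, wait for the SNS notification, then fetch results. It assumes boto3; the ARNs and helper names are placeholders, the notification field names (`JobId`, `Status`) are given as I recall them from the documentation, and pagination of results is ignored for brevity.

```python
import json


def build_video_job_request(bucket, key, sns_topic_arn, role_arn):
    """Keyword arguments for StartLabelDetection on a video stored in S3."""
    return {
        "Video": {"S3Object": {"Bucket": bucket, "Name": key}},
        # Rekognition publishes the completion status to this topic,
        # using an IAM role that permits it to do so.
        "NotificationChannel": {"SNSTopicArn": sns_topic_arn, "RoleArn": role_arn},
    }


def parse_completion_message(message_body):
    """Return (job_id, succeeded) from the JSON body of the SNS notification."""
    payload = json.loads(message_body)
    return payload["JobId"], payload["Status"] == "SUCCEEDED"


def start_label_job(bucket, key, sns_topic_arn, role_arn):
    """Start the job. Needs boto3 and AWS credentials, so it is not run here."""
    import boto3

    client = boto3.client("rekognition")
    request = build_video_job_request(bucket, key, sns_topic_arn, role_arn)
    return client.start_label_detection(**request)["JobId"]


def fetch_labels(job_id):
    """Fetch the first page of results once the job has succeeded."""
    import boto3

    response = boto3.client("rekognition").get_label_detection(JobId=job_id)
    return [(item["Timestamp"], item["Label"]["Name"]) for item in response["Labels"]]
```

Splitting "start" from "fetch" mirrors the service design: the caller never blocks on a long video, and the notification decides when to read the results.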

Chapter 01 — Key Takeaway

Rekognition is a fully managed computer vision service. It detects objects, faces, text, celebrities, and unsafe content in images/video with a simple API call. Common uses: identity verification, content moderation, PPE detection, and media tagging. No ML expertise needed — deep learning models are pre-trained and continuously improved by AWS.

Chapter Two · AI/ML

Amazon Textract — Document OCR & Data Extraction

Amazon Textract goes beyond simple OCR. It uses ML to automatically extract text, tables, and forms (key-value pairs) from scanned documents — PDFs, images, handwritten text — without manual template configuration or rules.

Textract vs Traditional OCR Introductory

Feature	Traditional OCR	Amazon Textract
Text extraction	✅ Plain text only	✅ Text + structure
Table extraction	❌ No	✅ Rows, columns, cells preserved
Form extraction	❌ No	✅ Key-value pairs (Name: John)
Handwriting	Limited	✅ Supported
Templates required	Yes, per document type	No — ML understands layout
New document formats	New rules for each	Works out of the box

How It Works Core

Amazon Textract — Document Processing Pipeline
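
To make the form-extraction step concrete, here is a minimal sketch that turns the `Blocks` list returned by AnalyzeDocument into key-value pairs. It assumes boto3 and the documented block graph (KEY blocks link to VALUE blocks, and both link to WORD children); the helper names and the sample blocks are hypothetical.

```python
def _text_of(block, blocks_by_id):
    """Join the WORD children of a block into one string."""
    words = []
    for rel in block.get("Relationships", []):
        if rel["Type"] == "CHILD":
            for child_id in rel["Ids"]:
                child = blocks_by_id[child_id]
                if child["BlockType"] == "WORD":
                    words.append(child["Text"])
    return " ".join(words)


def extract_key_values(response):
    """Map each form key to its value text."""
    blocks_by_id = {b["Id"]: b for b in response["Blocks"]}
    pairs = {}
    for block in response["Blocks"]:
        is_key = block["BlockType"] == "KEY_VALUE_SET" and "KEY" in block.get("EntityTypes", [])
        if not is_key:
            continue
        value_text = ""
        for rel in block.get("Relationships", []):
            if rel["Type"] == "VALUE":
                value_text = " ".join(_text_of(blocks_by_id[i], blocks_by_id) for i in rel["Ids"])
        pairs[_text_of(block, blocks_by_id)] = value_text
    return pairs


def analyse_form(bucket, key):
    """Call the service. Needs boto3 and AWS credentials, so it is not run here."""
    import boto3

    response = boto3.client("textract").analyze_document(
        Document={"S3Object": {"Bucket": bucket, "Name": key}},
        FeatureTypes=["FORMS", "TABLES"],
    )
    return extract_key_values(response)


# Hand-written sample in the response shape (an assumption, not real output).
sample_response = {"Blocks": [
    {"Id": "k1", "BlockType": "KEY_VALUE_SET", "EntityTypes": ["KEY"],
     "Relationships": [{"Type": "VALUE", "Ids": ["v1"]}, {"Type": "CHILD", "Ids": ["w1"]}]},
    {"Id": "v1", "BlockType": "KEY_VALUE_SET", "EntityTypes": ["VALUE"],
     "Relationships": [{"Type": "CHILD", "Ids": ["w2"]}]},
    {"Id": "w1", "BlockType": "WORD", "Text": "Name:"},
    {"Id": "w2", "BlockType": "WORD", "Text": "John"},
]}
print(extract_key_values(sample_response))
```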

Real-World Applications Core

🏦

Financial Services

Invoice processing — extract vendor, amount, line items automatically
Mortgage applications — pull data from tax forms, pay stubs
Insurance claims — digitise and route handwritten claim forms
Receipt processing — expense management automation

🏥

Healthcare & Government

Patient intake forms — extract name, DOB, insurance info
Government records — digitise legacy paper archives
ID document processing — passports, driver's licences
Compliance archival — searchable digital document stores

ℹ️ Textract Specialised APIs

AnalyzeExpense — purpose-built for invoices and receipts. AnalyzeID — purpose-built for identity documents (passports, driver's licences). These specialised APIs return richer, domain-specific fields than the generic AnalyzeDocument API.
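
As a sketch of what "domain-specific fields" means in practice, the helper below reads the summary fields of an AnalyzeExpense response. The confidence threshold, helper name, and sample values are hypothetical; the response shape follows the documented `ExpenseDocuments` / `SummaryFields` structure.

```python
def summary_fields(response, min_confidence=50.0):
    """Collect normalised field types (e.g. VENDOR_NAME, TOTAL) and their values."""
    fields = {}
    for document in response.get("ExpenseDocuments", []):
        for field in document.get("SummaryFields", []):
            value = field.get("ValueDetection", {})
            # Skip low-confidence detections rather than passing them downstream.
            if value.get("Confidence", 0.0) >= min_confidence:
                fields[field["Type"]["Text"]] = value.get("Text", "")
    return fields


# Hand-written sample in the response shape (an assumption, not real output).
sample_response = {"ExpenseDocuments": [{"SummaryFields": [
    {"Type": {"Text": "VENDOR_NAME"}, "ValueDetection": {"Text": "Acme Ltd", "Confidence": 99.1}},
    {"Type": {"Text": "TOTAL"}, "ValueDetection": {"Text": "$42.00", "Confidence": 97.5}},
    {"Type": {"Text": "TAX"}, "ValueDetection": {"Text": "$4.", "Confidence": 31.0}},
]}]}
print(summary_fields(sample_response))
```

With boto3, a response of this shape would come from `boto3.client("textract").analyze_expense(Document=...)`.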

Chapter 02 — Key Takeaway

Textract extracts text, tables, and key-value pairs from documents using ML — no templates or rules needed. It handles PDFs, images, and handwriting. Specialised APIs exist for invoices (AnalyzeExpense) and identity documents (AnalyzeID). Common uses: invoice processing, mortgage automation, form digitisation, and compliance archival.

Chapter Three · AI/ML

Amazon Comprehend — NLP & Sentiment Analysis

Amazon Comprehend is a Natural Language Processing (NLP) service that uses ML to find insights and relationships in text. It can detect the language, extract key phrases, identify entities (people, places, organisations), determine sentiment, and classify documents — all via API.

Comprehend Capabilities Introductory

😊

Sentiment Analysis

Positive, Negative, Neutral, Mixed
Confidence scores per sentiment
Batch analysis for large datasets
Real-time or async processing

🏷️

Entity Recognition

People, locations, organisations
Dates, quantities, events
Custom entity types (train your own)
PII detection (SSN, email, phone)

📂

Classification & Topics

Key phrase extraction
Language detection (100+ languages)
Topic modelling (group documents)
Custom classifiers (your categories)

How It Works Core

Amazon Comprehend — NLP Pipeline
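
A single step of this pipeline, sentiment detection, could be sketched as follows. It assumes boto3; the escalation rule and its threshold are hypothetical and only illustrate how the confidence scores might drive a routing decision.

```python
def build_sentiment_request(text, language_code="en"):
    """Keyword arguments for Comprehend's DetectSentiment operation."""
    return {"Text": text, "LanguageCode": language_code}


def needs_escalation(response, threshold=0.7):
    """Hypothetical routing rule: escalate messages that are clearly negative."""
    return (
        response["Sentiment"] == "NEGATIVE"
        and response["SentimentScore"]["Negative"] >= threshold
    )


def analyse(text):
    """Call the service. Needs boto3 and AWS credentials, so it is not run here."""
    import boto3

    return boto3.client("comprehend").detect_sentiment(**build_sentiment_request(text))


# Hand-written sample in the response shape (an assumption, not real output).
sample_response = {
    "Sentiment": "NEGATIVE",
    "SentimentScore": {"Positive": 0.02, "Negative": 0.91, "Neutral": 0.05, "Mixed": 0.02},
}
print(needs_escalation(sample_response))
```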

Real-World Applications Core

🎧

Customer Experience

Support ticket routing — classify tickets by topic, urgency, sentiment
Product reviews — aggregate sentiment across thousands of reviews
Social media monitoring — track brand sentiment in real time
Voice of customer — extract themes from survey responses

🔒

Compliance & Security

PII detection & redaction — find SSNs, emails, phone numbers in documents
Comprehend Medical — extract medical entities (conditions, medications, dosages)
Document classification — auto-categorise legal or regulatory documents
Risk assessment — analyse tone in financial filings
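
PII redaction in particular lends itself to a short sketch. DetectPiiEntities returns character offsets rather than the matched text, so the caller does the replacing. The helper name, score threshold, and sample text below are hypothetical.

```python
def redact(text, pii_response, min_score=0.8):
    """Replace each detected PII span with its entity type in brackets."""
    redacted = text
    # Work from the end of the string so earlier offsets stay valid.
    entities = sorted(pii_response["Entities"], key=lambda e: e["BeginOffset"], reverse=True)
    for entity in entities:
        if entity["Score"] >= min_score:
            redacted = (
                redacted[: entity["BeginOffset"]]
                + f"[{entity['Type']}]"
                + redacted[entity["EndOffset"]:]
            )
    return redacted


def detect_and_redact(text):
    """Call the service. Needs boto3 and AWS credentials, so it is not run here."""
    import boto3

    response = boto3.client("comprehend").detect_pii_entities(Text=text, LanguageCode="en")
    return redact(text, response)


# Hand-written sample in the response shape (an assumption, not real output).
sample_text = "Email jane@example.com or call 555-0100."
sample_response = {"Entities": [
    {"Type": "EMAIL", "Score": 0.99, "BeginOffset": 6, "EndOffset": 22},
    {"Type": "PHONE", "Score": 0.98, "BeginOffset": 31, "EndOffset": 39},
]}
print(redact(sample_text, sample_response))
```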

✅ Comprehend Medical

A HIPAA-eligible variant that extracts medical information: conditions, medications, dosages, tests, procedures, and protected health information (PHI). Purpose-built for healthcare workflows.

Chapter 03 — Key Takeaway

Comprehend provides NLP as a service — sentiment analysis, entity recognition, key phrase extraction, language detection, PII detection, and custom classification. Use it to route support tickets, monitor brand sentiment, redact PII, and classify documents. Comprehend Medical is a specialised HIPAA-eligible variant for healthcare text.

Chapter Four · AI/ML

Amazon Lex — Conversational AI & Chatbots

Amazon Lex is built on the same conversational technology that powers Alexa. It provides automatic speech recognition (ASR) to convert speech to text, and natural language understanding (NLU) to recognise the intent behind the text, which lets you build conversational chatbots and voice bots.

Core Concepts Introductory

Concept	What It Is	Example
Intent	The action the user wants to perform	`BookHotel`, `OrderPizza`, `CheckBalance`
Utterance	Sample phrases that trigger an intent	"I want to book a hotel", "Reserve a room"
Slot	Parameters needed to fulfil the intent	City, CheckInDate, NumberOfNights
Fulfilment	The backend action that executes	Lambda function, API call, return a message

How It Works Core

Amazon Lex — Conversation Flow
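
The fulfilment step of the conversation flow is often a Lambda function. The sketch below assumes the Lex V2 event and response format and reuses the `BookHotel` slots from the table above; the handler logic and the reply text are hypothetical.

```python
REQUIRED_SLOTS = ("City", "CheckInDate", "NumberOfNights")


def slot_value(slots, name):
    """Return the resolved value of a slot, or None if it is still unfilled."""
    slot = slots.get(name)
    if not slot or not slot.get("value"):
        return None
    return slot["value"].get("interpretedValue")


def lambda_handler(event, context):
    intent = event["sessionState"]["intent"]
    slots = intent.get("slots") or {}

    # Ask for the first missing slot before attempting fulfilment.
    for name in REQUIRED_SLOTS:
        if slot_value(slots, name) is None:
            return {
                "sessionState": {
                    "dialogAction": {"type": "ElicitSlot", "slotToElicit": name},
                    "intent": intent,
                }
            }

    city, date, nights = (slot_value(slots, name) for name in REQUIRED_SLOTS)
    message = f"Booked {nights} night(s) in {city} from {date}."
    return {
        "sessionState": {
            "dialogAction": {"type": "Close"},
            "intent": {**intent, "state": "Fulfilled"},
        },
        "messages": [{"contentType": "PlainText", "content": message}],
    }
```

A real handler would call a booking backend before closing the intent; that call is omitted here.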

Real-World Applications Core

🏦

Banking

Check account balance, transfer funds, report lost cards — via voice or chat, integrated with Amazon Connect contact centres.

🛒

E-Commerce

Order tracking, product recommendations, return requests. Handles natural language: "Where's my order?" or "I want to return the shoes."

🏢

IT Helpdesk

Password resets, ticket creation, FAQ answers. Deflects common queries from human agents, reducing support costs by 30-50%.

ℹ️ Lex + Amazon Connect

Lex is commonly paired with Amazon Connect (cloud contact centre) to build intelligent IVR systems. Customers speak naturally instead of pressing "1 for billing, 2 for support…". Lex understands intent from speech and routes accordingly.

Chapter 04 — Key Takeaway

Lex is built on the same conversational AI technology that powers Alexa. It combines ASR (speech to text) with NLU (intent recognition) to build chatbots and voice bots. Key concepts: intents, utterances, slots, fulfilment (usually Lambda). Integrates with Amazon Connect for voice, and Slack/Messenger for chat. Common uses: banking bots, e-commerce support, IT helpdesks.

Chapter Five · AI/ML

Amazon Polly — Text to Speech

Amazon Polly turns text into lifelike speech. It supports dozens of languages and voices — including Neural TTS voices that sound remarkably human. You send text, Polly returns an audio stream (MP3, OGG, PCM) that you can play or store.

Voice Types Introductory

Voice Type	Quality	Use Case
Standard	Good quality, concatenative TTS	IVR prompts, basic narration, low-cost batch
Neural	Near-human quality, deep learning	Audiobooks, podcasts, customer-facing apps
Long-Form	Optimised for long passages (neural)	Full articles, e-books, news reading
Generative	Most expressive, conversational tone	Interactive dialogue, virtual assistants

How It Works Core

Amazon Polly — Text-to-Speech Pipeline

Real-World Applications Core

📚

Content & Accessibility

Audiobook generation — convert e-books to spoken audio automatically
News reading — Washington Post uses Polly to narrate articles
Accessibility — voice output for visually impaired users
E-learning — narrate training modules and courses

📞

Communications & IoT

IVR systems — dynamic voice prompts in call centres (via Connect)
IoT announcements — smart devices speaking status updates
Gaming — dynamic NPC dialogue without recording voice actors
Multilingual apps — same content spoken in different languages

💡 SSML — Fine-Grained Control

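Polly supports SSML (Speech Synthesis Markup Language) for precise control: add pauses (<break>), whisper (<amazon:effect name="whispered">), emphasise words, control speed/pitch, and switch languages mid-sentence. Tag support varies by engine: Neural voices accept only a subset of these tags (the whispered effect, for example, works with Standard voices only), so check each tag against the voice type you plan to use.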
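
A minimal sketch of an SSML request, assuming boto3: the helpers build an SSML string with pauses between sentences and the arguments for SynthesizeSpeech. The helper names, the pause length, and the choice of the `Joanna` voice are illustrative assumptions.

```python
from xml.sax.saxutils import escape


def to_ssml(sentences, pause_ms=400):
    """Wrap sentences in <speak>, separated by <break> pauses."""
    pause = f'<break time="{pause_ms}ms"/>'
    # Escape &, < and > so that plain text cannot break the markup.
    return "<speak>" + pause.join(escape(s) for s in sentences) + "</speak>"


def build_synthesis_request(ssml, voice_id="Joanna", engine="neural"):
    """Keyword arguments for Polly's SynthesizeSpeech operation."""
    return {
        "Text": ssml,
        "TextType": "ssml",
        "VoiceId": voice_id,
        "Engine": engine,
        "OutputFormat": "mp3",
    }


def synthesize_to_file(ssml, path):
    """Call the service. Needs boto3 and AWS credentials, so it is not run here."""
    import boto3

    response = boto3.client("polly").synthesize_speech(**build_synthesis_request(ssml))
    with open(path, "wb") as audio_file:
        audio_file.write(response["AudioStream"].read())


print(to_ssml(["Q&A starts now.", "Welcome."]))
```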

Chapter 05 — Key Takeaway

Polly converts text to lifelike speech in dozens of languages. Neural voices deliver near-human quality; Standard voices are cost-effective for simpler uses. SSML provides fine-grained control over pronunciation, pauses, and expression. Common uses: audiobooks, e-learning narration, IVR prompts, accessibility features, and IoT voice output.

Chapter Six · AI/ML

Amazon Transcribe — Speech to Text

Amazon Transcribe is an automatic speech recognition (ASR) service that converts audio to text. It supports batch processing (files in S3) and real-time streaming transcription. It handles multiple speakers, custom vocabularies, and automatic punctuation.

Key Features Introductory

🎙️

Core Transcription

Batch (S3 audio files)
Real-time streaming
Auto punctuation & casing
Timestamps per word

👥

Speaker Features

Speaker diarisation (who said what)
Channel identification (multi-channel)
Custom vocabulary (jargon, names)
Vocabulary filters (redact words)

🔒

Compliance

PII redaction in transcripts
Content redaction (audio)
Language identification
Subtitles output (SRT/VTT)

How It Works Core

Amazon Transcribe — Speech-to-Text Pipeline
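
The batch side of this pipeline might be sketched as below: a job request with speaker labels and subtitle output, plus a helper that groups the transcript into speaker turns. It assumes boto3; the transcript JSON layout (`results.items` and `results.speaker_labels.segments`) is given as I recall it from the documentation, and the sample transcript is hand-written.

```python
def build_job_request(job_name, media_uri, max_speakers=2):
    """Keyword arguments for StartTranscriptionJob with diarisation and subtitles."""
    return {
        "TranscriptionJobName": job_name,
        "Media": {"MediaFileUri": media_uri},
        "LanguageCode": "en-US",
        "Settings": {"ShowSpeakerLabels": True, "MaxSpeakerLabels": max_speakers},
        "Subtitles": {"Formats": ["srt", "vtt"]},
    }


def speaker_turns(transcript):
    """Group words into (speaker, text) turns using word start times."""
    results = transcript["results"]
    speaker_at = {}
    for segment in results["speaker_labels"]["segments"]:
        for item in segment["items"]:
            speaker_at[item["start_time"]] = segment["speaker_label"]

    turns = []
    for item in results["items"]:
        word = item["alternatives"][0]["content"]
        if item["type"] == "punctuation":
            if turns:
                turns[-1][1] += word  # attach punctuation to the previous word
            continue
        speaker = speaker_at.get(item["start_time"], "unknown")
        if turns and turns[-1][0] == speaker:
            turns[-1][1] += " " + word
        else:
            turns.append([speaker, word])
    return [(speaker, text) for speaker, text in turns]


def _word(text, start):
    return {"type": "pronunciation", "start_time": start, "alternatives": [{"content": text}]}


_STOP = {"type": "punctuation", "alternatives": [{"content": "."}]}

# Hand-written sample in the assumed transcript shape (not real output).
sample_transcript = {"results": {
    "speaker_labels": {"segments": [
        {"speaker_label": "spk_0", "items": [{"start_time": "0.0"}]},
        {"speaker_label": "spk_1", "items": [{"start_time": "1.2"}, {"start_time": "1.5"}]},
    ]},
    "items": [_word("Hello", "0.0"), _STOP, _word("Hi", "1.2"), _word("there", "1.5"), _STOP],
}}
print(speaker_turns(sample_transcript))
```

With boto3, the request would be passed to `boto3.client("transcribe").start_transcription_job(**request)`, and the transcript JSON fetched from the job's output location once the job completes.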

Real-World Applications Core

📞

Contact Centres & Meetings

Call analytics — transcribe + analyse customer calls for compliance and quality
Meeting transcription — auto-generate meeting notes with speaker labels
Live captions — real-time subtitles for webinars, broadcasting
Agent assist — real-time transcript for AI-powered suggestions

🎥

Media & Content

Subtitle generation — auto-generate SRT/VTT for video content
Podcast indexing — make audio content searchable by text
Lecture transcription — accessibility for educational content
Content moderation — detect spoken profanity or PII

ℹ️ Transcribe Call Analytics

A specialised API built for contact centres: transcribes calls, identifies sentiment per turn, detects issues, flags call categories (complaints, cancellations), and provides a turn-by-turn sentiment graph. Pairs perfectly with Amazon Connect.

Chapter 06 — Key Takeaway

Transcribe converts speech to text via batch or real-time streaming. It supports speaker diarisation, custom vocabularies, PII redaction, and subtitle generation (SRT/VTT). Call Analytics is a specialised variant for contact centres. Common uses: meeting notes, call transcription, subtitle generation, podcast indexing, and live captions. Feeds naturally into Comprehend or Translate for further processing.

Chapter Seven · AI/ML

Amazon Translate — Language Translation

Amazon Translate is a neural machine translation service that delivers fast, high-quality, and affordable language translation. It supports 75+ languages and allows you to localise content, translate user-generated text in real time, or batch-translate entire document repositories.

Key Features Introductory

🌍

Translation Modes

Real-time (synchronous API)
Batch translation (async, S3)
75+ languages
Auto source language detection

🎛️

Customisation

Custom terminology (brand names, jargon)
Parallel data (custom training data)
Formality control (formal/informal)
Profanity masking

📄

Document Support

Plain text, HTML, DOCX
Preserves formatting/tags
Batch translate entire S3 folders
Integrates with other AWS AI services

How It Works Core

Amazon Translate — Translation Flow

Real-World Applications Core

🌐

Localisation & Communication

Website localisation — translate web content to reach global audiences
Chat translation — real-time multilingual customer support
Email translation — auto-translate incoming support emails
User-generated content — translate reviews, comments, posts

🔗

AI Pipeline Integration

Transcribe → Translate — transcribe a call, then translate the transcript
Comprehend → Translate — analyse sentiment, then translate the summary
Translate → Polly — translate text, then speak it in the target language
Document migration — batch-translate legacy content archives
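
The Translate → Polly chain from the list above could be sketched like this. The function takes boto3-style clients as arguments so that it can be exercised offline; the stub classes and the voice name are hypothetical stand-ins, and in real use you would pass `boto3.client("translate")` and `boto3.client("polly")`.

```python
import io


def translate_and_speak(text, target_language, voice_id, translate_client, polly_client):
    """Translate text, then synthesise the translation as MP3 audio bytes."""
    translated = translate_client.translate_text(
        Text=text,
        SourceLanguageCode="auto",  # let the service detect the source language
        TargetLanguageCode=target_language,
    )["TranslatedText"]
    audio = polly_client.synthesize_speech(
        Text=translated,
        VoiceId=voice_id,
        OutputFormat="mp3",
    )["AudioStream"].read()
    return translated, audio


class StubTranslate:
    """Offline stand-in for the Translate client (illustration only)."""

    def translate_text(self, **kwargs):
        return {"TranslatedText": f"[{kwargs['TargetLanguageCode']}] {kwargs['Text']}"}


class StubPolly:
    """Offline stand-in for the Polly client (illustration only)."""

    def synthesize_speech(self, **kwargs):
        return {"AudioStream": io.BytesIO(kwargs["Text"].encode("utf-8"))}


text_out, audio_out = translate_and_speak("Hello", "es", "Lucia", StubTranslate(), StubPolly())
print(text_out, len(audio_out))
```

Passing the clients in keeps each stage replaceable, which is also how a Transcribe step could be added in front without changing this function.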

💡 Custom Terminology

Use custom terminology files (CSV/TMX) to ensure brand names, product names, and domain jargon are translated consistently. Example: "EC2" should remain "EC2" in every language, not be translated literally. Custom terminology overrides the neural model's default translation for specific terms.
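
As a sketch, assuming boto3, a terminology file that pins terms such as "EC2" can be generated as CSV (the first row lists language codes, and each following row gives the term in each language) and then referenced by name in a translation request. The terminology name, helper names, and language choices are hypothetical.

```python
import csv
import io


def terminology_csv(source_lang, target_langs, terms):
    """Build a CSV in which each term maps to itself in every target language."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow([source_lang, *target_langs])
    for term in terms:
        writer.writerow([term] * (1 + len(target_langs)))
    return buffer.getvalue()


def build_translate_request(text, target, source="auto", terminology=None, formality=None):
    """Keyword arguments for Translate's TranslateText operation."""
    request = {"Text": text, "SourceLanguageCode": source, "TargetLanguageCode": target}
    if terminology:
        request["TerminologyNames"] = [terminology]
    if formality:
        request["Settings"] = {"Formality": formality}
    return request


def upload_and_translate(text, target, name="brand-terms"):
    """Call the service. Needs boto3 and AWS credentials, so it is not run here."""
    import boto3

    client = boto3.client("translate")
    client.import_terminology(
        Name=name,
        MergeStrategy="OVERWRITE",
        TerminologyData={
            "File": terminology_csv("en", [target], ["EC2"]).encode("utf-8"),
            "Format": "CSV",
        },
    )
    request = build_translate_request(text, target, terminology=name)
    return client.translate_text(**request)["TranslatedText"]


print(terminology_csv("en", ["es", "fr"], ["EC2"]))
```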

Chapter 07 — Key Takeaway

Translate provides neural machine translation across 75+ languages — real-time or batch. Custom terminology ensures brand/jargon consistency. Formality control adapts tone. It integrates naturally into AI pipelines: Transcribe → Translate → Polly for end-to-end multilingual voice workflows. Common uses: website localisation, multilingual chat, content migration, and global customer support.

AI/ML Managed Services — Complete Domain Summary

Rekognition — image & video analysis: face detection, object labelling, text-in-image, content moderation, PPE detection. No ML expertise needed.
Textract — document OCR that extracts text, tables, and key-value pairs. Specialised APIs for invoices (AnalyzeExpense) and IDs (AnalyzeID). No templates required.
Comprehend — NLP service for sentiment analysis, entity recognition, key phrases, PII detection, and custom classification. Comprehend Medical for healthcare text.
Lex — conversational AI built on the same technology as Alexa. ASR + NLU for chatbots and voice bots. Concepts: intents, utterances, slots. Pairs with Amazon Connect for voice.
Polly — text-to-speech with Neural and Standard voices in dozens of languages. SSML for fine-grained control. Uses: audiobooks, IVR, accessibility, e-learning.
Transcribe — speech-to-text via batch or streaming. Speaker diarisation, PII redaction, subtitle output. Call Analytics for contact centres.
Translate — neural machine translation for 75+ languages. Custom terminology for brand consistency. Real-time or batch. Integrates with Transcribe, Comprehend, Polly.
 

How These Services Work Together