Other AI/ML Managed Services
AWS offers a suite of fully managed AI services that let you add intelligence (vision, language, speech) to your applications without building or training ML models. Each service is API-driven, pay-per-use, and scales automatically.
Amazon Rekognition: Image & Video Analysis
Amazon Rekognition is a fully managed computer vision service that analyses images and videos using deep learning. No ML expertise is required: call the API with an image or video, and Rekognition returns structured results such as detected objects, faces, text, and activities.
Facial Analysis
- Detect faces in images/video
- Age range, gender, emotions
- Face comparison (1:1 match)
- Face search against a collection
Object & Scene Detection
- Label detection (car, dog, tree…)
- Scene context (beach, office, city)
- Custom labels (train your own)
- Bounding boxes with confidence
Text & Content
- Text-in-image detection (signs, plates)
- Celebrity recognition
- Content moderation (unsafe images)
- PPE detection (helmets, masks)
Enterprise & Security
- Identity verification: compare a selfie to an ID photo
- Building access: face-based door entry
- Content moderation: auto-flag inappropriate user uploads
- Workplace safety: detect PPE compliance on construction sites
Media & Retail
- Media tagging: auto-tag scenes and celebrities in video archives
- Visual search: find similar products from a photo
- Smart cameras: real-time people counting, path tracking
- License plate reading: parking management, toll systems
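The identity-verification use case comes down to reading one field of a CompareFaces response. The sketch below assumes a response shaped like the output of boto3's `rekognition.compare_faces(SourceImage=..., TargetImage=...)`, where each entry in `FaceMatches` carries a `Similarity` score from 0 to 100. The 90.0 threshold and the sample response are illustrative assumptions, not AWS defaults or real output.

```python
from typing import Any


def is_same_person(response: dict[str, Any], min_similarity: float = 90.0) -> bool:
    """Return True if any matched face in a CompareFaces response meets the threshold.

    The default of 90.0 is an illustrative business rule, not an AWS default.
    """
    matches = response.get("FaceMatches", [])
    return any(m.get("Similarity", 0.0) >= min_similarity for m in matches)


# Hypothetical, trimmed response for a selfie vs. ID-photo comparison.
sample = {
    "FaceMatches": [{"Similarity": 97.3, "Face": {"Confidence": 99.9}}],
    "UnmatchedFaces": [],
}
print(is_same_person(sample))  # True
```

`compare_faces` also accepts a `SimilarityThreshold` parameter that filters matches server-side; re-checking the score in application code keeps the acceptance rule explicit and auditable.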
Rekognition Image: synchronous, single-image analysis (returns in seconds). Rekognition Video: asynchronous analysis of stored videos (S3) or live streams (Kinesis Video Streams). For stored videos, Rekognition publishes a completion notification to an SNS topic when results are ready.
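Because Rekognition Image is synchronous, a single call returns everything needed. A minimal sketch for summarising a DetectLabels response, assuming the shape returned by boto3's `detect_labels` (a `Labels` list whose entries have `Name`, `Confidence`, and an `Instances` list of bounding boxes). The bucket and object names in the comment and the sample data are hypothetical.

```python
from typing import Any


def confident_labels(response: dict[str, Any],
                     min_confidence: float = 80.0) -> list[tuple[str, float, int]]:
    """Summarise a DetectLabels response as (name, confidence, box_count) tuples.

    Labels below `min_confidence` are dropped; box_count is the number of
    located instances (bounding boxes) returned for that label.
    """
    results = []
    for label in response.get("Labels", []):
        confidence = label.get("Confidence", 0.0)
        if confidence >= min_confidence:
            results.append((label["Name"], confidence, len(label.get("Instances", []))))
    # Highest-confidence labels first.
    return sorted(results, key=lambda item: item[1], reverse=True)


# In practice the response would come from something like:
#   boto3.client("rekognition").detect_labels(
#       Image={"S3Object": {"Bucket": "my-bucket", "Name": "street.jpg"}},
#       MaxLabels=10, MinConfidence=70)
# Here a trimmed, hand-written sample is used instead.
sample = {
    "Labels": [
        {"Name": "Car", "Confidence": 98.7,
         "Instances": [{"BoundingBox": {"Width": 0.3, "Height": 0.2, "Left": 0.1, "Top": 0.5},
                        "Confidence": 98.7}]},
        {"Name": "Road", "Confidence": 91.2, "Instances": []},
        {"Name": "Dog", "Confidence": 55.0, "Instances": []},
    ]
}
print(confident_labels(sample))  # [('Car', 98.7, 1), ('Road', 91.2, 0)]
```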
Rekognition is a fully managed computer vision service. It detects objects, faces, text, celebrities, and unsafe content in images/video with a simple API call. Common uses: identity verification, content moderation, PPE detection, and media tagging. No ML expertise needed: deep learning models are pre-trained and continuously improved by AWS.
Amazon Textract: Document OCR & Data Extraction
Amazon Textract goes beyond simple OCR. It uses ML to automatically extract text, tables, and forms (key-value pairs) from scanned documents, including PDFs, images, and handwritten text, without manual template configuration or rules.
| Feature | Traditional OCR | Amazon Textract |
|---|---|---|
| Text extraction | ✓ Plain text only | ✓ Text + structure |
| Table extraction | ✗ No | ✓ Rows, columns, cells preserved |
| Form extraction | ✗ No | ✓ Key-value pairs (Name: John) |
| Handwriting | Limited | ✓ Supported |
| Templates required | Yes, per document type | No; ML understands layout |
| New document formats | New rules for each | Works out of the box |
Financial Services
- Invoice processing: extract vendor, amount, and line items automatically
- Mortgage applications: pull data from tax forms and pay stubs
- Insurance claims: digitise and route handwritten claim forms
- Receipt processing: expense management automation
Healthcare & Government
- Patient intake forms: extract name, DOB, insurance info
- Government records: digitise legacy paper archives
- ID document processing: passports, driver's licences
- Compliance archival: searchable digital document stores
AnalyzeExpense: purpose-built for invoices and receipts. AnalyzeID: purpose-built for identity documents (passports, driver's licences). These specialised APIs return richer, domain-specific fields than the generic AnalyzeDocument API.
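The generic AnalyzeDocument API returns a flat list of `Blocks` linked by IDs rather than a ready-made dictionary, so form data has to be reassembled. A sketch assuming the document was analysed with `FeatureTypes=["FORMS"]`: KEY blocks point to their VALUE block through a `VALUE` relationship, and both point to their WORD blocks through `CHILD` relationships. The short block IDs in the sample are made up for readability (real IDs are UUIDs), and the bucket/key in the comment are hypothetical.

```python
from typing import Any

Block = dict[str, Any]


def _text_of(block: Block, by_id: dict[str, Block]) -> str:
    """Join the WORD children of a KEY or VALUE block into one string."""
    words = []
    for rel in block.get("Relationships", []):
        if rel["Type"] == "CHILD":
            for child_id in rel["Ids"]:
                child = by_id[child_id]
                if child["BlockType"] == "WORD":  # skip e.g. SELECTION_ELEMENT
                    words.append(child["Text"])
    return " ".join(words)


def key_values(blocks: list[Block]) -> dict[str, str]:
    """Rebuild form fields from AnalyzeDocument blocks (FeatureTypes=["FORMS"])."""
    by_id = {b["Id"]: b for b in blocks}
    pairs: dict[str, str] = {}
    for block in blocks:
        if block["BlockType"] == "KEY_VALUE_SET" and "KEY" in block.get("EntityTypes", []):
            value_text = ""
            for rel in block.get("Relationships", []):
                if rel["Type"] == "VALUE":
                    value_text = " ".join(_text_of(by_id[i], by_id) for i in rel["Ids"])
            pairs[_text_of(block, by_id)] = value_text
    return pairs


# In practice: blocks = boto3.client("textract").analyze_document(
#     Document={"S3Object": {"Bucket": "my-bucket", "Name": "form.png"}},
#     FeatureTypes=["FORMS"])["Blocks"]
sample_blocks = [
    {"Id": "k1", "BlockType": "KEY_VALUE_SET", "EntityTypes": ["KEY"],
     "Relationships": [{"Type": "VALUE", "Ids": ["v1"]}, {"Type": "CHILD", "Ids": ["w1"]}]},
    {"Id": "v1", "BlockType": "KEY_VALUE_SET", "EntityTypes": ["VALUE"],
     "Relationships": [{"Type": "CHILD", "Ids": ["w2", "w3"]}]},
    {"Id": "w1", "BlockType": "WORD", "Text": "Name:"},
    {"Id": "w2", "BlockType": "WORD", "Text": "John"},
    {"Id": "w3", "BlockType": "WORD", "Text": "Smith"},
]
print(key_values(sample_blocks))  # {'Name:': 'John Smith'}
```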
Textract extracts text, tables, and key-value pairs from documents using ML, with no templates or rules needed. It handles PDFs, images, and handwriting. Specialised APIs exist for invoices (AnalyzeExpense) and identity documents (AnalyzeID). Common uses: invoice processing, mortgage automation, form digitisation, and compliance archival.
Amazon Comprehend: NLP & Sentiment Analysis
Amazon Comprehend is a Natural Language Processing (NLP) service that uses ML to find insights and relationships in text. It can detect the language, extract key phrases, identify entities (people, places, organisations), determine sentiment, and classify documents, all via API.
Sentiment Analysis
- Positive, Negative, Neutral, Mixed
- Confidence scores per sentiment
- Batch analysis for large datasets
- Real-time or async processing
Entity Recognition
- People, locations, organisations
- Dates, quantities, events
- Custom entity types (train your own)
- PII detection (SSN, email, phone)
Classification & Topics
- Key phrase extraction
- Language detection (100+ languages)
- Topic modelling (group documents)
- Custom classifiers (your categories)
Customer Experience
- Support ticket routing: classify tickets by topic, urgency, sentiment
- Product reviews: aggregate sentiment across thousands of reviews
- Social media monitoring: track brand sentiment in real time
- Voice of customer: extract themes from survey responses
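Support ticket routing can be as simple as mapping a DetectSentiment result to a queue. The sketch assumes the response shape from boto3's `comprehend.detect_sentiment(Text=..., LanguageCode="en")`: a `Sentiment` label plus per-class `SentimentScore` values. The queue names and the 0.7 threshold are hypothetical.

```python
from typing import Any


def route_ticket(sentiment: dict[str, Any], negative_threshold: float = 0.7) -> str:
    """Pick a (hypothetical) queue name from a DetectSentiment response."""
    label = sentiment["Sentiment"]  # POSITIVE | NEGATIVE | NEUTRAL | MIXED
    negative = sentiment["SentimentScore"]["Negative"]
    if label == "NEGATIVE" and negative >= negative_threshold:
        return "priority-escalation"
    if label in ("NEGATIVE", "MIXED"):
        return "standard-review"
    return "self-service"


# Hand-written sample shaped like a DetectSentiment response.
sample = {
    "Sentiment": "NEGATIVE",
    "SentimentScore": {"Positive": 0.01, "Negative": 0.95, "Neutral": 0.03, "Mixed": 0.01},
}
print(route_ticket(sample))  # priority-escalation
```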
Compliance & Security
- PII detection & redaction: find SSNs, emails, phone numbers in documents
- Comprehend Medical: extract medical entities (conditions, medications, dosages)
- Document classification: auto-categorise legal or regulatory documents
- Risk assessment: analyse tone in financial filings
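For PII redaction, the synchronous DetectPiiEntities API returns character offsets rather than redacted text, so the application splices in placeholders itself. A minimal sketch, assuming the response shape from boto3's `comprehend.detect_pii_entities(Text=..., LanguageCode="en")` (entities with `Type`, `Score`, `BeginOffset`, `EndOffset`); the sample response is hand-written and the 0.9 score cut-off is an arbitrary choice.

```python
from typing import Any


def redact(text: str, response: dict[str, Any], min_score: float = 0.9) -> str:
    """Replace each PII span from a DetectPiiEntities response with [TYPE]."""
    entities = [e for e in response.get("Entities", []) if e["Score"] >= min_score]
    # Work right-to-left so earlier offsets stay valid after each replacement.
    for e in sorted(entities, key=lambda e: e["BeginOffset"], reverse=True):
        text = text[: e["BeginOffset"]] + f"[{e['Type']}]" + text[e["EndOffset"]:]
    return text


message = "Email jane@example.com or call 555-0100."
sample = {
    "Entities": [
        {"Score": 0.99, "Type": "EMAIL", "BeginOffset": 6, "EndOffset": 22},
        {"Score": 0.98, "Type": "PHONE", "BeginOffset": 31, "EndOffset": 39},
    ]
}
print(redact(message, sample))  # Email [EMAIL] or call [PHONE].
```

For large document sets, Comprehend's asynchronous PII detection jobs can write redacted output directly, which avoids doing the splicing by hand.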
Comprehend Medical: a HIPAA-eligible variant that extracts medical information such as conditions, medications, dosages, tests, procedures, and protected health information (PHI). Purpose-built for healthcare workflows.
Comprehend provides NLP as a service: sentiment analysis, entity recognition, key phrase extraction, language detection, PII detection, and custom classification. Use it to route support tickets, monitor brand sentiment, redact PII, and classify documents. Comprehend Medical is a specialised HIPAA-eligible variant for healthcare text.
Amazon Lex: Conversational AI & Chatbots
Amazon Lex is built on the same deep learning technology that powers Alexa. It provides automatic speech recognition (ASR) to convert speech to text and natural language understanding (NLU) to recognise the intent behind that text, so you can build conversational chatbots and voice bots.
| Concept | What It Is | Example |
|---|---|---|
| Intent | The action the user wants to perform | BookHotel, OrderPizza, CheckBalance |
| Utterance | Sample phrases that trigger an intent | "I want to book a hotel", "Reserve a room" |
| Slot | Parameters needed to fulfil the intent | City, CheckInDate, NumberOfNights |
| Fulfilment | The backend action that executes | Lambda function, API call, return a message |
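Fulfilment is usually a Lambda function that reads the filled slots and tells Lex how to end the turn. The sketch below assumes the Lex V2 event and response format (`sessionState.intent` with `slots`, and a `Close` dialog action). The `BookHotel` intent and its slot names come from the table above; the confirmation text and the missing booking backend are simplifications.

```python
from typing import Any, Optional


def slot_value(intent: dict[str, Any], name: str) -> Optional[str]:
    """Return a slot's interpreted value, or None if Lex has not filled it."""
    slot = (intent.get("slots") or {}).get(name)
    if not slot:
        return None
    return slot["value"].get("interpretedValue")


def lambda_handler(event: dict[str, Any], context: Any) -> dict[str, Any]:
    """Fulfil the BookHotel intent and close the conversation."""
    intent = event["sessionState"]["intent"]
    city = slot_value(intent, "City")
    nights = slot_value(intent, "NumberOfNights")
    # A real bot would call a booking backend here; this sketch only confirms.
    message = f"Booked {nights} night(s) in {city}."
    return {
        "sessionState": {
            "dialogAction": {"type": "Close"},
            "intent": {"name": intent["name"], "slots": intent.get("slots"), "state": "Fulfilled"},
        },
        "messages": [{"contentType": "PlainText", "content": message}],
    }


# Trimmed, hand-written example of a Lex V2 fulfilment event.
sample_event = {
    "sessionState": {
        "intent": {
            "name": "BookHotel",
            "state": "ReadyForFulfillment",
            "slots": {
                "City": {"value": {"originalValue": "paris", "interpretedValue": "Paris",
                                   "resolvedValues": ["Paris"]}},
                "NumberOfNights": {"value": {"originalValue": "three", "interpretedValue": "3",
                                             "resolvedValues": ["3"]}},
                "CheckInDate": None,
            },
        }
    }
}
print(lambda_handler(sample_event, None)["messages"][0]["content"])
# Booked 3 night(s) in Paris.
```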
Banking
Check account balance, transfer funds, report lost cards β via voice or chat, integrated with Amazon Connect contact centres.
E-Commerce
Order tracking, product recommendations, return requests. Handles natural language: "Where's my order?" or "I want to return the shoes."
IT Helpdesk
Password resets, ticket creation, FAQ answers. Deflects common queries from human agents, reducing support costs by 30-50%.
Lex is commonly paired with Amazon Connect (cloud contact centre) to build intelligent IVR systems. Customers speak naturally instead of pressing "1 for billing, 2 for support…". Lex understands intent from speech and routes accordingly.
Lex is built on the same conversational AI technology that powers Alexa. It combines ASR (speech to text) with NLU (intent recognition) to build chatbots and voice bots. Key concepts: intents, utterances, slots, fulfilment (usually Lambda). It integrates with Amazon Connect for voice and with Slack or Facebook Messenger for chat. Common uses: banking bots, e-commerce support, IT helpdesks.
Amazon Polly: Text to Speech
Amazon Polly turns text into lifelike speech. It supports dozens of languages and voices, including Neural TTS voices that sound remarkably human. You send text, and Polly returns an audio stream (MP3, OGG Vorbis, or PCM) that you can play or store.
| Voice Type | Quality | Use Case |
|---|---|---|
| Standard | Good quality, concatenative TTS | IVR prompts, basic narration, low-cost batch |
| Neural | Near-human quality, deep learning | Audiobooks, podcasts, customer-facing apps |
| Long-Form | Optimised for long passages (neural) | Full articles, e-books, news reading |
| Generative | Most expressive, conversational tone | Interactive dialogue, virtual assistants |
Content & Accessibility
- Audiobook generation: convert e-books to spoken audio automatically
- News reading: The Washington Post uses Polly to narrate articles
- Accessibility: voice output for visually impaired users
- E-learning: narrate training modules and courses
Communications & IoT
- IVR systems: dynamic voice prompts in call centres (via Connect)
- IoT announcements: smart devices speaking status updates
- Gaming: dynamic NPC dialogue without recording voice actors
- Multilingual apps: same content spoken in different languages
Polly supports SSML (Speech Synthesis Markup Language) for precise control: add pauses (<break>), whisper (<amazon:effect name="whispered">), emphasise words, control speed and pitch, and switch languages mid-sentence. Neural voices + SSML = near-human narration, though not every SSML tag is supported by every voice engine, so check Polly's SSML tag support for the voice you choose.
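A small helper for building SSML shows how these tags fit together. The text is XML-escaped first, because characters such as `&` and `<` would otherwise break the markup. The Polly call in the comment uses the `standard` engine on the assumption that the whispered effect may not be available on other engines; the voice and file names are illustrative.

```python
from xml.sax.saxutils import escape


def build_ssml(intro: str, aside: str, pause_ms: int = 500) -> str:
    """Wrap plain text in SSML: the intro, a pause, then a whispered aside."""
    return (
        "<speak>"
        f"{escape(intro)}"
        f'<break time="{pause_ms}ms"/>'
        f'<amazon:effect name="whispered">{escape(aside)}</amazon:effect>'
        "</speak>"
    )


ssml = build_ssml("Welcome back.", "This part is a secret & a surprise.")
print(ssml)

# Sending it to Polly would look roughly like this (requires AWS credentials):
#   import boto3
#   polly = boto3.client("polly")
#   audio = polly.synthesize_speech(Text=ssml, TextType="ssml", OutputFormat="mp3",
#                                   VoiceId="Joanna", Engine="standard")
#   with open("greeting.mp3", "wb") as f:
#       f.write(audio["AudioStream"].read())
```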
Polly converts text to lifelike speech in dozens of languages. Neural voices deliver near-human quality; Standard voices are cost-effective for simpler uses. SSML provides fine-grained control over pronunciation, pauses, and expression. Common uses: audiobooks, e-learning narration, IVR prompts, accessibility features, and IoT voice output.
Amazon Transcribe: Speech to Text
Amazon Transcribe is an automatic speech recognition (ASR) service that converts audio to text. It supports batch processing (files in S3) and real-time streaming transcription. It handles multiple speakers, custom vocabularies, and automatic punctuation.
Core Transcription
- Batch (S3 audio files)
- Real-time streaming
- Auto punctuation & casing
- Timestamps per word
Speaker Features
- Speaker diarisation (who said what)
- Channel identification (multi-channel)
- Custom vocabulary (jargon, names)
- Vocabulary filters (redact words)
Compliance
- PII redaction in transcripts
- Content redaction (audio)
- Language identification
- Subtitles output (SRT/VTT)
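With speaker diarisation enabled, the JSON transcript written by a batch job contains word-level `items`. A sketch that folds them into readable speaker turns, assuming each pronunciation item carries a `speaker_label` (for example `spk_0`) and that punctuation items, which have no timestamps, attach to the preceding word. The sample is hand-written and heavily trimmed.

```python
from typing import Any


def speaker_turns(results: dict[str, Any]) -> list[tuple[str, str]]:
    """Group transcript items into consecutive (speaker, text) turns.

    `results` is assumed to be the "results" object of a Transcribe job
    output produced with speaker diarisation enabled.
    """
    turns: list[tuple[str, list[str]]] = []
    for item in results.get("items", []):
        word = item["alternatives"][0]["content"]
        if item["type"] == "punctuation":
            if turns:
                turns[-1][1][-1] += word  # glue "," or "." onto the last word
            continue
        speaker = item.get("speaker_label", "unknown")
        if not turns or turns[-1][0] != speaker:
            turns.append((speaker, []))
        turns[-1][1].append(word)
    return [(speaker, " ".join(words)) for speaker, words in turns]


sample_results = {"items": [
    {"type": "pronunciation", "start_time": "0.0", "end_time": "0.4", "speaker_label": "spk_0",
     "alternatives": [{"confidence": "0.99", "content": "Hello"}]},
    {"type": "punctuation", "alternatives": [{"confidence": "0.0", "content": ","}]},
    {"type": "pronunciation", "start_time": "0.5", "end_time": "0.9", "speaker_label": "spk_0",
     "alternatives": [{"confidence": "0.98", "content": "Sam"}]},
    {"type": "punctuation", "alternatives": [{"confidence": "0.0", "content": "."}]},
    {"type": "pronunciation", "start_time": "1.2", "end_time": "1.5", "speaker_label": "spk_1",
     "alternatives": [{"confidence": "0.97", "content": "Hi"}]},
    {"type": "punctuation", "alternatives": [{"confidence": "0.0", "content": "!"}]},
]}
print(speaker_turns(sample_results))  # [('spk_0', 'Hello, Sam.'), ('spk_1', 'Hi!')]
```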
Contact Centres & Meetings
- Call analytics: transcribe and analyse customer calls for compliance and quality
- Meeting transcription: auto-generate meeting notes with speaker labels
- Live captions: real-time subtitles for webinars and broadcasts
- Agent assist: real-time transcripts that feed AI-powered suggestions
Media & Content
- Subtitle generation: auto-generate SRT/VTT for video content
- Podcast indexing: make audio content searchable by text
- Lecture transcription: accessibility for educational content
- Content moderation: detect spoken profanity or PII
Transcribe Call Analytics: a specialised API built for contact centres. It transcribes calls, identifies sentiment per turn, detects issues, flags call categories (complaints, cancellations), and provides a turn-by-turn sentiment graph. It pairs naturally with Amazon Connect.
Transcribe converts speech to text via batch or real-time streaming. It supports speaker diarisation, custom vocabularies, PII redaction, and subtitle generation (SRT/VTT). Call Analytics is a specialised variant for contact centres. Common uses: meeting notes, call transcription, subtitle generation, podcast indexing, and live captions. Feeds naturally into Comprehend or Translate for further processing.
Amazon Translate: Language Translation
Amazon Translate is a neural machine translation service that delivers fast, high-quality, and affordable language translation. It supports 75+ languages and allows you to localise content, translate user-generated text in real time, or batch-translate entire document repositories.
Translation Modes
- Real-time (synchronous API)
- Batch translation (async, S3)
- 75+ languages (any supported pair)
- Auto source language detection
Customisation
- Custom terminology (brand names, jargon)
- Parallel data (custom training data)
- Formality control (formal/informal)
- Profanity masking
Document Support
- Plain text, HTML, DOCX
- Preserves formatting/tags
- Batch translate entire S3 folders
- Integrates with other AWS AI services
Localisation & Communication
- Website localisation: translate web content to reach global audiences
- Chat translation: real-time multilingual customer support
- Email translation: auto-translate incoming support emails
- User-generated content: translate reviews, comments, posts
AI Pipeline Integration
- Transcribe → Translate: transcribe a call, then translate the transcript
- Comprehend → Translate: analyse sentiment, then translate the results
- Translate → Polly: translate text, then speak it in the target language
- Document migration: batch-translate legacy content archives
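The pipelines above are plain function composition: the output text of one service becomes the input of the next. The sketch injects the translate and synthesise steps as callables so the flow can be exercised offline. The boto3 versions in the comments use `translate_text` and `synthesize_speech`; the voice-per-language mapping is a hypothetical choice.

```python
from typing import Callable

Translator = Callable[[str, str], str]     # (text, target_language) -> translated text
Synthesizer = Callable[[str, str], bytes]  # (text, language) -> audio bytes


def speak_translated(transcript: str, target_language: str,
                     translate: Translator, synthesize: Synthesizer) -> bytes:
    """Translate a Transcribe transcript, then voice it with a TTS step."""
    translated = translate(transcript, target_language)
    return synthesize(translated, target_language)


# boto3-backed steps (not run here; they need AWS credentials):
#   translate_client = boto3.client("translate")
#   polly = boto3.client("polly")
#   def aws_translate(text, lang):
#       return translate_client.translate_text(
#           Text=text, SourceLanguageCode="auto", TargetLanguageCode=lang)["TranslatedText"]
#   def aws_synthesize(text, lang):
#       voice = {"es": "Lucia", "fr": "Lea"}[lang]  # hypothetical voice choice
#       return polly.synthesize_speech(
#           Text=text, OutputFormat="mp3", VoiceId=voice)["AudioStream"].read()


def fake_translate(text: str, lang: str) -> str:
    """Offline stand-in that just tags the text with the target language."""
    return f"[{lang}] {text}"


def fake_synthesize(text: str, lang: str) -> bytes:
    """Offline stand-in that returns the text as bytes instead of audio."""
    return text.encode("utf-8")


audio = speak_translated("Your order has shipped.", "es", fake_translate, fake_synthesize)
print(audio)  # b'[es] Your order has shipped.'
```

Injecting the steps keeps the orchestration testable and lets you swap in a different voice or add a Comprehend step without touching the pipeline function.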
Use custom terminology files (CSV/TMX) to ensure brand names, product names, and domain jargon are translated consistently. Example: "EC2" should remain "EC2" in every language, not be translated literally. Custom terminology overrides the neural model's default translation for specific terms.
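A custom terminology CSV has a header row of language codes followed by one row per term. This sketch generates one in which brand names are kept verbatim in every target language. The terminology name in the comment is hypothetical; after `import_terminology`, the name is passed to `translate_text` through `TerminologyNames`.

```python
import csv
import io


def terminology_csv(terms: dict[str, dict[str, str]], languages: list[str]) -> str:
    """Build terminology CSV text; languages[0] is the source language.

    Any language missing from a term's renderings falls back to the source
    term itself, which is how brand names such as "EC2" stay untranslated.
    """
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(languages)
    for source_term, renderings in terms.items():
        writer.writerow([renderings.get(lang, source_term) for lang in languages])
    return buf.getvalue()


csv_text = terminology_csv({"EC2": {}, "Amazon Polly": {}}, ["en", "fr", "de"])
print(csv_text)

# Uploading and using it (requires AWS credentials; "brand-terms" is hypothetical):
#   translate = boto3.client("translate")
#   translate.import_terminology(
#       Name="brand-terms", MergeStrategy="OVERWRITE",
#       TerminologyData={"File": csv_text.encode("utf-8"), "Format": "CSV"})
#   translate.translate_text(Text="Launch an EC2 instance.", SourceLanguageCode="en",
#                            TargetLanguageCode="fr", TerminologyNames=["brand-terms"])
```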
Translate provides neural machine translation across 75+ languages, in real time or in batch. Custom terminology ensures brand/jargon consistency. Formality control adapts tone. It integrates naturally into AI pipelines: Transcribe → Translate → Polly for end-to-end multilingual voice workflows. Common uses: website localisation, multilingual chat, content migration, and global customer support.
AI/ML Managed Services: Complete Domain Summary
- Rekognition: image & video analysis (face detection, object labelling, text-in-image, content moderation, PPE detection). No ML expertise needed.
- Textract: document OCR that extracts text, tables, and key-value pairs. Specialised APIs for invoices (AnalyzeExpense) and IDs (AnalyzeID). No templates required.
- Comprehend: NLP service for sentiment analysis, entity recognition, key phrases, PII detection, and custom classification. Comprehend Medical for healthcare text.
- Lex: conversational AI built on the same technology as Alexa. ASR + NLU for chatbots and voice bots. Concepts: intents, utterances, slots. Pairs with Amazon Connect for voice.
- Polly: text-to-speech with Neural and Standard voices in dozens of languages. SSML for fine-grained control. Uses: audiobooks, IVR, accessibility, e-learning.
- Transcribe: speech-to-text via batch or streaming. Speaker diarisation, PII redaction, subtitle output. Call Analytics for contact centres.
- Translate: neural machine translation for 75+ languages. Custom terminology for brand consistency. Real-time or batch. Integrates with Transcribe, Comprehend, Polly.