Artificial Intelligence Overview
From the origins of AI to cutting-edge agentic systems: a structured guide through the ideas, mathematics, algorithms, and engineering that define modern AI.
What Is Artificial Intelligence?
Artificial Intelligence is the science of building systems that can perceive, reason, learn, and act, performing tasks that traditionally required human intelligence.
Foundations of AI Systems
Modern AI rests on a mathematical foundation: linear algebra (vectors, matrices), calculus (optimization), and probability and statistics (reasoning under uncertainty).
Linear Algebra
- Vectors & matrices: the language of data
- Matrix multiplication: the core operation of neural nets
- Eigenvalues, SVD: dimensionality reduction
- Dot products: similarity and attention scores
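The dot-product and matrix-multiplication bullets can be made concrete with a minimal pure-Python sketch (no NumPy assumed). The three-dimensional "embeddings" are made-up values for illustration. Real embeddings have hundreds or thousands of dimensions.

```python
import math

def dot(a, b):
    """Dot product: sum of element-wise products."""
    return sum(x * y for x, y in zip(a, b))

def cosine_similarity(a, b):
    """Dot product divided by both vector lengths; 1.0 means same direction."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

def matvec(W, x):
    """Matrix-vector product: each output is one row of W dotted with x.
    A dense neural-net layer (before bias and activation) computes exactly this."""
    return [dot(row, x) for row in W]

# Made-up 3-dimensional "embeddings" for illustration only.
cat = [0.9, 0.1, 0.3]
kitten = [0.8, 0.2, 0.35]
car = [0.1, 0.9, 0.0]

print(round(cosine_similarity(cat, kitten), 3))  # high: vectors point the same way
print(round(cosine_similarity(cat, car), 3))     # low: vectors point different ways

# A 2x3 weight matrix maps a 3-dim input to 2 outputs.
W = [[1.0, 0.0, -1.0],
     [0.5, 0.5, 0.5]]
print(matvec(W, [1.0, 2.0, 3.0]))  # [-2.0, 3.0]
```

Attention scores in transformers are the same dot products, computed between a query vector and every key vector.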
Calculus & Optimization
- Partial derivatives: sensitivity of the loss to each parameter
- Gradient descent: the engine of learning
- Chain rule: the heart of backpropagation
- Adam, RMSProp: adaptive optimizers
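Here is a minimal sketch of plain gradient descent. It assumes a toy linear model y = w·x + b, fitted by mean squared error to noise-free data generated from y = 2x + 1. The gradients are derived by hand with the chain rule, which is the step backpropagation automates. The learning rate and step count are illustrative choices, not tuned recommendations, and adaptive optimizers such as Adam are not shown.

```python
# Toy data from y = 2x + 1 with no noise, so the ideal fit is w = 2, b = 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2.0 * x + 1.0 for x in xs]

def loss_and_grads(w, b):
    """Mean squared error and its partial derivatives dL/dw and dL/db."""
    n = len(xs)
    loss = dw = db = 0.0
    for x, y in zip(xs, ys):
        err = (w * x + b) - y        # prediction minus target
        loss += err ** 2 / n
        # Chain rule: d(err^2)/dw = 2 * err * d(err)/dw = 2 * err * x
        dw += 2.0 * err * x / n
        db += 2.0 * err / n          # d(err)/db = 1
    return loss, dw, db

def gradient_descent(lr=0.05, steps=2000):
    w, b = 0.0, 0.0                  # arbitrary starting guess
    for _ in range(steps):
        _, dw, db = loss_and_grads(w, b)
        w -= lr * dw                 # step against the gradient
        b -= lr * db
    return w, b

w, b = gradient_descent()
final_loss, _, _ = loss_and_grads(w, b)
print(f"w={w:.4f}  b={b:.4f}  loss={final_loss:.2e}")
```

With a learning rate that is too large (here, above roughly 0.15), the updates overshoot and diverge. That sensitivity is part of what adaptive optimizers try to reduce.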
Probability & Stats
- Probability distributions: model outputs
- Bayes' theorem: belief updating
- MLE & MAP: parameter estimation
- KL divergence: comparing distributions
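Bayes' theorem and KL divergence each fit in a few lines. The screening-test numbers below (1% prevalence, 90% sensitivity, 5% false-positive rate) are hypothetical. They were chosen to show that a positive result can still leave the hypothesis unlikely. The KL example shows that the measure is not symmetric.

```python
import math

def bayes_posterior(prior, p_e_given_h, p_e_given_not_h):
    """P(H | E) = P(E | H) P(H) / P(E), with P(E) from the law of total probability."""
    p_e = p_e_given_h * prior + p_e_given_not_h * (1.0 - prior)
    return p_e_given_h * prior / p_e

def kl_divergence(p, q):
    """KL(P || Q) in nats for two discrete distributions over the same outcomes.
    Terms with p_i = 0 contribute 0 by convention; q_i must be > 0 wherever p_i > 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Hypothetical screening test: 1% prevalence, 90% sensitivity, 5% false positives.
posterior = bayes_posterior(prior=0.01, p_e_given_h=0.90, p_e_given_not_h=0.05)
print(round(posterior, 3))  # 0.154: a positive result is still more likely a false alarm

p = [0.7, 0.2, 0.1]  # e.g. the "true" distribution
q = [0.5, 0.3, 0.2]  # e.g. a model's predicted distribution
print(kl_divergence(p, q), kl_divergence(q, p))  # both > 0, and not equal
```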
Machine Learning
Machine learning is the practice of building systems that learn patterns from data rather than being explicitly programmed with rules. The model typically improves as it sees more examples.
| Algorithm | Type | When to Use | Limitation |
|---|---|---|---|
| Linear / Logistic Regression | Supervised | Baseline, interpretable, fast | Can't capture non-linear patterns |
| Decision Tree / Random Forest | Supervised | Tabular data, feature importance | Single trees overfit easily; forests are less interpretable |
| Gradient Boosting (XGBoost) | Supervised | Often the most accurate choice on tabular data | Slow to train; many hyperparameters |
| k-Means Clustering | Unsupervised | Grouping unlabeled data | Requires k upfront; sensitive to initialization |
| PCA | Unsupervised | Dimensionality reduction, visualization | Linear only; components are often hard to interpret |
| Q-Learning / DQN | Reinforcement | Discrete action spaces, games | Doesn't scale to continuous actions well |
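To take one algorithm from the table, k-means can be sketched with Lloyd's algorithm in pure Python. The data points are made up, and a fixed seed makes the random initialization reproducible. In practice a library would be the usual choice, for example scikit-learn's `KMeans`, which defaults to the smarter k-means++ initialization.

```python
import random

def kmeans(points, k, iters=100, seed=0):
    """Lloyd's algorithm: alternately assign each point to its nearest centroid,
    then move each centroid to the mean of the points assigned to it."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)   # random initial centroids (source of init sensitivity)
    for _ in range(iters):
        # Assignment step, using squared Euclidean distance.
        clusters = [[] for _ in range(k)]
        for p in points:
            dists = [sum((a - c) ** 2 for a, c in zip(p, cen)) for cen in centroids]
            clusters[dists.index(min(dists))].append(p)
        # Update step; an empty cluster keeps its previous centroid.
        new_centroids = [
            tuple(sum(coord) / len(members) for coord in zip(*members)) if members else cen
            for cen, members in zip(centroids, clusters)
        ]
        if new_centroids == centroids:  # centroids stopped moving: converged
            break
        centroids = new_centroids
    return centroids, clusters

# Two well-separated blobs of made-up 2-D points.
data = [(1.0, 1.0), (1.5, 2.0), (1.2, 0.8), (8.0, 8.0), (8.5, 9.0), (9.0, 8.2)]
centroids, clusters = kmeans(data, k=2)
print(sorted(centroids))
```

Note that `k=2` has to be supplied upfront, which is the first limitation listed in the table.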
Deep Learning
Deep learning uses multi-layer neural networks to automatically learn hierarchical representations from raw data, powering modern vision, language, speech, and generative AI.
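XOR (output 1 when exactly one input is 1) shows why extra layers matter. No linear model can fit it, which is the "can't capture non-linear patterns" limitation listed for linear regression. A network with one ReLU hidden layer represents it exactly. The weights below are hand-picked for illustration (a classic textbook construction), not learned. In real training, gradient descent would have to find suitable weights.

```python
def relu(v):
    """Element-wise ReLU non-linearity."""
    return [max(0.0, x) for x in v]

def dense(W, b, x):
    """Affine layer W·x + b using plain lists."""
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]

# Hand-picked (not learned) weights for a 2-input, 2-hidden-unit, 1-output network.
W1, b1 = [[1.0, 1.0], [1.0, 1.0]], [0.0, -1.0]
W2, b2 = [[1.0, -2.0]], [0.0]

def mlp(x):
    hidden = relu(dense(W1, b1, x))  # layer 1: computes new features of the input
    return dense(W2, b2, hidden)[0]  # layer 2: linear readout of those features

for x in ([0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]):
    print(x, mlp(x))  # XOR truth table: 0, 1, 1, 0
```

The first hidden unit counts how many inputs are on, and the second fires only when both are. Subtracting twice the second from the first yields XOR. Learning such intermediate features, layer on layer, is the intuition behind "hierarchical representations".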
| Model | Year | Parameters | Training Data | Key Milestone |
|---|---|---|---|---|
| GPT-1 | 2018 | 117M | BooksCorpus | First large-scale pre-trained LM |
| BERT | 2018 | 340M | Wikipedia + Books | Bidirectional encoding; NLP SOTA |
| GPT-2 | 2019 | 1.5B | WebText (40GB) | Coherent long-form text; full model initially withheld as "too dangerous to release" |
| GPT-3 | 2020 | 175B | Common Crawl (570GB) | Few-shot learning emerges at scale |
| ChatGPT (GPT-3.5) | 2022 | Undisclosed (~175B est.) | + RLHF fine-tuning | 100M users in about 60 days |
| GPT-4 | 2023 | ~1T (est.) | Multimodal + RLHF | Passes bar exam, multimodal reasoning |
| Claude 3.5 / Gemini 1.5 | 2024 | Undisclosed | Undisclosed | Very long context windows: 200K tokens (Claude 3.5), 1M+ tokens (Gemini 1.5) |
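The models in this table are transformer-based or, where the architecture is undisclosed, generally assumed to be. The transformer's core operation is scaled dot-product attention: softmax(Q Kᵀ / √d_k) V. The sketch below implements a single attention head in pure Python for readability. It omits the learned projection matrices, causal masking, and multiple heads of a real transformer, and the token vectors are made-up values.

```python
import math

def softmax(scores):
    """Turn raw scores into weights that are positive and sum to 1."""
    m = max(scores)                      # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V.
    Q, K, V are lists of row vectors (one per token); K and V have equal length."""
    d_k = len(K[0])
    out = []
    for q in Q:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in K]
        weights = softmax(scores)        # how strongly this query attends to each key
        # Output is the attention-weighted average of the value vectors.
        out.append([sum(w * v[j] for w, v in zip(weights, V)) for j in range(len(V[0]))])
    return out

# Three toy tokens with 2-dimensional vectors; self-attention uses Q = K = V = X here.
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
for row in attention(X, X, X):
    print([round(x, 3) for x in row])
```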
Agentic AI & Autonomous Systems
Agentic AI moves beyond question-answering: LLM-powered agents plan, use tools, maintain memory, and act autonomously to complete multi-step goals in the real world.
| Framework | Focus | Best For | Key Feature |
|---|---|---|---|
| LangChain | Chains & agents | RAG, tool-use, pipelines | Huge ecosystem; LCEL chains |
| LangGraph | Stateful agent graphs | Complex multi-step agents | Cyclic graphs, controllable state |
| AutoGen (Microsoft) | Multi-agent conversations | Code generation & review | Agents debate/collaborate via messages |
| CrewAI | Role-based agents | Task delegation to specialists | Crew + Roles + Tasks abstraction |
| LlamaIndex | RAG & data indexing | Knowledge-grounded agents | Advanced retrieval pipelines |
| OpenAI Assistants API | Managed agents | Production agents with tools | Built-in memory, code interpreter |
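Several of these frameworks (LangChain, LlamaIndex) center on retrieval-augmented generation (RAG). The idea is to fetch the documents most relevant to a query and paste them into the prompt. The sketch below implements that retrieval step without any framework. It assumes a toy bag-of-words "embedding" in place of a neural embedding model, and it stops at building the prompt rather than calling any real LLM API.

```python
import math
import re
from collections import Counter

def embed(text):
    """Toy stand-in for an embedding model: bag-of-words term counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm = math.sqrt(sum(c * c for c in a.values())) * math.sqrt(sum(c * c for c in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query, docs, k=2):
    """Return the k documents most similar to the query."""
    q = embed(query)
    return sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]

def build_prompt(query, docs, k=2):
    """Augment the prompt with retrieved context; a real system would then call an LLM."""
    context = "\n".join(f"- {d}" for d in retrieve(query, docs, k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

docs = [
    "LangGraph builds stateful agents as graphs with cycles.",
    "k-means groups unlabeled points into k clusters.",
    "LlamaIndex focuses on indexing documents for retrieval.",
]
print(build_prompt("Which library focuses on indexing documents?", docs, k=1))
```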
Agent Architecture Patterns
- ReAct: Reason + Act + Observe loop
- Plan-and-Execute: plan the full task first, then execute the steps
- Reflexion: the agent critiques its own outputs
- Tree of Thoughts: explore multiple reasoning branches
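The ReAct loop can be sketched without any framework. Here `scripted_llm` is a hypothetical stand-in that follows a fixed script. A real agent would send the transcript to an LLM and parse its reply into a thought plus either a tool call or a final answer. The two arithmetic tools are likewise illustrative.

```python
# Illustrative tools the agent may call.
TOOLS = {
    "add": lambda a, b: a + b,
    "multiply": lambda a, b: a * b,
}

def scripted_llm(history):
    """Hypothetical stand-in for an LLM: reads the transcript and returns a thought
    plus either an action (tool name and arguments) or a final answer."""
    observations = [value for kind, value in history if kind == "observation"]
    if not observations:
        return {"thought": "First compute 3 + 4.", "action": ("add", 3, 4)}
    if len(observations) == 1:
        return {"thought": "Now multiply that sum by 5.", "action": ("multiply", observations[0], 5)}
    return {"thought": "I have the result.", "final_answer": observations[-1]}

def react_agent(question, llm, tools, max_steps=5):
    history = [("question", question)]
    for _ in range(max_steps):
        step = llm(history)                                  # Reason
        history.append(("thought", step["thought"]))
        if "final_answer" in step:
            return step["final_answer"], history
        name, *args = step["action"]                         # Act: call the chosen tool
        history.append(("action", step["action"]))
        history.append(("observation", tools[name](*args)))  # Observe: feed result back
    raise RuntimeError("agent did not finish within max_steps")

answer, trace = react_agent("What is (3 + 4) * 5?", scripted_llm, TOOLS)
print(answer)  # 35
for entry in trace:
    print(entry)
```

The `max_steps` cap is a common safeguard: without it, an agent that never produces a final answer would loop indefinitely.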
Multi-Agent Patterns
- Orchestrator + Workers: a manager delegates to specialists
- Peer Review: agents critique each other's outputs
- Swarm: many simple agents, emergent behaviour
- Human-in-the-Loop: a human approves critical decisions
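Here is a minimal sketch of the Orchestrator + Workers pattern, with a peer-review pass and a human-in-the-loop approval gate layered on top. Every worker is a hypothetical plain function standing in for an LLM-backed specialist. The plan is hard-coded rather than generated by a model.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical specialist workers; in practice each would wrap its own LLM prompt.
def researcher(task: str) -> str:
    return f"notes on {task}"

def writer(task: str) -> str:
    return f"draft about {task}"

def reviewer(output: str) -> str:
    """Peer-review step: a second agent checks a worker's output (here it only tags it)."""
    return output + " [reviewed]"

WORKERS: Dict[str, Callable[[str], str]] = {"research": researcher, "write": writer}

def plan(goal: str) -> List[Tuple[str, str]]:
    """Orchestrator's plan: hard-coded here, LLM-generated in a real system."""
    return [("research", goal), ("write", goal)]

def orchestrate(goal: str, approve: Callable[[str], bool] = lambda output: True) -> List[str]:
    results = []
    for role, subtask in plan(goal):
        output = reviewer(WORKERS[role](subtask))  # delegate, then peer-review
        if not approve(output):                    # human-in-the-loop gate
            raise PermissionError(f"human rejected: {output}")
        results.append(output)
    return results

print(orchestrate("vector databases"))
```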
Who Makes the Major LLMs?
A handful of labs and companies now produce the world's most capable language models, each with a different philosophy, access model, and primary strength. Here is the current landscape as of 2025-2026.
| Model | Company | Context (tokens) | Access | Primary Strengths |
|---|---|---|---|---|
| GPT-4o | OpenAI | 128K | API / ChatGPT | Multimodal (vision+audio+text), strong reasoning, huge ecosystem |
| o3 / o4-mini | OpenAI | 200K | API / ChatGPT | Extended "thinking" reasoning; excels at math, science, coding benchmarks |
| Claude 3.5 Sonnet | Anthropic | 200K | API / Claude.ai | Best-in-class coding & instruction following, long-context analysis, safety |
| Claude 3 Opus | Anthropic | 200K | API / Claude.ai | Highest intelligence in Claude 3 family; complex analysis & nuanced writing |
| Gemini 2.0 Flash | Google DeepMind | 1M | API / AI Studio | Fastest Gemini; native multimodal; 1M-token context; Google Search integration |
| Gemini 2.0 Pro | Google DeepMind | 2M | API / Gemini app | 2M-token context window; deep Google Workspace & Search integration |
| Llama 3.3 70B | Meta | 128K | Open weights | Strong open-weight model at 70B scale; self-hostable; commercial use allowed under the Llama license |
| Llama 4 Scout / Maverick | Meta | Up to 10M | Open weights | MoE architecture; Scout: 10M ctx; Maverick: balanced performance/cost |
| Grok-3 | xAI | 131K | X Premium / API | Real-time X/Twitter data access; strong math & reasoning; "Think" mode |
| DeepSeek-V3 | DeepSeek | 128K | Open weights | MoE, 671B params (37B active); top coding & math; trained at a fraction of the usual cost |
| DeepSeek-R1 | DeepSeek | 128K | Open weights | Chain-of-thought reasoning model; matches o1 on many benchmarks at much lower cost |
| Mistral Large 2 | Mistral AI | 128K | Open weights (research license) | European flagship; strong multilingual, coding; deployable on-premises |
| Codestral | Mistral AI | 32K | API | Specialized code model; fill-in-the-middle; 80+ programming languages |
| Phi-4 | Microsoft | 16K | Open weights | Small (14B) but punches above weight; strong STEM reasoning; edge-deployable |
| Gemma 3 | Google DeepMind | 128K | Open weights | Lightweight open model family (1B-27B); runs on consumer hardware; multimodal |
Open-Weight Models
- Weights are publicly downloadable (Llama, DeepSeek, Mistral, Gemma, Phi)
- Self-host on your own hardware for full data privacy
- Fine-tune for specific domains at low cost
- No per-token API costs; ideal for high-volume or offline use
- Tradeoff: often smaller or less capable than frontier closed models
Closed / Proprietary Models
- Weights not public; access via API only (GPT-4o, Claude, Gemini)
- Frontier performance: best-in-class on most benchmarks
- Managed infrastructure: no hardware required
- Pay per token; rate-limited; dependent on provider uptime
- Tradeoff: data leaves your premises; less control
AI in the World: Industry Applications
AI is no longer a research curiosity; it is being deployed across every major industry. Here is how AI is transforming the world you live and work in.
Healthcare
- AlphaFold 2 largely solved protein structure prediction; 200M+ structures predicted
- AI detects cancer in radiology scans (in some studies, on par with or better than radiologists)
- LLMs draft clinical notes and summarize patient histories
- Some drug discovery stages accelerated from years to months
- Robotic surgery assistance (da Vinci + AI guidance)
Software Development
- GitHub has reported that Copilot writes nearly 40% of the code in files where it is enabled
- Claude / GPT-4 explain, debug, and refactor large codebases
- Automated test generation and code review
- Natural-language-to-SQL and natural-language-to-API
- AI agents autonomously resolve some GitHub issues end-to-end
Finance
- Real-time fraud detection across billions of transactions daily
- Algorithmic trading and sentiment analysis of news/social media
- AI underwriting: instant loan risk assessment
- LLMs for regulatory document analysis and compliance checking
- Personalized financial advice (robo-advisors)
Education
- AI tutors that adapt to each student's pace and learning style
- Automated grading and detailed feedback on essays
- Language learning apps (Duolingo's AI conversation practice)
- Content generation: quizzes, explanations, worked examples
- Accessibility tools: real-time captioning, text simplification
Creative Industries
- Midjourney, DALL-E 3, Stable Diffusion: text-to-image generation
- Suno / Udio: generate full songs from a text prompt
- LLMs as writing assistants, for copywriting and content at scale
- Text-to-video generation: Sora, Runway, Kling
- Game asset generation and NPC dialogue trees
Scientific Research
- Literature review and hypothesis generation across 100K+ papers
- Climate modeling and materials science simulation
- AI co-pilots in genomics and bioinformatics pipelines
- Mathematics: AI helps discover new constructions and algorithms (FunSearch)
- Particle physics: AI pattern recognition in collider data
Essential AI Glossary
AI has a dense vocabulary. Here are the terms you will encounter most often, explained plainly and without jargon.
How to Learn AI: A Practical Roadmap
AI is learnable by anyone willing to invest the time. Here is a realistic, structured path from curious beginner to capable practitioner, regardless of your current background.
Stage 1: Build Intuition
- Use ChatGPT, Claude, and Gemini daily; experiment
- Watch 3Blue1Brown's "Neural Networks" series
- Read: "AI Superpowers" (Kai-Fu Lee)
- Play with Teachable Machine by Google
Stage 2: Programming Foundations
- Python: variables, loops, functions, classes
- NumPy, Pandas: data manipulation
- Matplotlib / Seaborn: visualization
- fast.ai "Practical Deep Learning" Part 1
Stage 3: Machine Learning Fundamentals
- Andrew Ng's ML Specialization (Coursera)
- Scikit-learn: regression, classification, clustering
- Train your first model on a Kaggle dataset
- Understand train/val/test splits, overfitting, and metrics
Stage 4: Deep Learning
- Deep Learning Specialization (Andrew Ng)
- PyTorch or TensorFlow: build neural nets
- HuggingFace: load, fine-tune, and deploy models
- Read the "Attention Is All You Need" paper
Stage 5: Build Projects
- Build a RAG chatbot over your own documents
- Create an AI agent with LangChain / LangGraph
- Fine-tune a Llama or Mistral model
- Participate in Kaggle competitions
Stage 6: Specialize and Stay Current
- Pick a domain: CV, NLP, RL, MLOps, AI Safety
- Follow: Andrej Karpathy, Yann LeCun, papers.cool
- Read Anthropic, OpenAI, and DeepMind research blogs
- Contribute to open-source AI projects
| Resource | Type | Best For | Link / Platform |
|---|---|---|---|
| fast.ai: Practical Deep Learning | Course (free) | Hands-on DL, top-down approach | fast.ai |
| Andrew Ng's ML Specialization | Course (audit free) | ML fundamentals, theory + practice | Coursera |
| Andrej Karpathy's Neural Networks: Zero to Hero | YouTube (free) | Build transformers from scratch | YouTube |
| HuggingFace Courses | Course (free) | NLP, diffusion, RL with Transformers | huggingface.co/learn |
| 3Blue1Brown: Neural Networks | YouTube (free) | Visual intuition for how NNs work | YouTube |
| Kaggle Learn + Competitions | Practice (free) | Hands-on ML with real data | kaggle.com/learn |
| Arxiv Sanity / papers.cool | Research paper feeds (free) | Staying up to date with AI research | arxiv.org (via Arxiv Sanity, papers.cool) |