Artificial Intelligence Overview
From the origins of AI to cutting-edge agentic systems — a structured guide through the ideas, mathematics, algorithms, and engineering that define modern AI.
What Is Artificial Intelligence?
Artificial Intelligence is the science of building systems that can perceive, reason, learn, and act — performing tasks that traditionally required human intelligence.
- AI = systems that perceive, reason, learn, and act
- We are in the ANI era — narrow AI excels at specific tasks; AGI remains a research horizon
- AI progress is driven by the data × compute × algorithms flywheel
- Transformers (2017) and ChatGPT (2022) mark the two biggest inflection points in modern AI
Mathematical Foundations of AI
All of AI rests on a mathematical foundation — linear algebra (vectors, matrices), calculus (optimization), and probability & statistics (reasoning under uncertainty).
Linear Algebra
- Vectors & matrices — the language of data
- Matrix multiplication — core of neural nets
- Eigenvalues, SVD — dimensionality reduction
- Dot products — similarity & attention scores
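As a quick illustration of the last bullet, the dot product behind similarity (and attention scores) can be sketched in a few lines. The embedding vectors here are made-up numbers for demonstration, not real learned embeddings:

```python
import numpy as np

# Hypothetical 4-d "word embeddings" (made-up values for illustration)
king = np.array([0.9, 0.1, 0.4, 0.8])
queen = np.array([0.85, 0.15, 0.5, 0.75])
apple = np.array([0.1, 0.9, 0.2, 0.1])

def cosine_similarity(a, b):
    # dot product normalized by the vectors' lengths
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

print(cosine_similarity(king, queen))  # high: nearby directions
print(cosine_similarity(king, apple))  # lower: dissimilar directions
```

The same raw dot product, applied between query and key vectors, is what produces attention scores inside a transformer.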
Calculus & Optimization
- Partial derivatives — sensitivity of loss
- Gradient descent — the engine of learning
- Chain rule — the heart of backprop
- Adam, RMSProp — adaptive optimizers
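A minimal sketch of plain gradient descent on a toy loss f(w) = (w - 3)^2, assuming a fixed learning rate (adaptive optimizers like Adam adjust this step size per parameter):

```python
# Minimize f(w) = (w - 3)^2 with gradient descent.
# The derivative is df/dw = 2*(w - 3); step opposite the gradient.
def grad(w):
    return 2 * (w - 3)

w = 0.0    # initial guess
lr = 0.1   # learning rate (fixed, for simplicity)
for _ in range(100):
    w -= lr * grad(w)

print(w)  # converges toward the minimum at w = 3
```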
Probability & Stats
- Probability distributions — model outputs
- Bayes' theorem — belief updating
- MLE & MAP — parameter estimation
- KL divergence — comparing distributions
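Bayes' theorem in action, with hypothetical numbers for a diagnostic test (1% prevalence, 95% sensitivity, 90% specificity):

```python
# Bayes' theorem: P(disease | positive test)
# All numbers below are hypothetical, chosen for illustration.
p_disease = 0.01                 # prior: 1% prevalence
p_pos_given_disease = 0.95       # sensitivity
p_pos_given_healthy = 0.10       # 1 - specificity (false positive rate)

# Total probability of a positive test
p_pos = (p_pos_given_disease * p_disease
         + p_pos_given_healthy * (1 - p_disease))

# Posterior via Bayes' theorem
p_disease_given_pos = p_pos_given_disease * p_disease / p_pos
print(p_disease_given_pos)  # ~0.088: low despite a "95% accurate" test
```

The counterintuitive result (under 9%) is why the prior matters: most positives come from the large healthy population.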
- Linear algebra — every neural network is matrix multiplications + activations
- Gradient descent — iteratively moves parameters in the direction that reduces loss
- Backpropagation — uses the chain rule to compute gradients for every layer
- Probability — model outputs are distributions; Bayes' theorem underpins many algorithms
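The chain-rule bookkeeping behind backpropagation can be shown by hand on a tiny "network" with one weight per layer, and checked against a finite-difference gradient. The inputs, target, and weights are arbitrary illustration values:

```python
import math

# Tiny network, one weight per layer: loss L = (sigmoid(w2 * relu(w1 * x)) - t)^2
def sigmoid(z):
    return 1 / (1 + math.exp(-z))

def forward_backward(x, t, w1, w2):
    # forward pass, caching intermediates
    a = w1 * x
    h = max(0.0, a)            # ReLU
    z = w2 * h
    y = sigmoid(z)
    loss = (y - t) ** 2
    # backward pass: chain rule applied layer by layer
    dL_dy = 2 * (y - t)
    dy_dz = y * (1 - y)        # sigmoid derivative
    dL_dz = dL_dy * dy_dz
    dL_dw2 = dL_dz * h
    dL_dh = dL_dz * w2
    dL_da = dL_dh * (1.0 if a > 0 else 0.0)  # ReLU derivative
    dL_dw1 = dL_da * x
    return loss, dL_dw1, dL_dw2

loss, g1, g2 = forward_backward(x=1.5, t=1.0, w1=0.8, w2=-0.5)

# sanity check: the analytic gradient matches a finite-difference estimate
eps = 1e-6
lp, _, _ = forward_backward(1.5, 1.0, 0.8 + eps, -0.5)
lm, _, _ = forward_backward(1.5, 1.0, 0.8 - eps, -0.5)
print(g1, (lp - lm) / (2 * eps))  # analytic vs. numeric gradient
```

Autograd frameworks do exactly this bookkeeping automatically, for millions of parameters at once.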
Machine Learning Essentials
Machine learning is the practice of building systems that learn patterns from data — rather than being explicitly programmed with rules. The model improves as it sees more examples.
| Algorithm | Type | When to Use | Limitation |
|---|---|---|---|
| Linear / Logistic Regression | Supervised | Baseline, interpretable, fast | Can't capture non-linear patterns |
| Decision Tree / Random Forest | Supervised | Tabular data, feature importance | Can overfit; forests = less interpretable |
| Gradient Boosting (XGBoost) | Supervised | Best accuracy on tabular data | Slow to train; many hyperparameters |
| k-Means Clustering | Unsupervised | Grouping unlabeled data | Requires k upfront; sensitive to init |
| PCA | Unsupervised | Dimensionality reduction, visualization | Linear only; components not interpretable |
| Q-Learning / DQN | Reinforcement | Discrete action spaces, games | Doesn't scale to continuous actions well |
- Supervised — most common; requires labeled data; powers classification & regression
- Unsupervised — finds hidden structure; clustering, dimensionality reduction
- Reinforcement — agent learns via reward; powers game AI & robotics
- Gradient Boosting (XGBoost) dominates tabular data; neural nets dominate unstructured data
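The k-Means row in the table (needs k upfront, sensitive to init) can be made concrete with a minimal NumPy sketch of Lloyd's algorithm. The two blobs and the explicit initial centroids are synthetic choices for demonstration; a real run would use something like k-means++ initialization:

```python
import numpy as np

def kmeans(X, init_centroids, n_iters=20):
    """Minimal Lloyd's algorithm: assign each point to its nearest centroid,
    then recompute centroids as cluster means. The result depends heavily on
    init_centroids - k-means is sensitive to initialization."""
    centroids = init_centroids.astype(float)
    for _ in range(n_iters):
        # distance of every point to every centroid, nearest wins
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(len(centroids)):
            members = X[labels == j]
            if len(members):
                centroids[j] = members.mean(axis=0)
    return labels, centroids

# two well-separated synthetic blobs, around (0, 0) and (5, 5)
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.3, (50, 2)),
               rng.normal(5, 0.3, (50, 2))])
labels, centroids = kmeans(X, init_centroids=np.array([[0.0, 0.0], [5.0, 5.0]]))
```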
Deep Learning & Neural Networks
Deep learning uses multi-layer neural networks to automatically learn hierarchical representations from raw data — powering vision, language, speech, and generative AI.
| Model | Year | Parameters | Training Data | Key Milestone |
|---|---|---|---|---|
| GPT-1 | 2018 | 117M | BooksCorpus | First large-scale pre-trained LM |
| BERT | 2018 | 340M | Wikipedia + Books | Bidirectional encoding; NLP SOTA |
| GPT-2 | 2019 | 1.5B | WebText (40GB) | "Too dangerous to release" — coherent long text |
| GPT-3 | 2020 | 175B | Common Crawl (570GB) | Few-shot learning emerges at scale |
| ChatGPT (GPT-3.5) | 2022 | ~175B | + RLHF fine-tuning | 100M users in 60 days |
| GPT-4 | 2023 | ~1T (est.) | Multimodal + RLHF | Passes bar exam, multimodal reasoning |
| Claude 3.5 / Gemini 1.5 | 2024 | Undisclosed | Undisclosed | Million-token (1M+) context windows |
- Neural networks = stacked linear transformations + nonlinear activations
- Transformers use self-attention — every token can attend to every other token in O(n²)
- Scaling laws: more parameters + more data + more compute → better models (reliably)
- LLMs are pre-trained on internet text then aligned with RLHF to be helpful & safe
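The self-attention bullet can be made concrete with a single-head scaled dot-product attention sketch in NumPy. The weight matrices here are random stand-ins for learned parameters; the (seq_len × seq_len) score matrix is exactly where the O(n²) cost comes from:

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention.
    X: (seq_len, d_model). Every token attends to every other token,
    so the score matrix is (seq_len, seq_len) - the O(n^2) term."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)      # pairwise attention scores
    weights = softmax(scores, axis=-1)   # each row sums to 1
    return weights @ V, weights

rng = np.random.default_rng(0)
seq_len, d_model, d_k = 4, 8, 8
X = rng.normal(size=(seq_len, d_model))            # toy token embeddings
Wq, Wk, Wv = (rng.normal(size=(d_model, d_k)) for _ in range(3))
out, weights = self_attention(X, Wq, Wk, Wv)
```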
Agentic AI & Autonomous Systems
Agentic AI moves beyond question-answering: LLM-powered agents plan, use tools, maintain memory, and act autonomously to complete multi-step goals in the real world.
| Framework | Focus | Best For | Key Feature |
|---|---|---|---|
| LangChain | Chains & agents | RAG, tool-use, pipelines | Huge ecosystem; LCEL chains |
| LangGraph | Stateful agent graphs | Complex multi-step agents | Cyclic graphs, controllable state |
| AutoGen (Microsoft) | Multi-agent conversations | Code generation & review | Agents debate/collaborate via messages |
| CrewAI | Role-based agents | Task delegation to specialists | Crew + Roles + Tasks abstraction |
| LlamaIndex | RAG & data indexing | Knowledge-grounded agents | Advanced retrieval pipelines |
| OpenAI Assistants API | Managed agents | Production agents with tools | Built-in memory, code interpreter |
Agent Architecture Patterns
- ReAct — Reason + Act + Observe loop
- Plan-and-Execute — plan full task first, then execute steps
- Reflexion — agent critiques its own outputs
- Tree of Thoughts — explore multiple reasoning branches
Multi-Agent Patterns
- Orchestrator + Workers — manager delegates to specialists
- Peer Review — agents critique each other's outputs
- Swarm — many simple agents, emergent behaviour
- Human-in-the-Loop — human approves critical decisions
- Agents = LLMs with tools + memory + planning — they act, not just answer
- ReAct: reason → act → observe → repeat until goal is achieved
- Multi-agent systems split complex tasks between specialized sub-agents
- Key challenge: reliability & evaluation — agents can hallucinate actions
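The ReAct loop (reason → act → observe → repeat) can be sketched with a stubbed "LLM" and a single tool. Here `fake_llm`, `calculator`, and the step format are all hypothetical stand-ins: a real agent would call a model API where `fake_llm` is called and validate tool arguments properly:

```python
def calculator(expression: str) -> str:
    """A tool the agent can call (eval is restricted here, but still
    only safe for trusted toy inputs)."""
    return str(eval(expression, {"__builtins__": {}}, {}))

TOOLS = {"calculator": calculator}

def fake_llm(history):
    """Stand-in for a real model: first decides to act, then to finish."""
    observations = [s for s in history if s["type"] == "observation"]
    if not observations:
        return {"type": "action", "tool": "calculator", "input": "17 * 24"}
    return {"type": "final", "content": f"The answer is {observations[-1]['content']}."}

def react_agent(goal, max_steps=5):
    history = [{"type": "goal", "content": goal}]
    for _ in range(max_steps):
        step = fake_llm(history)                          # Reason: pick next step
        if step["type"] == "final":
            return step["content"]
        result = TOOLS[step["tool"]](step["input"])       # Act: run the tool
        history.append({"type": "observation", "content": result})  # Observe
    return "Gave up after max_steps."

print(react_agent("What is 17 * 24?"))
```

The `max_steps` cap is the simplest guard against the reliability problem above: an agent that keeps hallucinating actions eventually gets cut off.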
What Is AI?
- Perceive, reason, learn, act
- ANI today — AGI is the horizon
- Data × Compute × Algorithms
- Transformers changed everything (2017)
Mathematical Foundations
- Linear algebra — data representation
- Gradient descent — how models learn
- Backpropagation — chain rule at scale
- Probability — reasoning under uncertainty
Machine Learning
- Supervised — learn from labeled data
- Unsupervised — discover structure
- Reinforcement — learn from consequences
- XGBoost for tabular; DL for unstructured
Neural Networks & LLMs
- Stacked layers → hierarchical features
- Transformers — attention is all you need
- Scaling laws — bigger = better (reliably)
- RLHF — aligning LLMs with human intent
Agentic AI
- Tools + memory + planning loop
- ReAct: reason → act → observe
- Multi-agent: orchestrator + specialists
- The frontier of AI applications (2026)