LearningTree

Artificial Intelligence Overview

From the origins of AI to cutting-edge agentic systems: a structured guide through the ideas, mathematics, algorithms, and engineering that define modern AI.

01
Chapter One · Introduction

What Is Artificial Intelligence?

Artificial Intelligence is the science of building systems that can perceive, reason, learn, and act, performing tasks that traditionally required human intelligence.

The AI Spectrum: From Narrow to General
  • ANI (Narrow AI): today's systems, e.g. GPT, AlphaGo, FaceID, recommenders
  • AGI (General AI): human-level reasoning; active research frontier
  • ASI (Super AI): exceeds human cognition; theoretical, long horizon
The spectrum runs from specialised to generalised. We are in the ANI era; the AGI timeline is debated (5–20+ years according to leading researchers).
A Brief History of AI
1950
Turing Test & Birth of AI
Alan Turing asks "Can machines think?" and proposes the Imitation Game. John McCarthy later coins the term "Artificial Intelligence" for the 1956 Dartmouth Conference.
1956–1974
The Golden Age
Early optimism: checkers-playing programs, the ELIZA chatbot, symbolic reasoning systems. Perceptron invented by Rosenblatt (1957).
1974–1980
First AI Winter
Funding cuts as early AI hit scalability limits. Minsky & Papert's critique of perceptrons stalls neural network research.
1980–1987
Expert Systems Boom
Rule-based systems (MYCIN, XCON) deployed in industry. Backpropagation re-discovered and popularized (Rumelhart, Hinton & Williams, 1986).
1987–1993
Second AI Winter
Expert systems brittle and expensive to maintain. AI hardware market collapses. Neural network research stalls again.
1997–2011
ML Renaissance
Deep Blue beats Kasparov (1997). SVMs, kernel methods, probabilistic graphical models mature. Web-scale data emerges.
2012
Deep Learning Revolution
AlexNet wins ImageNet by a huge margin using GPU-trained CNNs. The deep learning era begins, led by LeCun, Bengio, and Hinton (Turing Award 2018).
2017
"Attention Is All You Need"
Google researchers introduce the Transformer architecture, which displaces RNNs and becomes the foundation of nearly all modern LLMs.
2020–2023
LLM Era: GPT-3 to ChatGPT
OpenAI GPT-3 (175B params). GitHub Copilot. ChatGPT (100M users in 60 days). GPT-4, Claude, Gemini. DALL-E, Stable Diffusion, Midjourney.
2024–2026
Agentic AI Frontier
LLM agents with tools, memory, and planning. Multi-agent frameworks (LangChain, CrewAI, AutoGen). Reasoning models (o1, o3, DeepSeek-R1). Multimodal foundation models.
The Three Pillars of AI Progress
Why AI accelerated: Data × Compute × Algorithms
  • 📊 Data (fuel for learning): ImageNet · Common Crawl · web-scale text & images · RLHF preference data
  • ⚡ Compute (engine of scale): GPU / TPU acceleration · NVIDIA H100 clusters · cloud-scale distributed training
  • 🧬 Algorithms (intelligence architecture): backpropagation · attention / Transformers · RLHF · LoRA · MoE
Deep dive → AI Fundamentals: History, Types & Core Concepts
02
Chapter Two · Core Pillars

Foundations of AI Systems

Mathematical Foundations

All of AI rests on a mathematical foundation: linear algebra (vectors, matrices), calculus (optimization), and probability & statistics (reasoning under uncertainty).

📐

Linear Algebra

  • Vectors & matrices: the language of data
  • Matrix multiplication: core of neural nets
  • Eigenvalues, SVD: dimensionality reduction
  • Dot products: similarity & attention scores
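To make the linear-algebra items concrete, here is a minimal sketch of a dot product and a matrix-vector product in plain Python. The helper names `dot` and `matvec` and the sample numbers are illustrative assumptions; real projects would normally use a library such as NumPy.

```python
def dot(u, v):
    """Dot product of two equal-length vectors (lists of numbers)."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    return sum(a * b for a, b in zip(u, v))


def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x: one dot product per row."""
    return [dot(row, x) for row in W]


# Illustrative values: a 3x2 weight matrix applied to a 2-dimensional input.
W = [[1.0, 2.0], [0.0, -1.0], [3.0, 1.0]]
x = [2.0, 1.0]
print(matvec(W, x))  # [4.0, -1.0, 7.0]
```

A neural-network layer is essentially this matrix-vector product followed by a bias and a non-linearity.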
∂

Calculus & Optimization

  • Partial derivatives: sensitivity of loss
  • Gradient descent: the engine of learning
  • Chain rule: the heart of backprop
  • Adam, RMSProp: adaptive optimizers
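Gradient descent can be sketched on a one-parameter problem. The loss f(w) = (w - 3)^2, the learning rate, and the function names below are assumptions chosen for illustration; the same update rule, applied to millions of parameters, is what trains a neural network.

```python
def grad(w):
    """Derivative of the toy loss f(w) = (w - 3)^2 with respect to w."""
    return 2.0 * (w - 3.0)


def gradient_descent(w0, lr=0.1, steps=100):
    """Repeatedly step against the gradient, starting from w0."""
    w = w0
    for _ in range(steps):
        w -= lr * grad(w)  # move downhill by lr times the slope
    return w


w_final = gradient_descent(0.0)
print(round(w_final, 6))  # converges towards the minimum at w = 3
```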
🎲

Probability & Stats

  • Probability distributions: model outputs
  • Bayes' theorem: belief updating
  • MLE & MAP: parameter estimation
  • KL divergence: comparing distributions
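Two of the ideas above fit in a few lines. This sketch applies Bayes' theorem to a binary hypothesis and computes KL divergence between two discrete distributions. The function names and the example probabilities (a 1% prior, a 95% true-positive rate, a 5% false-positive rate) are assumptions for illustration; `kl_divergence` also assumes q is non-zero wherever p is non-zero.

```python
import math


def posterior(prior, p_evidence_given_h, p_evidence_given_not_h):
    """Bayes' theorem: P(H | E) for a binary hypothesis H."""
    numerator = p_evidence_given_h * prior
    evidence = numerator + p_evidence_given_not_h * (1.0 - prior)
    return numerator / evidence


def kl_divergence(p, q):
    """KL(p || q) in nats for discrete distributions given as lists."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# Belief in a rare condition after one positive test result.
print(posterior(prior=0.01, p_evidence_given_h=0.95, p_evidence_given_not_h=0.05))

# Identical distributions have zero divergence; different ones do not.
print(kl_divergence([0.5, 0.5], [0.5, 0.5]))
print(kl_divergence([0.5, 0.5], [0.9, 0.1]))
```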
Deep dive → Mathematical Foundations of AI
Machine Learning Essentials

Machine learning is the practice of building systems that learn patterns from data rather than being explicitly programmed with rules. The model improves as it sees more examples.
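As a small illustration of learning a pattern from examples instead of hand-coding a rule, the sketch below fits a straight line by ordinary least squares. The function name `fit_line` and the toy data (generated from y = 2x + 1) are assumptions for illustration.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Toy training data: the rule y = 2x + 1 is never written into fit_line.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
slope, intercept = fit_line(xs, ys)
print(slope, intercept)  # recovers the underlying rule from the examples
```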

The Three ML Paradigms
  • Supervised: learn from input → label pairs (e.g. an image of a 🐱 → "Cat"). Tasks: classification, regression. Algorithms: decision trees, SVM, LR. Most common in practice.
  • Unsupervised: find structure in raw data. Methods: clustering, PCA, autoencoders, k-means, DBSCAN, t-SNE. Discovers hidden patterns.
  • Reinforcement: an agent learns via interaction, sending actions to an environment and receiving rewards. Methods: Q-Learning, Policy Gradient. Examples: AlphaGo, game AI, robotics. Learns from consequences.
Key ML Algorithms at a Glance
Algorithm | Type | When to Use | Limitation
Linear / Logistic Regression | Supervised | Baseline, interpretable, fast | Can't capture non-linear patterns without feature engineering
Decision Tree / Random Forest | Supervised | Tabular data, feature importance | Single trees can overfit; forests are less interpretable
Gradient Boosting (XGBoost) | Supervised | Often the best accuracy on tabular data | Slower to train; many hyperparameters
k-Means Clustering | Unsupervised | Grouping unlabeled data | Requires k upfront; sensitive to initialization
PCA | Unsupervised | Dimensionality reduction, visualization | Linear only; components are hard to interpret
Q-Learning / DQN | Reinforcement | Discrete action spaces, games | Doesn't scale well to continuous actions
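The k-means row can be illustrated with a one-dimensional sketch. The function name `kmeans_1d`, the sample points, and the starting centroids are assumptions for illustration; note that the number of clusters and the initial centroids must be supplied by the caller, which is the limitation listed in the table.

```python
def kmeans_1d(points, centroids, iterations=10):
    """Plain k-means on 1-D points; k is the number of initial centroids."""
    centroids = list(centroids)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids


points = [1.0, 1.5, 2.0, 9.0, 9.5, 10.0]
print(kmeans_1d(points, centroids=[0.0, 5.0]))  # [1.5, 9.5]
```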
Deep dive → ML Essentials: Algorithms, Evaluation & Feature Engineering
Deep Learning & Neural Networks

Deep learning uses multi-layer neural networks to automatically learn hierarchical representations from raw data, powering vision, language, speech, and generative AI.

Neural Network: Layers & Forward Pass
INPUT (x₁, x₂, x₃) → HIDDEN 1: ReLU(Wx+b) → HIDDEN 2: ReLU(Wx+b) → OUTPUT: Softmax
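The forward pass in the diagram (three inputs, two ReLU hidden layers, a softmax output) can be sketched in plain Python. The weights, biases, and layer sizes below are made-up values for illustration, not trained parameters.

```python
import math


def relu(v):
    """Element-wise ReLU: negative values become zero."""
    return [max(0.0, a) for a in v]


def softmax(v):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(v)  # subtract the max for numerical stability
    exps = [math.exp(a - m) for a in v]
    total = sum(exps)
    return [e / total for e in exps]


def linear(W, b, x):
    """Compute Wx + b for a weight matrix W (list of rows) and bias b."""
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]


def forward(x, layers):
    """Apply ReLU(Wx + b) for every hidden layer, then softmax at the output."""
    h = x
    for W, b in layers[:-1]:
        h = relu(linear(W, b, h))
    W_out, b_out = layers[-1]
    return softmax(linear(W_out, b_out, h))


# Hypothetical parameters: 3 inputs -> 2 hidden -> 2 hidden -> 2 outputs.
layers = [
    ([[0.5, -0.2, 0.1], [0.3, 0.8, -0.5]], [0.0, 0.1]),
    ([[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0]),
    ([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]),
]
probs = forward([1.0, 2.0, 3.0], layers)
print(probs)
```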
The Transformer: Architecture Behind LLMs
Transformer Encoder Block: Self-Attention + FFN
A block applies the following steps in order:
  • Input Embeddings + Positional Encoding
  • Multi-Head Self-Attention: Q · K · V matrices; each token attends to all others
  • Add & Layer Norm
  • Feed-Forward Network (FFN): Linear → ReLU → Linear, applied to each token independently
  • Add & Layer Norm → Output
The block is repeated × N layers (e.g. 96 in GPT-3; the depth of GPT-4 is undisclosed). Q = Query, K = Key, V = Value. GPT-style LLMs use the decoder-only variant, in which each token attends only to earlier tokens.
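The self-attention step can be sketched as single-head scaled dot-product attention. This is a simplified illustration under stated assumptions: Q, K, and V are passed in directly instead of being produced by learned projections, there is one head, and no masking is applied. The function names and sample values are illustrative.

```python
import math


def softmax(v):
    """Turn raw scores into weights that sum to 1."""
    m = max(v)
    exps = [math.exp(a - m) for a in v]
    total = sum(exps)
    return [e / total for e in exps]


def attention(Q, K, V):
    """softmax(Q K^T / sqrt(d_k)) V, with each matrix given as a list of rows."""
    d_k = len(K[0])
    outputs = []
    for q in Q:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in K]
        weights = softmax(scores)
        # Output row is the attention-weighted average of the value rows.
        outputs.append(
            [sum(w * v[j] for w, v in zip(weights, V)) for j in range(len(V[0]))]
        )
    return outputs


# Two identical keys receive equal weight, so the output is the mean of V.
Q = [[5.0, 5.0]]
K = [[1.0, 0.0], [1.0, 0.0]]
V = [[1.0, 2.0], [3.0, 4.0]]
print(attention(Q, K, V))  # [[2.0, 3.0]]
```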
LLM Scale: From GPT-1 to GPT-4
Model | Year | Parameters | Training Data | Key Milestone
GPT-1 | 2018 | 117M | BooksCorpus | First large-scale pre-trained LM
BERT | 2018 | 340M | Wikipedia + Books | Bidirectional encoding; NLP SOTA
GPT-2 | 2019 | 1.5B | WebText (40GB) | "Too dangerous to release"; coherent long text
GPT-3 | 2020 | 175B | Common Crawl (570GB) | Few-shot learning emerges at scale
ChatGPT (GPT-3.5) | 2022 | ~175B (unconfirmed) | + RLHF fine-tuning | 100M users in 60 days
GPT-4 | 2023 | ~1T (est., undisclosed) | Multimodal + RLHF | Passes bar exam, multimodal reasoning
Claude 3.5 / Gemini 1.5 | 2024 | Undisclosed | Long context (200K to 1M+) | Million-token context windows (Gemini 1.5)
Deep dive → Deep Learning: Neural Networks, CNNs, Transformers & LLMs
03
Chapter Three · Agentic AI

Agentic AI & Autonomous Systems

Agentic AI moves beyond question-answering: LLM-powered agents plan, use tools, maintain memory, and act autonomously to complete multi-step goals in the real world.

LLM Agent Architecture: Core Components
  • LLM "Brain": reasoning · planning · deciding
  • 🔧 Tools: web search · code exec · APIs · file I/O · calculator · database queries
  • 🧠 Memory: short-term (context window) · long-term (vector store) · episodic (past actions)
  • 📋 Planning: ReAct · Chain-of-Thought · Tree-of-Thoughts · task decomposition
  • 🌐 Environment: browser · OS · cloud · user interaction · other agents
The ReAct Pattern: Reasoning + Acting
ReAct Loop: How Agents Solve Multi-Step Tasks
  • Thought: reason about what to do next
  • Action: call a tool or sub-agent
  • Observation: receive tool output / result
  • Evaluate: goal reached? If not → repeat
The loop continues until the goal is achieved or the maximum number of steps is reached.
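The loop can be sketched without any real model. In the sketch below, `scripted_policy` is a hypothetical stand-in for an LLM call, `calculator` is a hypothetical tool, and the `(thought, action, argument)` return shape is an assumption made for illustration; no framework's actual API is being shown.

```python
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul, "/": operator.truediv}


def calculator(expression):
    """Hypothetical tool: evaluate 'a op b', e.g. '6 * 7'."""
    left, op, right = expression.split()
    return OPS[op](float(left), float(right))


TOOLS = {"calculator": calculator}


def scripted_policy(goal, history):
    """Stand-in for an LLM: returns (thought, action, argument)."""
    if not history:
        return ("I need to compute the product first.", "calculator", "6 * 7")
    last_observation = history[-1][3]
    return ("I have the result, so I can answer.", "finish", str(last_observation))


def react(goal, policy, tools, max_steps=5):
    """Thought -> Action -> Observation, repeated until 'finish' or max_steps."""
    history = []
    for _ in range(max_steps):
        thought, action, argument = policy(goal, history)
        if action == "finish":  # Evaluate: the policy judges the goal reached
            return argument, history
        observation = tools[action](argument)  # Action, then Observation
        history.append((thought, action, argument, observation))
    return None, history  # gave up after max_steps


answer, trace = react("What is 6 times 7?", scripted_policy, TOOLS)
print(answer)  # 42.0
```

The `max_steps` cap matters in practice: without it, a policy that never emits a finishing action would loop forever.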
Major Agentic AI Frameworks
Framework | Focus | Best For | Key Feature
LangChain | Chains & agents | RAG, tool-use, pipelines | Huge ecosystem; LCEL chains
LangGraph | Stateful agent graphs | Complex multi-step agents | Cyclic graphs, controllable state
AutoGen (Microsoft) | Multi-agent conversations | Code generation & review | Agents debate/collaborate via messages
CrewAI | Role-based agents | Task delegation to specialists | Crew + Roles + Tasks abstraction
LlamaIndex | RAG & data indexing | Knowledge-grounded agents | Advanced retrieval pipelines
OpenAI Assistants API | Managed agents | Production agents with tools | Built-in memory, code interpreter
🧩

Agent Architecture Patterns

  • ReAct: Reason + Act + Observe loop
  • Plan-and-Execute: plan full task first, then execute steps
  • Reflexion: agent critiques its own outputs
  • Tree of Thoughts: explore multiple reasoning branches
🤝

Multi-Agent Patterns

  • Orchestrator + Workers: manager delegates to specialists
  • Peer Review: agents critique each other's outputs
  • Swarm: many simple agents, emergent behaviour
  • Human-in-the-Loop: human approves critical decisions
Deep dive → Agentic AI: Frameworks, Reasoning Patterns & Multi-Agent Systems
04
Chapter Four · The LLM Landscape

Who Makes the Major LLMs?

A handful of labs and companies now produce the world's most capable language models, each with a different philosophy, access model, and primary strength. Here is the current landscape as of 2025–2026.

The Major AI Labs at a Glance
  • 🟢 OpenAI: GPT-4o · o3 · o4-mini · Sora · DALL-E 3
  • 🟡 Anthropic: Claude 3.5 Sonnet · Claude 3 Opus / Haiku
  • 🔵 Google DeepMind: Gemini 2.0 · Gemma · PaLM · AlphaCode 2
  • 🦙 Meta AI: Llama 3.3 · Llama 4 (open weights)
  • xAI (Elon Musk): Grok-3 · Grok-2
  • DeepSeek (China): DeepSeek-V3 · R1
  • Mistral AI (France): Mistral Large 2 · Codestral
  • Microsoft: Phi-4 · Copilot (GPT-4o)
Current Major LLMs: Detailed Comparison (2025–2026)
Model | Company | Context | Access | Primary Strengths
GPT-4o | OpenAI | 128K | API / ChatGPT | Multimodal (vision+audio+text), strong reasoning, huge ecosystem
o3 / o4-mini | OpenAI | 200K | API / ChatGPT | Extended "thinking" reasoning; excels at math, science, coding benchmarks
Claude 3.5 Sonnet | Anthropic | 200K | API / Claude.ai | Strong coding & instruction following, long-context analysis, safety
Claude 3 Opus | Anthropic | 200K | API / Claude.ai | Highest intelligence in Claude 3 family; complex analysis & nuanced writing
Gemini 2.0 Flash | Google DeepMind | 1M | API / AI Studio | Fastest Gemini; native multimodal; 1M-token context; Google Search integration
Gemini 2.0 Pro | Google DeepMind | 2M | API / Gemini app | Among the largest context windows available; deep Google Workspace & Search integration
Llama 3.3 70B | Meta AI | 128K | Open weights | Strong open-weight model at 70B scale; self-hostable; commercial use allowed
Llama 4 Scout / Maverick | Meta AI | Up to 10M (Scout) | Open weights | MoE architecture; Scout: 10M ctx; Maverick: balanced performance/cost
Grok-3 | xAI | 131K | X Premium / API | Real-time X/Twitter data access; strong math & reasoning; "Think" mode
DeepSeek-V3 | DeepSeek | 128K | Open weights | MoE, 671B params (37B active); top coding & math; trained at fraction of cost
DeepSeek-R1 | DeepSeek | 128K | Open weights | Chain-of-thought reasoning model; matches o1 on many benchmarks at much lower cost
Mistral Large 2 | Mistral AI | 128K | Open weights | European flagship; strong multilingual, coding; deployable on-premises
Codestral | Mistral AI | 32K | API | Specialized code model; fill-in-the-middle; 80+ programming languages
Phi-4 | Microsoft | 16K | Open weights | Small (14B) but punches above weight; strong STEM reasoning; edge-deployable
Gemma 3 | Google DeepMind | 128K | Open weights | Lightweight open model family (1B–27B); runs on consumer hardware; multimodal
Open-Weight vs Closed Models: What's the Difference?
🔓

Open-Weight Models

  • Weights are publicly downloadable (Llama, DeepSeek, Mistral, Gemma, Phi)
  • Self-host on your own hardware for full data privacy
  • Fine-tune for specific domains at low cost
  • No per-token API costs; ideal for high-volume or offline use
  • Tradeoff: often smaller or less capable than frontier closed models
🔒

Closed / Proprietary Models

  • Weights not public; access via API only (GPT-4o, Claude, Gemini)
  • Frontier performance: best-in-class on most benchmarks
  • Managed infrastructure; no hardware required
  • Pay per token; rate-limited; dependent on provider uptime
  • Tradeoff: data leaves your premises; less control
05
Chapter Five · Real-World Impact

AI in the World: Industry Applications

AI is no longer a research curiosity; it is being deployed across every major industry. Here is how AI is transforming the world you live and work in.

Healthcare & Medicine
🏥 Diagnostics & Drug Discovery
  • AlphaFold 2 delivered a breakthrough in protein structure prediction, with 200M+ structures predicted
  • AI detects cancer in radiology scans (in some studies matching or exceeding doctors)
  • LLMs draft clinical notes and summarize patient histories
  • Early drug discovery stages shortened, in some reported cases from years to months
  • Robotic surgery assistance (da Vinci + AI guidance)
Software Engineering
💻 AI-Assisted Development
  • GitHub has reported that Copilot writes ~40% of code in files where it is enabled
  • Claude / GPT-4 explain, debug, and refactor entire codebases
  • Automated test generation and code review
  • Natural-language-to-SQL and natural-language-to-API
  • AI agents autonomously resolve GitHub issues end-to-end
Finance & Banking
💳 Risk, Trading & Fraud
  • Real-time fraud detection across billions of transactions daily
  • Algorithmic trading and sentiment analysis of news/social media
  • AI underwriting: instant loan risk assessment
  • LLMs for regulatory document analysis and compliance checking
  • Personalized financial advice (robo-advisors)
Education
📚 Personalized Learning
  • AI tutors that adapt to each student's pace and learning style
  • Automated grading and detailed feedback on essays
  • Language learning apps (Duolingo's AI conversation practice)
  • Content generation: quizzes, explanations, worked examples
  • Accessibility tools: real-time captioning, text simplification
Creative Industries
🎨 Art, Music & Writing
  • Midjourney, DALL-E 3, Stable Diffusion: text-to-image generation
  • Suno / Udio: generate full songs from a text prompt
  • LLMs as writing assistants, for copywriting and content at scale
  • Video generation (text-to-video): Sora, Runway, Kling
  • Game asset generation and NPC dialogue trees
Science & Research
🔬 Accelerating Discovery
  • Literature review and hypothesis generation from 100K+ papers
  • Climate modeling and materials science simulation
  • AI co-pilots in genomics and bioinformatics pipelines
  • Mathematics: AI assists in finding novel proofs (FunSearch)
  • Particle physics: AI pattern recognition in collider data
06
Chapter Six · Reference

Essential AI Glossary

AI has a dense vocabulary. Here are the terms you will encounter most often, explained plainly, without jargon.

Token
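The basic unit LLMs process, roughly a word or word-piece. Short common words are often a single token, while longer or rarer words split into several ("unbelievable" ≈ 3 tokens, depending on the tokenizer). A common rule of thumb for English is about 4 characters per token. Models have a maximum token limit (the context window).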
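To show how one word can become several tokens, here is a toy greedy longest-match splitter. The vocabulary is hypothetical and the method is a simplification: production tokenizers typically use learned schemes such as byte-pair encoding, so real splits will differ.

```python
def tokenize(word, vocab):
    """Split a word into the longest pieces found in vocab, left to right."""
    tokens = []
    i = 0
    while i < len(word):
        # Try the longest remaining substring first, then shrink it.
        for j in range(len(word), i, -1):
            if word[i:j] in vocab:
                tokens.append(word[i:j])
                i = j
                break
        else:
            # No known piece starts here: emit a single character.
            tokens.append(word[i])
            i += 1
    return tokens


VOCAB = {"un", "believ", "able", "cat"}  # hypothetical subword vocabulary
print(tokenize("unbelievable", VOCAB))  # ['un', 'believ', 'able']
print(tokenize("cat", VOCAB))           # ['cat']
```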
Context Window
The total amount of text an LLM can "see" at once: its working memory. GPT-3 had about 2K tokens; Gemini 2.0 Pro has 2 million. A larger window lets the model process entire books in one go.
Hallucination
When an AI confidently states something false. LLMs predict the most plausible next token; they don't verify facts. Always cross-check factual claims from AI.
Embedding
A list of numbers (vector) that captures the meaning of a word, sentence, or document. Similar concepts have similar vectors. Powers semantic search and RAG systems.
RAG (Retrieval-Augmented Generation)
A technique that gives an LLM access to external documents at query time. Instead of relying on training memory, the model retrieves relevant facts first, then generates. Reduces hallucinations.
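The retrieve-then-generate idea can be sketched with a toy retriever. Here `embed` uses bag-of-words counts as a stand-in for a learned embedding model, and `build_prompt` only assembles the text that would be sent to an LLM; both function names, the prompt wording, and the sample documents are assumptions for illustration.

```python
import math
from collections import Counter


def embed(text):
    """Toy 'embedding': word counts (a real system would use a learned model)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, documents, k=1):
    """Return the k documents most similar to the query."""
    q = embed(query)
    ranked = sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]


def build_prompt(query, documents, k=1):
    """Put the retrieved text in front of the question before generation."""
    context = "\n".join(retrieve(query, documents, k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"


docs = [
    "The Transformer was introduced in 2017.",
    "AlexNet won ImageNet in 2012.",
    "Deep Blue beat Kasparov in 1997.",
]
query = "When was the Transformer introduced?"
print(build_prompt(query, docs))
```

In a production system the word counts would be replaced by dense embeddings stored in a vector database, but the flow (embed, rank, prepend to the prompt) stays the same.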
Fine-Tuning
Further training a pre-trained model on a smaller domain-specific dataset to specialize it. Think of it as taking a general-purpose model and teaching it your company's jargon and style.
RLHF
Reinforcement Learning from Human Feedback: a training technique used for ChatGPT, Claude, and Gemini. Human raters compare or score outputs, a reward model is trained on those judgments, and the LLM is then optimized to produce higher-rated responses.
Prompt Engineering
The art of crafting inputs to get better outputs from LLMs. Techniques: few-shot examples, chain-of-thought ("think step by step"), system prompts, role assignments, and output format specification.
Temperature
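A parameter controlling randomness in LLM outputs. Temperature 0 = greedy decoding: the model always picks the most likely token, so output is (near-)deterministic. Temperature 1 = sampling from the model's unmodified distribution, which gives more varied output. Higher temperature → more surprising, less reliable.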
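Temperature is usually applied by dividing the model's raw scores (logits) by it before the softmax. The sketch below shows the effect on a made-up three-token distribution; the function name and logit values are assumptions for illustration, and temperature 0 is treated as invalid here because it corresponds to picking the argmax directly.

```python
import math


def softmax_with_temperature(logits, temperature=1.0):
    """Convert logits to probabilities after scaling by 1 / temperature."""
    if temperature <= 0:
        raise ValueError("use argmax (greedy decoding) instead of temperature 0")
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.1]  # hypothetical scores for three candidate tokens
p_low = softmax_with_temperature(logits, temperature=0.5)
p_default = softmax_with_temperature(logits, temperature=1.0)
p_high = softmax_with_temperature(logits, temperature=2.0)

# Lower temperature concentrates probability on the top token;
# higher temperature flattens the distribution.
print(p_low[0], p_default[0], p_high[0])
```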
Parameters / Weights
The numbers inside a neural network that encode all learned knowledge. GPT-4 is estimated (not confirmed by OpenAI) to have on the order of 1 trillion parameters. More parameters generally means more capacity, but also more compute cost.
Mixture of Experts (MoE)
An architecture where a large model is divided into "expert" sub-networks, and only a subset activates per token. DeepSeek-V3 has 671B total params but activates only about 37B per token, which keeps compute cost well below that of a dense model of the same size.
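Top-k routing, the mechanism that makes only a subset of experts run, can be sketched as follows. The experts here are trivial scalar functions and the gate scores are passed in by hand; in a real MoE layer the experts are feed-forward networks and a learned router computes the scores from each token. All names and values are illustrative assumptions.

```python
import math


def top_k_route(gate_logits, k=2):
    """Pick the k highest-scoring experts and softmax-normalize their scores."""
    top = sorted(range(len(gate_logits)), key=lambda i: gate_logits[i], reverse=True)[:k]
    m = max(gate_logits[i] for i in top)
    exps = {i: math.exp(gate_logits[i] - m) for i in top}
    total = sum(exps.values())
    return {i: e / total for i, e in exps.items()}


def moe_forward(x, experts, gate_logits, k=2):
    """Run only the selected experts and mix their outputs by gate weight."""
    weights = top_k_route(gate_logits, k)
    return sum(w * experts[i](x) for i, w in weights.items())


# Four hypothetical experts; only two are evaluated for this input.
experts = [lambda x: x + 1, lambda x: 2 * x, lambda x: -x, lambda x: x * x]
gate_logits = [0.1, 2.0, -1.0, 2.0]
print(top_k_route(gate_logits))               # experts 1 and 3, weight 0.5 each
print(moe_forward(3.0, experts, gate_logits))  # 0.5 * 6 + 0.5 * 9 = 7.5
```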
Inference
Running a trained model to generate an output, as opposed to training. When you send a message to ChatGPT, the server is doing inference. Inference cost is what you pay for in API pricing.
Multimodal
A model that can process multiple types of data: text, images, audio, video. GPT-4o, Gemini, and Claude 3 are multimodal: they can read images, describe charts, and analyze PDFs.
Overfitting
When a model memorizes training data instead of generalizing. It performs great on training examples but fails on new data. Prevented by regularization, dropout, more diverse data.
Attention Mechanism
The core innovation in Transformers: it lets the model weigh which parts of the input are most relevant to each output token. In "The cat sat on the mat", predicting "mat" attends strongly to "sat" and "on".
Chain-of-Thought (CoT)
A prompting technique where you ask the model to reason step by step before answering. Dramatically improves accuracy on math, logic, and multi-step problems. "Let's think step by step..."
07
Chapter Seven · Your Path Forward

How to Learn AI: A Practical Roadmap

AI is learnable by anyone willing to invest the time. Here is a realistic, structured path from curious beginner to capable practitioner, regardless of your current background.

01
Build Intuition (No-Code)
  • Use ChatGPT, Claude, Gemini daily and experiment
  • Watch 3Blue1Brown's "Neural Networks" series
  • Read: "AI Superpowers" (Kai-Fu Lee)
  • Play with Teachable Machine by Google
⏱ 2–4 weeks · Zero prerequisites
02
Python & Data Foundations
  • Python: variables, loops, functions, classes
  • NumPy, Pandas: data manipulation
  • Matplotlib / Seaborn: visualization
  • Fast.ai "Practical Deep Learning" Part 1
⏱ 4–8 weeks · Basic logic helpful
03
Machine Learning Core
  • Andrew Ng's ML Specialization (Coursera)
  • Scikit-learn: regression, classification, clustering
  • Train your first model on a Kaggle dataset
  • Understand: train/val/test splits, overfitting, metrics
⏱ 6–10 weeks · Python required
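The train/val/test split mentioned in this step can be sketched with the standard library alone. The function name, the 70/15/15 proportions, and the fixed seed are assumptions for illustration; libraries such as scikit-learn provide their own helpers for the same job.

```python
import random


def train_val_test_split(items, val_fraction=0.15, test_fraction=0.15, seed=0):
    """Shuffle once, then carve out test, validation, and training subsets."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    n_test = int(len(shuffled) * test_fraction)
    n_val = int(len(shuffled) * val_fraction)
    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, val, test


train, val, test = train_val_test_split(range(100))
print(len(train), len(val), len(test))  # 70 15 15
```

The model is fitted on `train`, tuned against `val`, and scored once on `test`; a large gap between training and test performance is the usual sign of overfitting.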
04
Deep Learning & LLMs
  • Deep Learning Specialization (Andrew Ng)
  • PyTorch or TensorFlow: build neural nets
  • HuggingFace: load, fine-tune, deploy models
  • Read "Attention Is All You Need" paper
⏱ 8–12 weeks · ML core required
05
Build Real Projects
  • Build a RAG chatbot over your own documents
  • Create an AI agent with LangChain / LangGraph
  • Fine-tune a Llama or Mistral model
  • Participate in Kaggle competitions
⏱ Ongoing · Portfolio building
06
Specialize & Stay Current
  • Pick a domain: CV, NLP, RL, MLOps, AI Safety
  • Follow: Andrej Karpathy, Yann LeCun, papers.cool
  • Read Anthropic, OpenAI, DeepMind research blogs
  • Contribute to open-source AI projects
⏱ Continuous · Community is key
The Best Free Learning Resources
Resource | Type | Best For | Link / Platform
fast.ai: Practical Deep Learning | Course (free) | Hands-on DL, top-down approach | fast.ai
Andrew Ng's ML Specialization | Course (audit free) | ML fundamentals, theory + practice | Coursera
Andrej Karpathy: Neural Networks: Zero to Hero | YouTube (free) | Build transformers from scratch | YouTube
HuggingFace Courses | Course (free) | NLP, diffusion, RL with Transformers | huggingface.co/learn
3Blue1Brown: Neural Networks | YouTube (free) | Visual intuition for how NNs work | YouTube
Kaggle Learn + Competitions | Practice (free) | Hands-on ML with real data | kaggle.com/learn
Arxiv Sanity / papers.cool | Research papers | Staying up-to-date with AI research | arxiv.org
Go deeper → AI Foundation: 5 in-depth sections covering fundamentals, mathematics, ML essentials, deep learning, and agentic AI.