LearningTree

Artificial Intelligence Overview

From the origins of AI to cutting-edge agentic systems — a structured guide through the ideas, mathematics, algorithms, and engineering that define modern AI.

Chapter One · Introduction

What Is Artificial Intelligence?

Artificial Intelligence is the science of building systems that can perceive, reason, learn, and act — performing tasks that traditionally required human intelligence.

The AI Spectrum — From Narrow to General

A Brief History of AI

1950

Turing Test & Birth of AI

Alan Turing asks "Can machines think?" — proposes the Imitation Game. McCarthy coins "Artificial Intelligence" at the 1956 Dartmouth Conference.

1956–1974

The Golden Age

Early optimism — checkers-playing programs, ELIZA chatbot, symbolic reasoning systems. Perceptron invented by Rosenblatt (1957).

1974–1980

First AI Winter

Funding cuts as early AI hit scalability limits. Minsky & Papert's critique of perceptrons stalls neural network research.

1980–1987

Expert Systems Boom

Rule-based expert systems such as XCON are deployed in industry, building on research systems like MYCIN. Backpropagation is popularized (Rumelhart, Hinton & Williams, 1986).

1987–1993

Second AI Winter

Expert systems brittle and expensive to maintain. AI hardware market collapses. Neural network research stalls again.

1997–2011

ML Renaissance

Deep Blue beats Kasparov (1997). SVMs, kernel methods, probabilistic graphical models mature. Web-scale data emerges.

2012

Deep Learning Revolution

AlexNet wins ImageNet by a huge margin using GPU-trained CNNs. Deep learning era begins — LeCun, Bengio, Hinton (Turing Award 2018).

2017

"Attention Is All You Need"

Researchers at Google introduce the Transformer architecture, which largely replaces RNNs for sequence modeling and becomes the foundation of nearly all modern LLMs.

2020–2023

LLM Era — GPT-3 to ChatGPT

OpenAI GPT-3 (175B params). GitHub Copilot. ChatGPT (100M users in 60 days). GPT-4, Claude, Gemini. DALL-E, Stable Diffusion, Midjourney.

2024–2026

Agentic AI Frontier

LLM agents with tools, memory, and planning. Multi-agent frameworks (LangChain, CrewAI, AutoGen). Reasoning models (o1, o3, DeepSeek-R1). Multimodal foundation models.

The Three Pillars of AI Progress

Why AI accelerated — Data × Compute × Algorithms

Deep dive → AI Fundamentals: History, Types & Core Concepts

Chapter Two · Core Pillars

Foundations of AI Systems

Mathematical Foundations

All of AI rests on a mathematical foundation — linear algebra (vectors, matrices), calculus (optimization), and probability & statistics (reasoning under uncertainty).

📐

Linear Algebra

Vectors & matrices — the language of data
Matrix multiplication — core of neural nets
Eigenvalues, SVD — dimensionality reduction
Dot products — similarity & attention scores
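
The last bullet above can be made concrete with a minimal sketch, assuming tiny hand-written vectors. The three vectors (`cat`, `kitten`, `car`) are invented for illustration and are not real embeddings; the point is only that a dot product, once normalized by vector length, acts as a similarity score.

```python
import math

def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def cosine_similarity(a, b):
    """Dot product divided by both vector lengths; ranges from -1 to 1."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

# Hypothetical 3-dimensional "meaning" vectors, invented for illustration.
cat = [0.9, 0.8, 0.1]
kitten = [0.85, 0.9, 0.15]
car = [0.1, 0.2, 0.95]

print(cosine_similarity(cat, kitten))  # close to 1: vectors point the same way
print(cosine_similarity(cat, car))     # much smaller: vectors point apart
```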

∂

Calculus & Optimization

Partial derivatives — sensitivity of loss
Gradient descent — the engine of learning
Chain rule — the heart of backprop
Adam, RMSProp — adaptive optimizers
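
Gradient descent is easiest to see on a one-parameter problem. The sketch below assumes a toy loss, (w - 3)^2, chosen only because its minimum is known; the function names, learning rate, and step count are illustrative choices, not values from any particular library.

```python
def loss(w):
    """A toy loss with its minimum at w = 3."""
    return (w - 3.0) ** 2

def grad(w):
    """Derivative of the toy loss with respect to w."""
    return 2.0 * (w - 3.0)

def gradient_descent(w0, learning_rate=0.1, steps=100):
    """Repeatedly step against the gradient, starting from w0."""
    w = w0
    for _ in range(steps):
        w -= learning_rate * grad(w)
    return w

w_final = gradient_descent(w0=0.0)
print(w_final, loss(w_final))  # w_final ends up very close to 3
```

Adaptive optimizers such as Adam follow the same loop but rescale each step using running statistics of past gradients.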

🎲

Probability & Stats

Probability distributions — model outputs
Bayes' theorem — belief updating
MLE & MAP — parameter estimation
KL divergence — comparing distributions
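
Two of the items above fit in a few lines each. This is a sketch under stated assumptions: the prior, true-positive rate, and false-positive rate are illustrative numbers, and both functions handle only simple discrete cases.

```python
import math

def bayes_posterior(prior, likelihood, false_positive_rate):
    """P(hypothesis | positive evidence) via Bayes' theorem."""
    evidence = likelihood * prior + false_positive_rate * (1.0 - prior)
    return likelihood * prior / evidence

def kl_divergence(p, q):
    """KL(p || q) in nats for two discrete distributions over the same outcomes."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Illustrative numbers: 1% prior, 90% true-positive rate, 5% false-positive rate.
posterior = bayes_posterior(prior=0.01, likelihood=0.9, false_positive_rate=0.05)
print(round(posterior, 3))  # 0.154: a positive result lifts belief from 1% to about 15%

print(kl_divergence([0.9, 0.1], [0.5, 0.5]))  # positive: the distributions differ
```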

Deep dive → Mathematical Foundations of AI

Machine Learning Essentials

Machine learning is the practice of building systems that learn patterns from data — rather than being explicitly programmed with rules. The model improves as it sees more examples.
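
As a small illustration of learning a pattern from examples rather than hard-coding it, the sketch below fits a straight line by ordinary least squares. The data points are hypothetical and were generated from the hidden rule y = 2x + 1; the code never sees that rule, only the examples.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    return slope, mean_y - slope * mean_x

# Hypothetical examples generated from the hidden rule y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

slope, intercept = fit_line(xs, ys)
print(slope, intercept)  # recovers a slope of 2 and an intercept of 1
```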

The Three ML Paradigms

Key ML Algorithms at a Glance

Algorithm	Type	When to Use	Limitation
Linear / Logistic Regression	Supervised	Baseline, interpretable, fast	Can't capture non-linear patterns
Decision Tree / Random Forest	Supervised	Tabular data, feature importance	Can overfit; forests = less interpretable
Gradient Boosting (XGBoost)	Supervised	Often the highest accuracy on tabular data	Many hyperparameters to tune; can overfit without care
k-Means Clustering	Unsupervised	Grouping unlabeled data	Requires k upfront; sensitive to init
PCA	Unsupervised	Dimensionality reduction, visualization	Linear only; components not interpretable
Q-Learning / DQN	Reinforcement	Discrete action spaces, games	Does not scale well to continuous action spaces

Deep dive → ML Essentials: Algorithms, Evaluation & Feature Engineering

Deep Learning & Neural Networks

Deep learning uses multi-layer neural networks to automatically learn hierarchical representations from raw data — powering vision, language, speech, and generative AI.

Neural Network — Layers & Forward Pass
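
A forward pass is just a chain of weighted sums and non-linear functions. The sketch below assumes a tiny network (2 inputs, 3 hidden units, 1 output) with hand-picked weights; the weights and the names `W1`, `b1`, `W2`, `b2` are hypothetical, whereas a real network would learn them during training.

```python
import math

def relu(values):
    """Set negative activations to zero."""
    return [max(0.0, v) for v in values]

def sigmoid(v):
    """Squash a number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-v))

def dense(inputs, weights, biases):
    """One fully connected layer: each output is a weighted sum plus a bias."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]

# Hypothetical hand-picked weights: 2 inputs -> 3 hidden units -> 1 output.
W1 = [[0.5, -0.2], [0.1, 0.4], [-0.3, 0.8]]
b1 = [0.0, 0.1, -0.1]
W2 = [[0.7, -0.5, 0.2]]
b2 = [0.05]

def forward(x):
    hidden = relu(dense(x, W1, b1))    # hidden layer with ReLU
    output = dense(hidden, W2, b2)[0]  # single output unit
    return sigmoid(output)             # interpret as a probability

print(forward([1.0, 2.0]))
```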

The Transformer — Architecture Behind LLMs

Transformer Encoder Block — Self-Attention + FFN

LLM Scale — From GPT-1 to GPT-4

Model	Year	Parameters	Training Data	Key Milestone
GPT-1	2018	117M	BooksCorpus	First large-scale pre-trained LM
BERT	2018	340M	Wikipedia + Books	Bidirectional encoding; NLP SOTA
GPT-2	2019	1.5B	WebText (40GB)	"Too dangerous to release" — coherent long text
GPT-3	2020	175B	Common Crawl (570GB)	Few-shot learning emerges at scale
ChatGPT (GPT-3.5)	2022	Undisclosed (often assumed ~175B)	+ RLHF fine-tuning	Reported 100M users about two months after launch
GPT-4	2023	~1T (est.)	Multimodal + RLHF	Passes bar exam, multimodal reasoning
Claude 3.5 / Gemini 1.5	2024	Undisclosed	Undisclosed	Long context windows: 200K tokens (Claude 3.5), 1M+ tokens (Gemini 1.5)

Deep dive → Deep Learning: Neural Networks, CNNs, Transformers & LLMs

Chapter Three · Agentic AI

Agentic AI & Autonomous Systems

Agentic AI moves beyond question-answering — LLM-powered agents that plan, use tools, maintain memory, and act autonomously to complete multi-step goals in the real world.

LLM Agent Architecture — Core Components

The ReAct Pattern — Reasoning + Acting

ReAct Loop — How Agents Solve Multi-Step Tasks
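
The Reason, Act, Observe cycle can be sketched without any real model. Everything here is an assumption made for illustration: `scripted_model` is a hard-coded stand-in for an LLM call, `calculator` is a toy tool, and the dictionary format for steps is invented rather than taken from any framework.

```python
def calculator(expression):
    """A toy tool: evaluates 'a op b' for +, -, *, / on two numbers."""
    left, op, right = expression.split()
    a, b = float(left), float(right)
    return {"+": a + b, "-": a - b, "*": a * b, "/": a / b}[op]

TOOLS = {"calculator": calculator}

def scripted_model(history):
    """Hypothetical stand-in for an LLM: decides the next step from the history."""
    observations = [h for h in history if h["role"] == "observation"]
    if not observations:
        return {"thought": "I need to multiply the numbers.",
                "action": "calculator", "input": "12 * 7"}
    return {"thought": "I have the result.",
            "final": f"The answer is {observations[-1]['content']}"}

def react_loop(question, model, max_steps=5):
    history = [{"role": "question", "content": question}]
    for _ in range(max_steps):
        step = model(history)                          # Reason
        if "final" in step:
            return step["final"]
        result = TOOLS[step["action"]](step["input"])  # Act
        history.append({"role": "observation", "content": result})  # Observe
    return "Stopped: step limit reached"

print(react_loop("What is 12 times 7?", scripted_model))  # The answer is 84.0
```

The step limit matters in practice: without it, an agent that never produces a final answer would loop forever.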

Major Agentic AI Frameworks

Framework	Focus	Best For	Key Feature
LangChain	Chains & agents	RAG, tool-use, pipelines	Huge ecosystem; LCEL chains
LangGraph	Stateful agent graphs	Complex multi-step agents	Cyclic graphs, controllable state
AutoGen (Microsoft)	Multi-agent conversations	Code generation & review	Agents debate/collaborate via messages
CrewAI	Role-based agents	Task delegation to specialists	Crew + Roles + Tasks abstraction
LlamaIndex	RAG & data indexing	Knowledge-grounded agents	Advanced retrieval pipelines
OpenAI Assistants API	Managed agents	Production agents with tools	Built-in memory, code interpreter

🧩

Agent Architecture Patterns

ReAct — Reason + Act + Observe loop
Plan-and-Execute — plan full task first, then execute steps
Reflexion — agent critiques its own outputs
Tree of Thoughts — explore multiple reasoning branches

🤝

Multi-Agent Patterns

Orchestrator + Workers — manager delegates to specialists
Peer Review — agents critique each other's outputs
Swarm — many simple agents, emergent behaviour
Human-in-the-Loop — human approves critical decisions

Deep dive → Agentic AI: Frameworks, Reasoning Patterns & Multi-Agent Systems

Chapter Four · The LLM Landscape

Who Makes the Major LLMs?

A handful of labs and companies now produce the world's most capable language models — each with a different philosophy, access model, and primary strength. Here is the current landscape as of 2025–2026.

The Major AI Labs at a Glance

Current Major LLMs — Detailed Comparison (2025–2026)

Model	Company	Context	Access	Primary Strengths
GPT-4o	OpenAI	128K	API / ChatGPT	Multimodal (vision+audio+text), strong reasoning, huge ecosystem
o3 / o4-mini	OpenAI	200K	API / ChatGPT	Extended "thinking" reasoning; excels at math, science, coding benchmarks
Claude 3.5 Sonnet	Anthropic	200K	API / Claude.ai	Best-in-class coding & instruction following, long-context analysis, safety
Claude 3 Opus	Anthropic	200K	API / Claude.ai	Highest intelligence in Claude 3 family; complex analysis & nuanced writing
Gemini 2.0 Flash	Google DeepMind	1M	API / AI Studio	Fastest Gemini; native multimodal; 1M-token context; Google Search integration
Gemini 2.0 Pro	Google DeepMind	2M	API / Gemini app	One of the largest context windows among closed models; deep Google Workspace & Search integration
Llama 3.3 70B	Meta AI	128K	Open weights	Best open-weight model at 70B scale; self-hostable; commercial use allowed
Llama 4 Scout / Maverick	Meta AI	10M (Scout) / 1M (Maverick)	Open weights	MoE architecture; Scout: 10M ctx; Maverick: 1M ctx, balanced performance/cost
Grok-3	xAI	131K	X Premium / API	Real-time X/Twitter data access; strong math & reasoning; "Think" mode
DeepSeek-V3	DeepSeek	128K	Open weights	MoE, 671B params (37B active); top coding & math; trained at fraction of cost
DeepSeek-R1	DeepSeek	128K	Open weights	Chain-of-thought reasoning model; matches o1 on many benchmarks at much lower cost
Mistral Large 2	Mistral AI	128K	Open weights (research license)	European flagship; strong multilingual, coding; deployable on-premises with a commercial license
Codestral	Mistral AI	32K	API	Specialized code model; fill-in-the-middle; 80+ programming languages
Phi-4	Microsoft	16K	Open weights	Small (14B) but punches above weight; strong STEM reasoning; edge-deployable
Gemma 3	Google DeepMind	128K	Open weights	Lightweight open model family (1B–27B); runs on consumer hardware; multimodal

Open-Weight vs Closed Models — What's the Difference?

🔓

Open-Weight Models

Weights are publicly downloadable (Llama, DeepSeek, Mistral, Gemma, Phi)
Self-host on your own hardware — full data privacy
Fine-tune for specific domains at low cost
No per-token API costs; ideal for high-volume or offline use
Tradeoff: often smaller or less capable than frontier closed models

🔒

Closed / Proprietary Models

Weights not public — access via API only (GPT-4o, Claude, Gemini)
Frontier performance: best-in-class on most benchmarks
Managed infrastructure — no hardware required
Pay per token; rate-limited; dependent on provider uptime
Tradeoff: data leaves your premises; less control

Chapter Five · Real-World Impact

AI in the World — Industry Applications

AI is no longer a research curiosity — it is being deployed across every major industry. Here is how AI is transforming the world you live and work in.

Healthcare & Medicine

🏥 Diagnostics & Drug Discovery

AlphaFold 2 made highly accurate protein structure prediction routine; its public database holds 200M+ predicted structures
AI detects cancer in radiology scans, in some studies matching or exceeding radiologists on specific tasks
LLMs draft clinical notes and summarize patient histories
Early drug discovery steps such as candidate screening accelerated, in some reported cases from years to months
Robotic surgery assistance (da Vinci + AI guidance)

Software Engineering

💻 AI-Assisted Development

GitHub has reported that Copilot generates roughly 40% of the code in files where it is enabled
Claude / GPT-4 explain, debug, and refactor entire codebases
Automated test generation and code review
Natural-language-to-SQL and natural-language-to-API
AI agents autonomously resolve GitHub issues end-to-end

Finance & Banking

💳 Risk, Trading & Fraud

Real-time fraud detection across billions of transactions daily
Algorithmic trading and sentiment analysis of news/social media
AI underwriting — instant loan risk assessment
LLMs for regulatory document analysis and compliance checking
Personalized financial advice (robo-advisors)

Education

📚 Personalized Learning

AI tutors that adapt to each student's pace and learning style
Automated grading and detailed feedback on essays
Language learning apps (Duolingo's AI conversation practice)
Content generation — quizzes, explanations, worked examples
Accessibility tools: real-time captioning, text simplification

Creative Industries

🎨 Art, Music & Writing

Midjourney, DALL-E 3, Stable Diffusion — text-to-image generation
Suno / Udio — generate full songs from a text prompt
LLMs as writing assistants, copywriting, and content at scale
Video generation: Sora, Runway, Kling — text-to-video
Game asset generation and NPC dialogue trees

Science & Research

🔬 Accelerating Discovery

Literature review and hypothesis generation from 100K+ papers
Climate modeling and materials science simulation
AI co-pilots in genomics and bioinformatics pipelines
Mathematics: AI assists in discovering new constructions and algorithms (FunSearch)
Particle physics: AI pattern recognition in collider data

Chapter Six · Reference

Essential AI Glossary

AI has a dense vocabulary. Here are the terms you will encounter most often — explained plainly, without jargon.

Token

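The basic unit LLMs process: a whole word or a word-piece. Exact counts depend on the model's tokenizer; a short common word is often a single token, while a longer word such as "unbelievable" may be split into several pieces. Models have a maximum token limit (the context window).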
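
To show what splitting into word-pieces looks like, here is a minimal sketch using greedy longest-match against a tiny vocabulary. Both the vocabulary and the matching rule are assumptions for illustration; production tokenizers typically use learned schemes such as byte-pair encoding and will split real words differently.

```python
def tokenize(word, vocab):
    """Greedy longest-match split of a word into pieces found in vocab."""
    pieces, start = [], 0
    while start < len(word):
        for end in range(len(word), start, -1):
            if word[start:end] in vocab:
                pieces.append(word[start:end])
                start = end
                break
        else:
            # No known piece starts here: fall back to a single character.
            pieces.append(word[start])
            start += 1
    return pieces

# Hypothetical vocabulary, invented for illustration only.
VOCAB = {"un", "believ", "able", "the", "cat"}

print(tokenize("unbelievable", VOCAB))  # ['un', 'believ', 'able']
print(tokenize("cat", VOCAB))           # ['cat']
```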

Context Window

The total amount of text an LLM can "see" at once, in effect its working memory. The original GPT-3 had a context window of about 2K tokens; Gemini 2.0 Pro has 2 million. A larger window lets the model process entire books in one go.

Hallucination

When an AI confidently states something false. LLMs predict the most plausible next token — they don't verify facts. Always cross-check factual claims from AI.

Embedding

A list of numbers (vector) that captures the meaning of a word, sentence, or document. Similar concepts have similar vectors. Powers semantic search and RAG systems.

RAG (Retrieval-Augmented Generation)

A technique that gives an LLM access to external documents at query time. Instead of relying on training memory, the model retrieves relevant facts first, then generates. Reduces hallucinations.
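
The retrieve-then-generate flow can be sketched end to end with toy parts. This assumes a bag-of-words count vector in place of a learned embedding and stops at building the prompt, since the generation step would need a real model; `DOCS`, `embed`, `retrieve`, and `build_prompt` are hypothetical names.

```python
import math
from collections import Counter

def embed(text):
    """Toy 'embedding': word counts (real systems use learned dense vectors)."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in set(a) & set(b))
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def retrieve(query, documents, k=1):
    """Return the k documents most similar to the query."""
    q = embed(query)
    return sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]

def build_prompt(query, documents):
    """Place the retrieved text in front of the question for the LLM to use."""
    context = "\n".join(retrieve(query, documents))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

# Hypothetical mini knowledge base.
DOCS = [
    "the refund policy allows returns within 30 days",
    "our office is closed on public holidays",
    "shipping takes three to five business days",
]

print(build_prompt("what is the refund policy", DOCS))
```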

Fine-Tuning

Further training a pre-trained model on a smaller domain-specific dataset to specialize it. Think of it as taking a general-purpose model and teaching it your company's jargon and style.

RLHF

Reinforcement Learning from Human Feedback: a training technique used for models such as ChatGPT, Claude, and Gemini. Human raters compare or score outputs, a reward model is trained on those judgments, and the LLM is then optimized to produce higher-rated responses.

Prompt Engineering

The art of crafting inputs to get better outputs from LLMs. Techniques: few-shot examples, chain-of-thought ("think step by step"), system prompts, role assignments, and output format specification.

Temperature

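A parameter controlling randomness in LLM outputs. Temperature 0 is effectively greedy decoding: the model always picks the most likely token, so output is nearly deterministic. Temperature 1 samples from the model's unmodified probabilities, giving more varied output. Higher temperature → more surprising, less reliable.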
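
Mechanically, temperature divides the model's raw scores (logits) before they are turned into probabilities. The sketch below assumes three made-up logits; a temperature of exactly 0 would divide by zero here, so implementations usually treat it as picking the top-scoring token directly.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities; lower temperature sharpens them."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]  # hypothetical scores for three candidate tokens

for t in (0.2, 1.0, 2.0):
    print(t, [round(p, 3) for p in softmax_with_temperature(logits, t)])
```

At 0.2 almost all probability sits on the first token; at 2.0 the three options are much closer together, which is why sampled text becomes more varied.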

Parameters / Weights

The numbers inside a neural network that encode all learned knowledge. GPT-4's size is undisclosed; unconfirmed estimates put it at around 1 trillion parameters. More parameters generally means more capacity, but also more compute cost.

Mixture of Experts (MoE)

An architecture where part of a large model is divided into "expert" sub-networks, and only a subset activates per token. DeepSeek-V3 has 671B total params but activates only about 37B for each token, which keeps compute per token low at scale.
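
The routing idea can be sketched with toy stand-ins. The assumptions here are substantial: each "expert" is a one-line function rather than a sub-network, and the gate scores are fixed numbers, whereas a real MoE layer computes them per token with a learned router.

```python
def top_k_route(gate_scores, k):
    """Pick the k highest-scoring experts and renormalize their weights."""
    chosen = sorted(range(len(gate_scores)),
                    key=lambda i: gate_scores[i], reverse=True)[:k]
    total = sum(gate_scores[i] for i in chosen)
    return {i: gate_scores[i] / total for i in chosen}

def moe_layer(x, experts, gate_scores, k=2):
    """Run only the selected experts and blend their outputs by weight."""
    weights = top_k_route(gate_scores, k)
    return sum(w * experts[i](x) for i, w in weights.items())

# Four hypothetical "experts", each a trivial function standing in for a sub-network.
EXPERTS = [lambda x: x + 1, lambda x: x * 2, lambda x: x - 3, lambda x: x * x]
GATE = [0.1, 0.6, 0.1, 0.2]  # hypothetical router scores for one token

print(top_k_route(GATE, k=2))              # only experts 1 and 3 are selected
print(moe_layer(3.0, EXPERTS, GATE, k=2))  # experts 0 and 2 are never run
```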

Inference

Running a trained model to generate an output — as opposed to training. When you send a message to ChatGPT, the server is doing inference. Inference cost is what you pay for in API pricing.

Multimodal

A model that can process multiple types of data — text, images, audio, video. GPT-4o, Gemini, and Claude 3 are multimodal: they can read images, describe charts, and analyze PDFs.

Overfitting

When a model memorizes training data instead of generalizing. It performs well on training examples but fails on new data. Mitigated by regularization, dropout, and more diverse data.
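
Overfitting is usually detected by holding some data back and checking performance on it. The sketch below shows one way to make such a split; the 70/15/15 proportions and the fixed seed are illustrative defaults, not a standard.

```python
import random

def train_val_test_split(items, val_fraction=0.15, test_fraction=0.15, seed=0):
    """Shuffle once, then cut into three non-overlapping parts."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    n_test = round(len(shuffled) * test_fraction)
    n_val = round(len(shuffled) * val_fraction)
    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, val, test

train, val, test = train_val_test_split(range(100))
print(len(train), len(val), len(test))  # 70 15 15
```

A large gap between training accuracy and validation accuracy is the usual warning sign; the test set is kept untouched until the very end.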

Attention Mechanism

The core innovation in Transformers — lets the model weigh which parts of the input are most relevant to each output token. "The cat sat on the mat" — predicting "mat" attends strongly to "sat" and "on".
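
The weighing step can be written out as scaled dot-product attention for a single query. This is a sketch under simplifying assumptions: the 2-dimensional vectors are invented, there is one attention head, and the learned projection matrices of a real Transformer are omitted.

```python
import math

def softmax(scores):
    """Normalize scores into weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(query, keys, values):
    """Scaled dot-product attention for one query vector."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
              for key in keys]
    weights = softmax(scores)  # how much to attend to each token
    output = [sum(w * v[j] for w, v in zip(weights, values))
              for j in range(len(values[0]))]
    return weights, output

# Hypothetical 2-dimensional vectors for three input tokens.
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[10.0, 0.0], [0.0, 10.0], [5.0, 5.0]]
query = [1.0, 0.0]

weights, output = attention(query, keys, values)
print([round(w, 3) for w in weights], [round(o, 3) for o in output])
```

Tokens whose keys align with the query receive larger weights, so their values dominate the output.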

Chain-of-Thought (CoT)

A prompting technique where you ask the model to reason step by step before answering. Often substantially improves accuracy on math, logic, and multi-step problems. "Let's think step by step..."

Chapter Seven · Your Path Forward

How to Learn AI — A Practical Roadmap

AI is learnable by anyone willing to invest the time. Here is a realistic, structured path from curious beginner → capable practitioner — regardless of your current background.

Build Intuition (No-Code)

Use ChatGPT, Claude, Gemini daily — experiment
Watch 3Blue1Brown's "Neural Networks" series
Read: "AI Superpowers" (Kai-Fu Lee)
Play with Teachable Machine by Google

⏱ 2–4 weeks · Zero prerequisites

Python & Data Foundations

Python: variables, loops, functions, classes
NumPy, Pandas — data manipulation
Matplotlib / Seaborn — visualization
fast.ai "Practical Deep Learning" Part 1

⏱ 4–8 weeks · Basic logic helpful

Machine Learning Core

Andrew Ng's ML Specialization (Coursera)
Scikit-learn: regression, classification, clustering
Train your first model on a Kaggle dataset
Understand: train/val/test splits, overfitting, metrics

⏱ 6–10 weeks · Python required

Deep Learning & LLMs

Deep Learning Specialization — Andrew Ng
PyTorch or TensorFlow — build neural nets
HuggingFace: load, fine-tune, deploy models
Read "Attention Is All You Need" paper

⏱ 8–12 weeks · ML core required

Build Real Projects

Build a RAG chatbot over your own documents
Create an AI agent with LangChain / LangGraph
Fine-tune a Llama or Mistral model
Participate in Kaggle competitions

⏱ Ongoing · Portfolio building

Specialize & Stay Current

Pick a domain: CV, NLP, RL, MLOps, AI Safety
Follow: Andrej Karpathy, Yann LeCun, papers.cool
Read Anthropic, OpenAI, DeepMind research blogs
Contribute to open-source AI projects

⏱ Continuous · Community is key

The Best Free Learning Resources

Resource	Type	Best For	Link / Platform
fast.ai — Practical Deep Learning	Course (free)	Hands-on DL, top-down approach	fast.ai
Andrew Ng's ML Specialization	Course (audit free)	ML fundamentals, theory + practice	Coursera
Andrej Karpathy — Neural Networks: Zero to Hero	YouTube (free)	Build transformers from scratch	YouTube
HuggingFace Courses	Course (free)	NLP, diffusion, RL with Transformers	huggingface.co/learn
3Blue1Brown — Neural Networks	YouTube (free)	Visual intuition for how NNs work	YouTube
Kaggle Learn + Competitions	Practice (free)	Hands-on ML with real data	kaggle.com/learn
Arxiv Sanity / papers.cool	Research papers	Staying up-to-date with AI research	arxiv.org

Go deeper → AI Foundation — 5 in-depth sections covering fundamentals, mathematics, ML essentials, deep learning, and agentic AI.