AI Foundation · Domain 01

Foundations of Artificial Intelligence

What AI actually is, where it came from, why symbolic AI failed, the philosophical debates that matter today, the rational agent framework, competing paradigms, and the state of the field.

1.1
Chapter 1.1 · Definitions & Scope
What Is Artificial Intelligence?

AI is not a single thing. It is a collection of techniques for building systems that optimise toward a goal using data. The magic disappears when you see the math.

🧬

Biological Intelligence

  • Embodied: tied to a body with survival pressures
  • Developed through evolution over millions of years
  • Energy-efficient: 20 watts powers the human brain
  • Generalises from very few examples (few-shot by default)
  • Handles open-ended, ambiguous, novel situations naturally
🤖

Machine Intelligence

  • Disembodied: no physical survival pressure
  • Built through training on human-generated data
  • Energy-hungry: GPT-4 training cost ~$100M in compute
  • Requires millions to billions of examples to learn patterns
  • Brittle outside training distribution; exceptional at defined tasks
What makes a system "intelligent"? There is no consensus. The four most-cited criteria:
  • Ability to learn from experience
  • Ability to solve novel problems
  • Ability to understand and generate language
  • Ability to reason under uncertainty
No current AI system fully satisfies all four in the general sense humans do.

There is no single universally agreed definition of AI. Here are the four most cited, each emphasising a different aspect:

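John McCarthy (coined the term for the 1956 Dartmouth workshop; this wording is from his later essay "What Is Artificial Intelligence?")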

"The science and engineering of making intelligent machines, especially intelligent computer programs."

Russell & Norvig, AIMA 4th Ed.

"The study of agents that receive percepts from the environment and perform actions." The textbook definition.

Practical Engineering (2024)

"Building systems that can perceive, reason, learn, and act, performing tasks that traditionally required human intelligence."

ISO/IEC 22989:2022

"An engineered system that generates outputs such as content, forecasts, recommendations, or decisions for a given set of human-defined objectives."

The common thread across all four: AI systems are designed to optimise for a goal. Not magic, but sophisticated optimisation machines trained on data. Whether the goal is to win a chess game, classify an image, or predict the next word in a sentence, the mechanism is the same: minimise a loss, maximise a reward.

"AI is whatever humans haven't figured out how to do yet. Once we have an algorithm for it, we stop calling it AI."

Tesler, L., coined at Xerox PARC, ~1970s  |  Larry Tesler's Theorem (the AI Effect)
AI ⊃ Machine Learning ⊃ Deep Learning: nested subfields
  • Artificial Intelligence: expert systems, search, planning, symbolic AI
  • Machine Learning: decision trees, SVMs, random forests, boosting
  • Deep Learning: Transformers, CNNs, LLMs, diffusion
Term | Formal Definition | Emerged | Canonical Example
Artificial Intelligence | Any technique enabling machines to mimic intelligent behaviour | 1956 (Dartmouth Conference) | Chess programs, expert systems
Machine Learning | Algorithms that learn from data without explicit programming | 1959 (Arthur Samuel) | Decision trees, XGBoost, SVMs
Deep Learning | ML using multi-layer neural networks learning hierarchical representations | ~2006 / mainstream 2012 | GPT-4, ResNet, AlphaFold, DALL-E
Data Science | Extracting insight from data using statistics, ML, and visualisation | 2000s | Business analytics, dashboards

All AI systems, from the simplest spam filter to GPT-4, can be understood through four fundamental capabilities. These aren't arbitrary categories: they map directly to the architectural components you'll encounter in every system throughout this curriculum.

👁️

Perceive

Receive input from the environment. Sensors, cameras, microphones, text APIs, database queries. The AI's window onto the world begins here.

Example: GPT-4 Vision reading an image; a self-driving car's LiDAR scanning the road ahead.

🧠

Reason

Analyse, compare, infer, and plan. Apply logic, pattern matching, or learned heuristics to understand input and determine what to do next.

Example: A classifier deciding "this email is spam"; an LLM using chain-of-thought to plan multi-step solutions.

📚

Learn

Update the internal model based on feedback, new data, or experience. This is what separates machine learning from traditional rule-based software.

Example: A model's weights adjusting during gradient descent; a recommender system adapting to user feedback.

⚡

Act

Generate outputs: text, decisions, control signals, code, images. Take actions in the world, and potentially change the environment for the next perception cycle.

Example: GPT generating a response; a robot arm moving to pick up an object.

The Four AI Capabilities: Perceive → Reason → Learn → Act
Input Data
  ① PERCEIVE: sensors, cameras, microphones, text input, APIs
  ② REASON: analyse, compare, infer, plan, make decisions
  ③ LEARN: update internal model from feedback and new data
  ④ ACT: generate outputs (text, decisions, code, control signals)
Output Action

Perceive is the entry point. In large language models, the perception layer is the tokeniser and embedding lookup: raw text transformed into high-dimensional vectors the network can process. In computer vision systems, it's the image pipeline: resize, normalise, encode. In robotics, it's sensor fusion from cameras, accelerometers, and LiDAR.
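The tokenise-then-embed step can be sketched in a few lines. This is a toy illustration, not how production tokenisers work: it splits on whitespace (real LLMs use subword schemes), and the names `build_vocab`, `tokenise`, and `make_embeddings` are hypothetical. The embedding values are random placeholders; in a trained model they are learned.

```python
import random

def build_vocab(corpus):
    """Map each distinct lowercase word to an integer id (0 is reserved for unknown words)."""
    vocab = {"<unk>": 0}
    for sentence in corpus:
        for word in sentence.lower().split():
            vocab.setdefault(word, len(vocab))
    return vocab

def tokenise(text, vocab):
    """Turn raw text into a list of token ids; unseen words map to <unk>."""
    return [vocab.get(word, vocab["<unk>"]) for word in text.lower().split()]

def make_embeddings(vocab_size, dim, seed=0):
    """Random embedding table (assumption: stands in for learned vectors)."""
    rng = random.Random(seed)
    return [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(vocab_size)]

corpus = ["the cat sat", "the dog sat"]
vocab = build_vocab(corpus)
table = make_embeddings(len(vocab), dim=4)

token_ids = tokenise("The cat barked", vocab)   # "barked" is not in the vocabulary
vectors = [table[i] for i in token_ids]         # one 4-dimensional vector per token
```

The network never sees characters, only the rows of `table` selected by `token_ids`.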

Reason sits at the heart of what makes AI feel intelligent. In a transformer model, attention mechanisms allow every token to attend to every other token in the context window, weighing how relevant each one is. In a decision tree, reasoning is the sequence of feature comparisons that route an input to a leaf node prediction.

Learn is the mechanism that makes modern AI powerful. Traditional software is programmed with explicit rules. Learning systems instead infer rules from examples. A loss function quantifies error, and an optimiser updates parameters to reduce it, repeated millions of times across millions of examples.
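The loss-and-optimiser loop described above can be shown at its smallest scale. The sketch below fits a line to five points with plain gradient descent on mean squared error; the function name `train_linear`, the learning rate, and the data are illustrative assumptions, and real systems use automatic differentiation and far larger models.

```python
def train_linear(xs, ys, lr=0.05, steps=2000):
    """Fit y ≈ w*x + b by gradient descent on mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Gradients of (1/n) * sum((w*x + b - y)^2) with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        # Step against the gradient to reduce the loss.
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]   # generated from y = 2x + 1
w, b = train_linear(xs, ys)      # w and b approach 2 and 1
```

No rule "multiply by 2 and add 1" was ever written down; the parameters were inferred from examples.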

Act closes the loop. For a language model, acting means sampling the next token from a probability distribution. For a recommendation engine, surfacing the top-k items. For a robotic system, issuing motor commands. The action taken modifies the environment, producing new perception data.
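For a language model, "sampling the next token from a probability distribution" looks roughly like the sketch below. The three-word vocabulary and the logits are made up for illustration, and `sample_next_token` is a hypothetical helper; real models produce logits over tens of thousands of tokens.

```python
import math
import random

def softmax(logits, temperature=1.0):
    """Convert raw scores into a probability distribution."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)                          # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def sample_next_token(vocab, logits, temperature=1.0, rng=random):
    """Draw one token at random, weighted by its softmax probability."""
    probs = softmax(logits, temperature)
    return rng.choices(vocab, weights=probs, k=1)[0]

vocab = ["cat", "dog", "car"]
logits = [2.0, 1.0, 0.1]                     # illustrative scores, not from a real model
probs = softmax(logits)
token = sample_next_token(vocab, logits, rng=random.Random(0))
```

Lowering `temperature` sharpens the distribution toward the highest-scoring token; raising it makes the output more varied.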

A self-driving car PERCEIVES the road, REASONS about obstacles, has LEARNED from millions of miles of training data, and ACTS by steering. ChatGPT PERCEIVES your text, REASONS using learned patterns, and ACTS by generating tokens one at a time.

These terms are often used interchangeably in the press, but they have precise relationships: each is a subset of the one above. Consider ChatGPT: it is simultaneously correct to call it AI, ML, and Deep Learning. Each statement is true and increasingly specific.

The AI Field: Nested Subfields & Overlaps
  • AI: chess, expert systems, robotics, planning, LLMs
  • Machine Learning: SVMs, random forests, XGBoost, clustering. Learns from data, no explicit rules
  • Deep Learning: CNNs, RNNs, Transformers, LLMs. Multi-layer neural networks
  • Generative AI: GPT, DALL-E, Stable Diffusion, Sora. Generates novel content
  • Data Science: statistics, SQL, visualisation, BI. Overlaps AI/ML, not a true subset

However, the reverse implication does not hold. Not all AI is ML: expert systems encode human knowledge directly as rules, without learning from data. Not all ML is deep learning: a random forest or a naive Bayes classifier is ML but not deep learning. Data Science is a separate discipline focused on extracting business insight via statistics, visualisation, and ML. It overlaps AI but is not a true subset.

Machine Learning IS:
  • Systems that learn rules from data without explicit programming
  • Improving in performance with more and better data
  • Generalising learned patterns to new, unseen examples
  • The dominant paradigm powering modern AI products
Machine Learning IS NOT:
  • Just statistics (it adds architectural inductive biases statistics lacks)
  • Always neural networks (trees, SVMs, and ensembles are ML too)
  • Magic or sentient: it is function approximation
  • Guaranteed to work on any problem without careful engineering

Common misuse: "We use AI" usually means "we use ML" and often specifically means "we use a trained model." Precision matters, especially when evaluating vendor claims. Ask: Is it rule-based? Trained? On what data? Measured by what metric?

AI systems can be classified along two axes: scope (what range of tasks?) and capability level (how does it compare to human intelligence?).

🎯

ANI: Narrow AI

Excels at one specific task or a bounded family of tasks. Cannot generalise outside its training domain. All AI today is ANI: GPT-4 is brilliant at text but cannot physically navigate a room. AlphaGo beat world champions at Go but cannot play chess.

All AI Today
🧠

AGI: General AI

Can perform any intellectual task a human can. Transfers knowledge across completely different domains without retraining. Not yet achieved. Timeline: 5–20+ years (highly debated). OpenAI's stated mission.

Research Frontier
🚀

ASI: Superintelligence

Surpasses human intelligence in every domain. Could recursively self-improve. Purely theoretical. Motivates AI safety and alignment research today; this is the "singularity" discussed by Bostrom, Yudkowsky, Tegmark.

Theoretical
The AI Spectrum: ANI → AGI → ASI
  • ANI (Narrow AI): GPT-4, AlphaGo, FaceID. We are here
  • AGI (General AI): not yet achieved, debated
  • ASI (Super AI): theoretical, long horizon
Axis: task-specific capability (left) to general and autonomous (right)
✅ What ANI Can Do Well Today
  • Fluent, coherent text generation at scale
  • Image classification (often superhuman accuracy)
  • Code writing, debugging, and refactoring
  • High-quality language translation
  • Strategic game playing (Go, Chess, StarCraft II)
  • Protein structure prediction (AlphaFold)
⚠️ What ANI Still Cannot Do Reliably
  • Reliable reasoning about novel physical situations
  • Maintaining consistent long-term plans over time
  • Transferring skills across domains without retraining
  • Acting reliably in open-ended, unpredictable environments
  • Consistently avoiding hallucination and factual error

Public discourse around AI is saturated with misconceptions drawn from science fiction, corporate marketing, and media sensationalism. Before going deeper into mechanisms, it is worth explicitly naming what AI is not.

🎬

Not Like the Movies

HAL 9000, Skynet, and Samantha from Her are not realistic portrayals. Real AI systems are narrow, brittle outside their training domain, and have no goals of their own: they optimise whatever objective they're given.

✨

Not Magic: It's Statistics

A language model predicts a probability distribution over the next token given all previous tokens. An image classifier outputs a probability distribution over labels. Every "intelligent" output is the result of mathematical optimisation on data.

🧬

Not Sentient or Conscious

Current AI systems have no subjective experience, emotions, or desires. They process inputs and produce outputs. Whether future AI could be conscious is a genuine philosophical question, but today's systems are not.

🔮

Not Infallible

AI systems fail in unexpected ways. They hallucinate facts, exhibit bias inherited from training data, and can be fooled by adversarial examples. They are tools with specific failure modes, not oracles.

The goal of this documentation is to replace awe with understanding. Every impressive AI output is traceable to training data, an objective function, and an optimisation algorithm. The mystery evaporates when you see the mechanism.

1.2
Chapter 1.2 · Narrative History
A Brief History of AI: Origin to Present

AI has followed a pattern of hype → disillusionment → breakthrough, twice. Understanding why the earlier approaches failed helps explain why deep learning succeeded where they did not.

AI history is a story of alternating euphoria and collapse. Understanding why each era gave way to the next, not just when, is the best inoculation against misreading hype today. The pattern: overpromise → underfund → winter → unexpected breakthrough → repeat.

AI Progress & Hype: 1950 to 2026
[Figure: progress/hype curve over time. Axis ticks: 1950, 1970, 1987, 1997, 2006, 2012, 2017, 2026. Labelled points and regions: Dartmouth, Golden Age, Winter 1, Winter 2, Deep Blue, AlexNet, Transformer, Now.]

Long before the first computer was built, mathematicians and philosophers dreamed of mechanising thought. Gottfried Wilhelm Leibniz (1646–1716) envisioned a Characteristica Universalis, a universal symbolic language that could encode all human knowledge, and a calculus ratiocinator that could reason over it mechanically. This dream of reducing reasoning to symbol manipulation would resurface, unchanged in spirit, at the Dartmouth Conference three centuries later.

George Boole (1854) formalised logic as algebra, reducing true/false statements to 0s and 1s and defining the operations AND, OR, NOT. This was the mathematical bedrock everything else was built on. Ada Lovelace (1840s), writing annotations on Babbage's Analytical Engine, described the first published algorithm intended for a computing machine and, crucially, articulated both its power and its limits: the machine can only do what it is told. Claude Shannon (1938) showed that Boolean algebra could be implemented in electronic circuits, connecting Boole's logic to physical hardware. And finally McCulloch and Pitts (1943) proposed the first mathematical model of a neuron: a binary threshold unit that fires when its weighted inputs exceed a threshold. Every modern neural network traces its ancestry directly to that 1943 paper.

These five contributions (symbolic reasoning, Boolean logic, algorithms, logic circuits, and artificial neurons) were the intellectual raw materials that the Dartmouth group assembled into a new field in 1956.
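The McCulloch-Pitts threshold unit ties two of these threads together: a single unit can compute Boole's AND, OR, and NOT. The sketch below is a simplification under stated assumptions: it uses a negative weight for NOT (the 1943 paper modelled inhibition differently) and fires when the sum reaches the threshold rather than strictly exceeding it. The function name `mp_neuron` is illustrative.

```python
def mp_neuron(inputs, weights, threshold):
    """Fire (return 1) when the weighted sum of binary inputs reaches the threshold."""
    total = sum(w * x for w, x in zip(weights, inputs))
    return 1 if total >= threshold else 0

def AND(a, b):
    return mp_neuron([a, b], [1, 1], threshold=2)   # both inputs must be on

def OR(a, b):
    return mp_neuron([a, b], [1, 1], threshold=1)   # one active input is enough

def NOT(a):
    return mp_neuron([a], [-1], threshold=0)        # an active input suppresses firing

truth_table = [(a, b, AND(a, b), OR(a, b)) for a in (0, 1) for b in (0, 1)]
```

The weights and thresholds here are set by hand; nothing is learned. Learning them from examples is the step Rosenblatt's perceptron added in 1957.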

Pre-History Milestones: The Intellectual Foundations of AI
  • c. 1670s–1680s: Leibniz, reasoning machine
  • 1840s: Lovelace, first algorithm
  • 1854: Boole, Boolean logic
  • 1938: Shannon, logic circuits
  • 1943: McCulloch-Pitts, artificial neuron

In 1950 Alan Turing published "Computing Machinery and Intelligence", opening with the question "Can machines think?" He proposed the Imitation Game as a pragmatic operational test: if a machine can convince a human judge, communicating only via text, that it is human, we have sufficient practical grounds to attribute intelligence to it. Turing was careful not to define intelligence; he proposed the test precisely to sidestep that philosophical quagmire. He also described the "child machine" concept: rather than programming adult intelligence directly, build a machine that learns. This was the first clear articulation of what we now call machine learning.

Six years later, in the summer of 1956, John McCarthy, Marvin Minsky, Nathaniel Rochester, and Claude Shannon organised a workshop at Dartmouth College, attended by Allen Newell and Herbert Simon among others, that officially founded the field of Artificial Intelligence. Their proposal was breathtakingly optimistic: "Every aspect of learning or any other feature of intelligence can in principle be so precisely described that a machine can be made to simulate it." Minsky later claimed that "within a generation the problem of creating AI will substantially be solved." That generation passed without resolution. The Dartmouth group were brilliant researchers who massively underestimated three things: the importance of data, the role of perception and embodiment, and the sheer computational depth required.

The Turing Test (Imitation Game): Setup & Outcomes
[Figure: a human judge communicates by text only (no voice, no appearance) with a human (A) and a machine (B). ✓ Machine passes → demonstrates intelligent behaviour. ✗ Machine fails → insufficient for intelligence.]
What Dartmouth Got Right
  • AI as a formal, dedicated research field
  • Symbolic manipulation is a valid form of reasoning
  • Computers can exhibit intelligent behaviour
  • The field needs dedicated researchers and funding
What Dartmouth Got Wrong
  • Intelligence timeline vastly underestimated
  • Ignored the critical importance of data
  • Ignored embodiment, perception, and grounding
  • Assumed logic alone equals intelligence

The decade following Dartmouth was genuinely remarkable. Early AI programs demonstrated things no one had seen before: a machine proving mathematical theorems, a program that could hold a conversation, a system that learned to play checkers by playing against itself. In 1959 Arthur Samuel's checkers program, which improved by self-play, gave the field a new term: machine learning. The optimism was not irrational; the demos were real. The problem was that toy domains don't scale.

1957
Perceptron (Rosenblatt)
First learning machine: a single-layer network that could learn to classify linearly separable inputs. Proved that machines could learn from examples, not just rules.
1959
"Machine Learning" coined (Arthur Samuel)
Samuel's checkers program improved by self-play. He defined ML as "giving computers the ability to learn without being explicitly programmed." Still the best definition.
1965
DENDRAL (Stanford)
First expert system with real scientific impact: identified organic molecules from mass spectrometry. Proved narrow domain AI could produce genuine scientific value.
1966
ELIZA (Weizenbaum)
First chatbot, simulating a Rogerian therapist using pattern matching. Users formed genuine emotional connections, a warning about anthropomorphism that applies directly to LLMs today.
1969
Minsky & Papert, "Perceptrons"
Proved mathematically that single-layer perceptrons cannot learn XOR. Effectively killed neural network research funding for a decade. The field pivoted entirely to symbolic AI.
1970
SHRDLU (Winograd)
NLP system that understood commands in a simulated blocks world. Breathtakingly impressive in its domain. Completely brittle outside it: the "toy world" problem made concrete.
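The 1957 and 1969 entries above can be reproduced in a few lines: a single threshold unit trained with a perceptron-style update learns AND, which is linearly separable, but never fits XOR, which is not. This is a minimal sketch with illustrative names (`train_perceptron`, `accuracy`) and integer weights, not Rosenblatt's original hardware formulation.

```python
def predict(w, b, x1, x2):
    """Single threshold unit: output 1 when the weighted sum is positive."""
    return 1 if w[0] * x1 + w[1] * x2 + b > 0 else 0

def train_perceptron(samples, epochs=50, lr=1):
    """Perceptron update rule: nudge weights toward each misclassified example."""
    w, b = [0, 0], 0
    for _ in range(epochs):
        for (x1, x2), target in samples:
            err = target - predict(w, b, x1, x2)   # -1, 0, or +1
            w[0] += lr * err * x1
            w[1] += lr * err * x2
            b += lr * err
    return w, b

def accuracy(samples, w, b):
    hits = sum(predict(w, b, x1, x2) == t for (x1, x2), t in samples)
    return hits / len(samples)

AND_DATA = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
XOR_DATA = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

and_acc = accuracy(AND_DATA, *train_perceptron(AND_DATA))   # reaches 1.0
xor_acc = accuracy(XOR_DATA, *train_perceptron(XOR_DATA))   # stays below 1.0
```

No straight line separates the XOR classes, so no choice of `w` and `b` can classify all four points. Adding a hidden layer removes the limitation, which is why multi-layer networks and backpropagation later revived the approach.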

The AI winters are the most important episodes in AI history for developing intuition about the current moment. They were not caused by bad science. They were caused by a structural gap: researchers promised capabilities that required data and compute that didn't yet exist, and funders cut support when those promises weren't kept on schedule.

AI Hype Cycles: Two winters shaped modern research culture
[Figure: hype/progress curve. Axis ticks: 1956, 1974, 1980, 1993, 2012, 2017, Now. Labelled points and regions: Dartmouth, Golden Age, Peak Optimism, ❄️ Winter 1, Expert Systems Boom, ❄️ Winter 2, AlexNet, Deep Learning, Transformer.]
❄️

Why AI Failed (Technical)

  • Combinatorial explosion: symbolic search cannot scale to real-world problem sizes
  • No data: rule-based systems require hand-crafted knowledge for every fact
  • No compute: networks were too slow to train at any useful scale
  • Brittle: systems collapse immediately outside their narrow training domain
⚠️

Why AI Failed (Institutional)

  • Overpromised: researchers claimed 5–10 year timelines to funders, repeatedly
  • Underfunded follow-through: funding cut the moment hype peaked
  • Misaligned incentives: demos optimised for impressiveness, not robustness
  • The AI Effect: as each thing worked, it stopped being called AI
Winter | Trigger | What Collapsed | Root Cause | What Survived
First (1974–80) | Lighthill Report (UK, 1973); DARPA cuts | General-purpose AI, machine translation, symbolic reasoning | Combinatorial explosion; overpromised timelines to funders | Specialised systems; early expert system research; Prolog
Second (1987–93) | Expert system maintenance failures; Lisp machine market collapse | Commercial expert systems; DARPA Strategic Computing Program | Expert systems too brittle and expensive to maintain at real-world scale | Backpropagation (1986 rediscovery); statistical ML; RL foundations

Every AI winter was caused by the same three gaps: not enough DATA, not enough COMPUTE, and not enough ALGORITHMIC insight. By 2012, all three gaps had been filled simultaneously for the first time in history.

While AI winters suppressed funding, a quieter revolution was accumulating. Statistical machine learning (SVMs, random forests, gradient boosting) replaced brittle symbolic systems with principled, mathematically grounded methods. Meanwhile, the web was producing data at a scale no previous generation could have imagined, and GPU hardware was becoming cheap enough to train meaningfully large networks.

1997
Deep Blue beats Kasparov
AI defeats the world chess champion for the first time. Minimax search + hand-crafted evaluation function. Proved AI could match humans in a complex domain, but through brute-force search, not generalisation.
1998
LeNet (LeCun)
Convolutional neural networks applied to handwritten digit recognition (MNIST). Showed CNNs work in practice, but GPU training wasn't available yet, limiting scale.
2001
Random Forests (Breiman)
Powerful ensemble method combining many decision trees. Became the dominant practical ML algorithm for structured data, and remains so in many domains today.
2006
Deep Belief Networks (Hinton)
Hinton showed that deep networks could be trained using layer-wise pre-training. Revived interest in deep neural networks after 15 years of dormancy.
2009
ImageNet (Fei-Fei Li)
A large-scale labelled image dataset; its widely used benchmark subset has 1.2 million labelled images across 1000 categories. The dataset that would change everything: the fuel that AlexNet needed to ignite the deep learning revolution three years later.
2011
IBM Watson wins Jeopardy!
NLP + knowledge retrieval at scale. Beat human champions Ken Jennings and Brad Rutter. Demonstrated that AI could handle open-ended natural language questions, a mainstream moment for AI.
2011
GPUs proven for deep learning (Ciresan et al.)
An early, influential demonstration that GPU-trained deep networks dramatically outperform CPU-trained ones. The compute bottleneck was about to break.
2012
The Three Enablers Converge
ImageNet (data) + NVIDIA CUDA (compute) + ReLU/Dropout (algorithms) converge for the first time. The conditions for a revolution are now met.
🗄️

Data

Web-scale datasets: ImageNet (1.2M images), Common Crawl (trillions of words), user-generated content from social platforms. Training data grew 1000× in a decade, providing the signal deep networks needed to learn.

⚡

Compute

NVIDIA GPUs and CUDA (2007) gave 100–1000× speedup for matrix operations via massive parallelism. What took weeks on CPUs took hours on GPUs. Training deep networks became economically viable for the first time.

🧪

Algorithms

ReLU activation (mitigated vanishing gradients) and Dropout regularisation (reduced overfitting) were in place by 2012; Batch Normalisation (stabilised training) and residual connections (enabled very deep networks) followed in 2015. Each independently helpful; together transformative.

AlexNet's ImageNet victory in 2012 was the moment the field changed permanently. Krizhevsky, Sutskever, and Hinton's network achieved 15.3% top-5 error vs. the runner-up's 26.2%, a 10.9-point gap that shocked the computer vision community. The model was trained on two consumer GPUs in five to six days. Within 18 months, every major research lab had pivoted to deep neural networks. The pattern: one shocking result, then near-universal adoption.

Year | Event | Why It Matters
★ 2012 | AlexNet (Hinton, Krizhevsky, Sutskever) | Wins ImageNet by a 10.9-point margin. GPU-trained CNN. Every major lab pivots to deep learning within 18 months. The modern AI era begins here.
2013 | Word2Vec (Mikolov, Google) | Word embeddings: words as dense vectors. "king − man + woman ≈ queen." Language has geometric structure. Foundation of modern NLP.
2014 | GANs (Goodfellow et al.) | Generative Adversarial Networks: generator vs discriminator. The line of work that later produced photorealistic images. The generative AI era begins conceptually.
2015 | ResNet (He et al., Microsoft) | 152-layer network wins ImageNet. Residual skip connections eased the optimisation problems that had blocked very deep networks, making networks with hundreds of layers trainable.
★ 2016 | AlphaGo (DeepMind) | Beats Lee Sedol 4–1 at Go, a game with more positions than atoms in the observable universe. MCTS + deep RL. Arrived 10–20 years ahead of expert predictions.
★ 2017 | "Attention Is All You Need" (Vaswani et al.) | The Transformer architecture. Self-attention replaces RNNs entirely. Processes all tokens in parallel. Ancestor of GPT, BERT, DALL-E, Whisper, AlphaFold, and nearly every major AI system built since 2018.
2018 | BERT & GPT-1 (Google & OpenAI) | Pre-train on unlabelled text → fine-tune on tasks. NLP benchmark records shattered across the board. The pre-train → fine-tune paradigm established.
★ 2020 | GPT-3 (175B parameters) | Few-shot learning at scale: give a few examples in the prompt, and the model attempts the task without fine-tuning. Foundation model era begins. Released through a gated API; the earlier decision to withhold a model over misuse concerns applied to GPT-2 in 2019.
2021 | DALL-E, CLIP, AlphaFold 2 | Multimodal AI: images and text in the same embedding space. AlphaFold 2 reaches near-experimental accuracy on protein structure prediction, a 50-year open problem in biology.

The Transformer deserves a dedicated note. "Attention Is All You Need" (Vaswani et al., 2017) replaced recurrent networks with a single elegant mechanism: self-attention, which allows every token to attend to every other token in parallel. This unlocked two things simultaneously: full context access (no sequential bottleneck) and massive parallelism (scale with compute). The result: BERT, GPT, T5, LLaMA, Whisper, DALL-E, AlphaFold, and virtually every transformative AI system since 2018 is built on Transformers. It is arguably the most consequential single paper in AI history.
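The core of self-attention is small enough to write out by hand. The sketch below is a deliberately stripped-down version: it uses the input vectors directly as queries, keys, and values, whereas a real Transformer applies learned projection matrices, multiple heads, masking, and positional information. The three 2-dimensional "token" vectors are made up for illustration.

```python
import math

def softmax(row):
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(X):
    """Scaled dot-product self-attention with identity projections (Q = K = V = X)."""
    d = len(X[0])
    outputs, weights = [], []
    for q in X:                                   # each token acts as a query
        # Similarity of this query with every token's key, scaled by sqrt(d).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in X]
        w = softmax(scores)                       # attention weights sum to 1
        weights.append(w)
        # Output is the attention-weighted average of all value vectors.
        outputs.append([sum(w[j] * X[j][c] for j in range(len(X))) for c in range(d)])
    return outputs, weights

X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]          # three toy token vectors
out, weights = self_attention(X)
```

Each iteration of the outer loop is independent of the others, which is the parallelism the paragraph above describes: on a GPU, all rows are computed at once as matrix multiplications.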

ChatGPT launched on November 30, 2022. It reached 1 million users in 5 days and an estimated 100 million users in about 60 days, at the time the fastest consumer technology adoption on record. Instagram took roughly 2.5 years to reach the same milestone. Twitter took about 5 years. This wasn't just a product launch; it was a cultural event. For the first time, a general-purpose AI system was accessible to anyone with a browser, and it worked well enough to be genuinely useful for everyday tasks.

When | Event | Why It Matters
★ Nov 2022 | ChatGPT (OpenAI) | 100M users in about 60 days, the fastest consumer adoption recorded at the time. RLHF alignment made LLMs genuinely helpful. AI moved permanently into mainstream public discourse.
★ Mar 2023 | GPT-4 (OpenAI) | Multimodal (text + images). Near-human on bar exam, medical licensing, and professional benchmarks. Set the capability standard that triggered the multi-model race.
Mar–Dec 2023 | Claude (Anthropic) & Bard/Gemini (Google) | Multi-model competitive landscape emerges: Claude and Bard in March, Gemini in December. Constitutional AI alignment (Anthropic) and multimodal training at Google scale. No single company dominates; the race is on.
★ Jul 2023 | LLaMA 2 (Meta, open weights) | Open-weight LLMs publicly released. Anyone with a suitable GPU can run, fine-tune, or modify highly capable models. Democratised AI development and sparked the open-source ecosystem.
★ 2024–2026 | Reasoning Models & Agentic AI | o1, o3, DeepSeek-R1: models that think before answering. Claude 3.7, Gemini 2: long-context, multimodal, agentic. Computer-use agents. Physical AI (humanoid robots + LLMs). Current frontier.

Unlike the boom-bust cycles of the past, the current AI wave is reinforced by a self-amplifying economic flywheel. Better models attract more users. More users generate more revenue and data. More revenue funds more compute. More compute trains better models. The cycle has no obvious brake, which is precisely why the pace of improvement has been relentlessly accelerating since 2012, and why projection-based estimates consistently underestimate where we will be in three years.

The AI Progress Flywheel: Why the pace keeps accelerating
[Figure: self-reinforcing flywheel with four stages]
  • Better Models: higher accuracy & capability
  • More Compute: GPU clusters, TPUs, chips
  • More Investment: $100B+ in 2024 alone
  • More Data: web, sensors, synthetic

The flywheel has been spinning faster every year since 2012. Training compute for frontier models has grown roughly 4× per year. GPT-4's training run cost an estimated $100M. Next-generation frontier models are projected at $1B+. This isn't reckless spending: the returns on better models are large enough that the economics reinforce continued investment. Unlike the AI winters, there is no plausible external shock that would stop this cycle, only the discovery of fundamental capability limits, which has not yet materialised.

📋 Chapter 1.2: Key Takeaways
  • Two AI winters caused by the same three gaps: insufficient data, compute, and algorithms, not bad science
  • AlexNet (2012) and Transformer (2017) are the two true inflection points of modern AI
  • ChatGPT (Nov 2022) reached 100M users in about 60 days and made AI mainstream overnight; Instagram took 2.5 years
  • Expert systems failed because knowledge cannot be fully encoded as rules; ML learns it from data instead
  • The AI flywheel: better models → more investment → more compute → more data → better models
  • Understanding AI winters is essential for evaluating today's hype with appropriate skepticism and calibration
  • 2024–2026: reasoning models + agentic AI = the current research and product frontier
1.3
Chapter 1.3 · Knowledge & Reasoning
Symbolic AI & Knowledge Representation

Symbolic AI is commonly omitted from modern documentation. This is a mistake. Understanding why symbolic AI dominated for 30 years, and exactly why it failed to scale, is what explains why neural networks won. More importantly, it reveals what neural networks still cannot do.

The philosopher John Haugeland coined the term GOFAI (Good Old-Fashioned AI) in 1985 to describe the dominant paradigm that had governed AI research since Dartmouth. Its core assumption: "Intelligence is symbol manipulation according to rules." Feed a machine symbols representing the world, give it logical rules for manipulating those symbols, and intelligence will emerge. The physical symbol system hypothesis (Newell & Simon, 1976) formalised this: "A physical symbol system has the necessary and sufficient means for general intelligent action." This claim was bold, wrong in the absolute sense, and enormously productive for 30 years.

Two schools developed within GOFAI. The logic-based school (McCarthy, along with Newell and Simon's theorem-proving work) grounded AI in formal logic and theorem proving: intelligence as deduction. The knowledge-based school (Minsky, Feigenbaum) focused on representing domain knowledge richly enough that a system could reason over it: intelligence as structured lookup and inference. Both shared the same fatal assumption: that the hard part of intelligence is the reasoning engine, not the knowledge itself.

Symbolic AI
  • Explicit rules written by human experts
  • Fully interpretable: you can read the logic
  • Fails hard on edge cases rules don't cover
  • No data needed: rules are manual
  • Dominant 1956–1990s
Connectionist AI (Neural Networks)
  • Patterns learned directly from data
  • Black box: hard to interpret weights
  • Robust to variation and noise
  • Needs large labelled datasets
  • Dominant 2012–present
Symbolic AI vs. Neural AI: How Knowledge Is Represented
SYMBOLIC AI (GOFAI): human-written rules & logic
  IF fever AND cough → flu [0.8]
  IF flu → recommend tamiflu
  ✓ Interpretable · Auditable
  ✗ Brittle · Can't learn · Doesn't scale
  Dominant 1956–1990s
NEURAL AI (CONNECTIONIST): learned weights from data → output
  ✓ Learns from data · Scales · Generalises
  ✗ Black box · Data-hungry · Fragile OOD
  Dominant 2012–present

Before an AI system can reason, it must represent knowledge. GOFAI researchers developed four major representational frameworks, each with a distinct philosophy about what knowledge is and how it should be stored.

๐Ÿ•ธ๏ธ

Semantic Networks

Nodes = concepts, edges = typed relationships. "Animal โ†’ IS-A โ†’ Mammal โ†’ IS-A โ†’ Dog โ†’ HAS โ†’ Fur". Inheritance flows through IS-A links โ€” Dog automatically inherits all Mammal properties. Precursor to modern knowledge graphs (Wikidata, Google Knowledge Graph, RAG retrieval indexes).

๐Ÿ“‹

Frames (Minsky, 1974)

Structured objects with named slots and default values โ€” essentially a class/struct for knowledge. A "Car" frame has slots: color (default: red), wheels (default: 4), engine: present. Frames support inheritance and overriding. Directly influenced object-oriented programming.

๐Ÿ—๏ธ

Ontologies

Formal specification of all concepts and relationships within a domain. Medical: SNOMED CT (350,000+ concepts). Web: OWL, RDF standards. The Cyc project (Lenat, 1984โ€“present): 25M+ rules encoding common-sense knowledge over 40 years. Modern AI: ontologies power knowledge graphs in RAG pipelines.

โš™๏ธ

Production Rules

IF [condition] THEN [action] โ€” the basis of all expert systems. Forward chaining: data-driven, fire rules that match current known facts, derive new facts until goal reached. Backward chaining: goal-driven, work backward from desired conclusion to find what must be true. Prolog uses backward chaining natively.

Semantic Network: concepts connected by typed relationships
[Figure: nodes Animal, Mammal, Bird, Dog, Cat and Lassie, joined by IS-A, INSTANCE and HAS-PROPERTY edges, with the properties legs: 4 and sound: bark.]
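The inheritance behaviour of a semantic network can be sketched in a few lines of Python. This is a minimal illustration, not a reconstruction of any historical system: the dictionaries, the `parent` and `lookup` helpers, and the choice of which node carries `legs` and `sound` are all assumptions made for the example.

```python
# Toy semantic network. Which node owns each property is an assumption
# made for illustration; the figure does not specify it.
IS_A = {"Dog": "Mammal", "Cat": "Mammal", "Mammal": "Animal", "Bird": "Animal"}
INSTANCE_OF = {"Lassie": "Dog"}
PROPERTIES = {
    "Dog": {"sound": "bark"},
    "Mammal": {"legs": 4},
}


def parent(node):
    """Follow an INSTANCE link if there is one, otherwise an IS-A link."""
    return INSTANCE_OF.get(node) or IS_A.get(node)


def lookup(node, prop):
    """Walk upward until some node defines prop; None if nothing does."""
    while node is not None:
        if prop in PROPERTIES.get(node, {}):
            return PROPERTIES[node][prop]
        node = parent(node)
    return None


print(lookup("Lassie", "sound"))  # inherited from Dog
print(lookup("Lassie", "legs"))   # inherited from Mammal, two links up
```

Nothing is stored on `Lassie` itself; every answer comes from walking the links, which is the whole point of the representation.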
// Frame representation: structured knowledge with slots & defaults
FRAME: Car
  slots:
    color: default=red, type=string
    wheels: default=4, type=integer
    engine: default=true, type=boolean  // true = engine present
    owner: default=nil, type=Person-Frame

FRAME: SportsCar (inherits: Car)
  adds:  // new slots; color, wheels, engine and owner are inherited from Car
    turbo: default=true, type=boolean
    top_speed: default=250, type=integer  // km/h
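A frame system's slot lookup with defaults and inheritance can be sketched as follows. The `Frame` class and the `my_car` instance with its blue colour are hypothetical, introduced only to show an override; the slot names and defaults for Car and SportsCar come from the frames above.

```python
class Frame:
    """A named bundle of slots that falls back to its parent frame."""

    def __init__(self, name, parent=None, **slots):
        self.name = name
        self.parent = parent
        self.slots = slots

    def get(self, slot):
        # Check this frame first, then each ancestor in turn.
        frame = self
        while frame is not None:
            if slot in frame.slots:
                return frame.slots[slot]
            frame = frame.parent
        raise KeyError(slot)


car = Frame("Car", color="red", wheels=4, engine=True, owner=None)
sports_car = Frame("SportsCar", parent=car, turbo=True, top_speed=250)
my_car = Frame("my_car", parent=sports_car, color="blue")  # hypothetical override

print(my_car.get("color"))   # the local value wins over Car's default
print(my_car.get("wheels"))  # falls through to Car's default
```

The nearest definition wins, so a frame only has to state how it differs from its parent.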

Expert systems were the first commercially successful AI technology. The idea: interview a domain expert, encode their knowledge as production rules, build an inference engine to fire those rules. In the late 1970s and throughout the 1980s, this actually worked inside carefully bounded domains with well-defined rules.

// MYCIN-style production rule: bacterial infection diagnosis
// Rule 52
IF organism.staining_reaction = "gram-negative"
AND organism.morphology = "rod"
AND patient.compromised_host = true
THEN organism.identity = "Pseudomonas" (confidence: 0.6)
System | Domain | Year | Rules | Key Achievement
DENDRAL | Chemistry | 1965 | 500+ | First automated scientific reasoning: identified organic molecules from mass spectrometry
MYCIN | Medical diagnosis | 1972 | 600 | 65% of its therapy recommendations were rated acceptable by expert evaluators, versus 42.5–62.5% for Stanford faculty physicians. Never deployed, partly because of liability concerns.
XCON / R1 | Computer configuration | 1980 | 2,500 | Reported to save DEC $40M/year configuring VAX systems. Proved AI had real commercial ROI.
PROSPECTOR | Mineral exploration | 1978 | 1,000 | Credited with locating a molybdenum deposit valued at $100M. Geological knowledge + probabilistic inference.
Cyc | Common-sense reasoning | 1984 | 1M+ | Still maintained. World's largest manually encoded knowledge base. Represents the limit of the approach.
✅

What Expert Systems Got Right

  • Formal representation of domain knowledge works within bounds
  • Explainable by design: every decision is traceable to a rule
  • Can capture rare cases that training data won't cover
  • Still used in medical, legal, and financial rule engines today
❌

Why Expert Systems Failed

  • Knowledge bottleneck: experts can't articulate their tacit knowledge as rules
  • Brittleness: catastrophic failure on any case the rules don't cover
  • Maintenance nightmare: every edge case needs a manual rule update
  • Can't learn: no mechanism to improve from new data or outcomes

Logic is the mathematics of valid inference. GOFAI researchers hoped to give AI the ability to reason with the same rigor as formal proof, deriving new knowledge from existing knowledge with absolute certainty. Three levels of expressive power matter for AI.

🔣

Propositional Logic

True/false propositions with connectives: AND (∧), OR (∨), NOT (¬), IMPLIES (→). Fully decidable. Too limited: can't express properties of objects or quantify over them.

P = "It rains"   Q = "Ground is wet"
P → Q    // if it rains, ground is wet
P        // it rains (given)
∴ Q     // modus ponens: ground is wet
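Because propositional logic is decidable, the validity of an inference rule can be checked mechanically by enumerating every truth assignment. The sketch below does this for modus ponens and, for contrast, for the invalid pattern of affirming the consequent; the helper names are illustrative.

```python
from itertools import product


def implies(a, b):
    """Material implication: a → b is false only when a is true and b is false."""
    return (not a) or b


ROWS = list(product([False, True], repeat=2))  # every (P, Q) assignment

# Valid: Q holds in every row where both premises (P → Q and P) hold.
modus_ponens_valid = all(q for p, q in ROWS if implies(p, q) and p)

# Invalid: from P → Q and Q we cannot conclude P (the ground may be wet
# for some other reason), so at least one row is a counterexample.
affirming_consequent_valid = all(p for p, q in ROWS if implies(p, q) and q)

print(modus_ponens_valid, affirming_consequent_valid)
```

The same brute-force check works for any propositional formula, at a cost that doubles with each extra variable.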
∀

First-Order Logic (FOL)

Adds variables, predicates, and quantifiers. Used in theorem provers and Prolog; OWL ontologies use decidable fragments of it. Semi-decidable: powerful but computationally expensive.

∀x: Human(x) → Mortal(x)  // all humans are mortal
Human(Socrates)            // Socrates is human
∴ Mortal(Socrates)         // universal instantiation, then modus ponens
🔀

Non-Monotonic Reasoning

Classical logic: once proved, always true. Real world: "Birds fly... unless they're penguins." Non-monotonic reasoning handles exceptions and defaults: conclusions can be retracted when new information arrives. Essential for commonsense AI.

/* First-order logic in Prolog: queries are answered by backward chaining */
human(socrates).
human(plato).
mortal(X) :- human(X). /* ∀X: human(X) → mortal(X) */

/* ?- mortal(socrates). → true */
/* ?- mortal(zeus). → false (not in KB) */

penguin(opus).
bird(tweety).
bird(opus) :- penguin(opus).

/* The exception must come first: Prolog tries clauses top to bottom */
flies(X) :- penguin(X), !, fail. /* penguins don't: non-monotonic exception */
flies(X) :- bird(X). /* birds fly by default */
Inference Strategies: Forward Chaining vs Backward Chaining
[Figure: two flowcharts. Forward chaining (data-driven): known facts (database) → match IF conditions → fire matching rules, adding new facts → goal reached (or no new facts). Backward chaining (goal-driven): goal to prove (start here) → set IF conditions as sub-goals → recurse on sub-goals → sub-goals proven from known facts.]
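The two strategies can be sketched over the same rule base. This is a minimal propositional version with no variables or confidence factors, so it is far simpler than a real expert-system shell; the rule contents reuse the fever/cough example from earlier, and the function names and `(premises, conclusion)` tuple format are assumptions for illustration.

```python
RULES = [
    (("fever", "cough"), "flu"),
    (("flu",), "recommend_tamiflu"),
]


def forward_chain(facts, rules):
    """Data-driven: fire every rule whose premises hold until nothing new appears."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if conclusion not in known and set(premises) <= known:
                known.add(conclusion)
                changed = True
    return known


def backward_chain(goal, facts, rules, seen=frozenset()):
    """Goal-driven: prove goal by recursively proving some rule's premises."""
    if goal in facts:
        return True
    if goal in seen:  # guard against circular rules
        return False
    return any(
        all(backward_chain(p, facts, rules, seen | {goal}) for p in premises)
        for premises, conclusion in rules
        if conclusion == goal
    )


print(forward_chain({"fever", "cough"}, RULES))
print(backward_chain("recommend_tamiflu", {"fever", "cough"}, RULES))
```

Forward chaining derives everything that follows from the facts, whether or not anyone asked; backward chaining touches only the rules relevant to the query.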

If knowledge representation is how GOFAI stores the world, search is how it reasons through it. AI as search through a state space was the dominant problem-solving paradigm from 1956–1986. Define a state (current configuration), a set of actions (legal transitions), and a goal (desired configuration). The AI finds a sequence of actions from start to goal. Simple in principle, catastrophically hard in practice as state spaces grow.

🌊

Uninformed Search

  • BFS: explore layer by layer. Finds the shortest path (in number of steps) but is memory-expensive: O(b^d), where b is the branching factor and d the depth of the shallowest goal
  • DFS: explore deep first. Memory-efficient, O(bm) for maximum depth m, but may not find the shortest path
  • Iterative Deepening DFS: optimal like BFS, memory like DFS; best of both worlds
🎯

Informed (Heuristic) Search

  • A*: f(n) = g(n) + h(n), actual cost so far plus heuristic estimate. Optimal if h is admissible (never overestimates actual cost)
  • Greedy best-first: h(n) only; fast but not optimal
  • Used in: pathfinding, GPS navigation, game AI, robot motion planning
BFS vs A*: why heuristics matter
[Figure: two grids with start S and goal G. BFS explores all nodes equally and visits many unnecessary nodes. A*, guided by its heuristic, explores far fewer nodes and finds the same optimal path.]
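The difference in effort can be measured directly. The sketch below runs BFS and A* on an empty 15×15 grid and counts how many cells each one expands before reaching the goal. The grid size, the start and goal positions, and the Manhattan-distance heuristic are all assumptions chosen for the example.

```python
import heapq
from collections import deque

SIZE = 15
START, GOAL = (7, 7), (7, 13)  # illustrative positions on an empty grid


def neighbours(cell):
    r, c = cell
    for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        nr, nc = r + dr, c + dc
        if 0 <= nr < SIZE and 0 <= nc < SIZE:
            yield (nr, nc)


def bfs(start, goal):
    """Returns (path cost, number of cells expanded)."""
    frontier, dist, expanded = deque([start]), {start: 0}, 0
    while frontier:
        cell = frontier.popleft()
        expanded += 1
        if cell == goal:
            return dist[cell], expanded
        for nxt in neighbours(cell):
            if nxt not in dist:
                dist[nxt] = dist[cell] + 1
                frontier.append(nxt)
    return None, expanded


def manhattan(a, b):
    """Admissible on a 4-connected grid with unit step cost."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def a_star(start, goal):
    """Returns (path cost, number of cells expanded)."""
    frontier = [(manhattan(start, goal), 0, start)]  # entries are (f, g, cell)
    best_g, closed, expanded = {start: 0}, set(), 0
    while frontier:
        _f, g, cell = heapq.heappop(frontier)
        if cell in closed:
            continue  # stale heap entry
        closed.add(cell)
        expanded += 1
        if cell == goal:
            return g, expanded
        for nxt in neighbours(cell):
            new_g = g + 1
            if new_g < best_g.get(nxt, float("inf")):
                best_g[nxt] = new_g
                heapq.heappush(frontier, (new_g + manhattan(nxt, goal), new_g, nxt))
    return None, expanded


bfs_cost, bfs_expanded = bfs(START, GOAL)
astar_cost, astar_expanded = a_star(START, GOAL)
print(bfs_cost, bfs_expanded, astar_cost, astar_expanded)
```

Both searches return the same path cost, but BFS spreads out in every direction while A* heads almost straight for the goal. On grids with obstacles the gap narrows, and it disappears entirely if the heuristic carries no information (h = 0).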
Algorithm | Complete? | Optimal? | Time | Space | Best For
BFS | ✓ Yes | ✓ Yes (unweighted) | O(b^d) | O(b^d) | Shortest path, small state spaces
DFS | ✓ Yes (finite spaces, with cycle checking) | ✗ No | O(b^m) | O(bm) | Memory-constrained, finding any path
A* | ✓ Yes | ✓ Yes (admissible h) | O(b^d) | O(b^d) | Pathfinding with good heuristic
Minimax | ✓ Yes | ✓ Yes (against an optimal opponent) | O(b^m) | O(bm) | 2-player zero-sum games
Alpha-Beta | ✓ Yes | ✓ Yes | O(b^(m/2)) best case | O(bm) | Games: with good move ordering, searches about twice as deep as minimax in the same time
b = branching factor, d = depth of the shallowest solution, m = maximum depth of the search tree.
A* in practice: g(n) = actual cost from start. h(n) = heuristic estimate to goal (e.g., straight-line distance for routing). f(n) = g(n) + h(n). Admissibility condition: h(n) must never overestimate actual cost. If admissible, A* is guaranteed optimal. Minimax & Alpha-Beta: Deep Blue evaluated 200M chess positions per second using Alpha-Beta pruning to defeat Kasparov in 1997: search without learning. Modern engines (Stockfish) combine Alpha-Beta with learned evaluation functions.
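Alpha-Beta returns exactly the minimax value while skipping branches that cannot change the result. A minimal sketch on a hand-written game tree follows; the tree is a standard textbook-style example in which nested lists are decision nodes and numbers are leaf evaluations. The `stats` dictionary is an illustrative addition used only to count leaf evaluations.

```python
import math

# Root is a MAX node with three MIN children, each with three leaves.
TREE = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]


def minimax(node, maximizing=True):
    if isinstance(node, (int, float)):
        return node
    values = [minimax(child, not maximizing) for child in node]
    return max(values) if maximizing else min(values)


def alphabeta(node, maximizing=True, alpha=-math.inf, beta=math.inf, stats=None):
    if isinstance(node, (int, float)):
        if stats is not None:
            stats["leaves"] += 1
        return node
    best = -math.inf if maximizing else math.inf
    for child in node:
        value = alphabeta(child, not maximizing, alpha, beta, stats)
        if maximizing:
            best = max(best, value)
            alpha = max(alpha, best)
        else:
            best = min(best, value)
            beta = min(beta, best)
        if alpha >= beta:
            break  # the opponent would never allow this line: prune the rest
    return best


stats = {"leaves": 0}
print(minimax(TREE), alphabeta(TREE, stats=stats), stats["leaves"])
```

On this tree the second MIN node is abandoned after its first leaf, because that leaf is already worse for MAX than the first branch. How much gets pruned depends heavily on move ordering, which is why the O(b^(m/2)) figure is a best case.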

GOFAI wasn't bad science: it produced real results in bounded domains. It failed because three fundamental problems proved impossible to solve within its paradigm. These aren't engineering problems. They are conceptual limits on what rule-based symbol manipulation can express.

🖼️

The Frame Problem

McCarthy & Hayes (1969): when an action occurs, what changes and what stays the same? "I move a cup. Does the coffee stay in it? Does the table change? Does gravity still apply?" Humans know intuitively what's relevant. Encoding all of this completely is impossibly hard in a finite rule set.

🔒

Knowledge Bottleneck

Expert knowledge is largely tacit: "know-how", not "know-that". Doctors, chess players, and engineers can't fully articulate the pattern recognition driving their decisions. This tacit knowledge is what neural networks pick up from examples; it can't be extracted through interviews.

💥

Brittleness & Explosion

Symbolic systems fail immediately outside their defined domain, with no graceful degradation. Real-world ambiguity can't be captured by finite rules. 1,000 rules produce millions of interaction cases. The combinatorial explosion of exceptions makes complete rule coverage computationally intractable.

The lesson of GOFAI: Intelligence is not stored in explicit rules. It emerges from exposure to experience. This insight is the foundation of machine learning: neural networks largely sidestep the frame problem in practice because they learn what matters from data rather than having it enumerated.

📋 Chapter 1.3: Key Takeaways
  • GOFAI assumption: intelligence = symbol manipulation by rules. Worked in narrow domains, failed at scale
  • Knowledge representation: semantic networks, frames, ontologies, production rules; all have modern descendants
  • Expert systems proved their value (XCON commercially, MYCIN in evaluation) but hit the knowledge acquisition bottleneck
  • FOL & Prolog: powerful for formal reasoning, but only semi-decidable and computationally expensive at scale
  • A* search: optimal heuristic pathfinding, still used in robotics, navigation, and game AI today
  • Symbolic AI's failure suggested that much of intelligence must be learned from experience, not encoded as rules
1.4
Chapter 1.4 · Philosophy of Mind
The Turing Test & Philosophy of Mind

Turing’s 1950 paper opens with “I propose to consider the question, ‘Can machines think?’” and then immediately sidesteps it. The question matters not because we can answer it, but because our answer determines what we build and how we align it. These are not idle philosophical puzzles.

Alan Turing’s 1950 paper “Computing Machinery and Intelligence” is the founding document of AI philosophy. He proposed the Imitation Game as a pragmatic replacement for the unanswerable “Can machines think?” A human interrogator communicates via text with two participants: one human, one machine. If the interrogator cannot reliably distinguish which is which, the machine has passed. Turing predicted that by 2000, a machine would fool 30% of judges after 5 minutes. He was roughly right about the timeline, but the test itself proved easier to pass than it was to mean anything.

The Turing Test: Alan Turing, 1950 · “Computing Machinery and Intelligence”
[Figure: an interrogator at a text-only terminal questions a human (A) and an AI program (B) and must decide which is the machine. Can't tell → test passed; machine identified → test failed.]
🎭

Critique 1: Behaviour ≠ Intelligence

A system could pass using clever tricks without understanding anything. ELIZA (1966) fooled users into thinking they were talking to a therapist using simple pattern matching. GPT-4 arguably passes the test today. Does that mean it “thinks”?

🦇

Critique 2: Wrong Benchmark

A bat navigates in darkness via echolocation, yet we don't test AI on that. Why is human-style conversation the benchmark for ALL intelligence? AlphaGo is superhuman at Go but would fail the Turing Test. The test is anthropocentric by design.

🀄

Critique 3: The Chinese Room

John Searle argues a system could pass the test without understanding anything. The most famous critique of the Turing Test; see Section 2 for the full argument.

🏆

Has Anything Passed It?

Eugene Goostman (2014) was claimed to have passed, but controversy ensued (judges were lenient). Modern LLMs like GPT-4 routinely fool humans in short conversations. The test itself may now be obsolete as a meaningful benchmark.

Test / Benchmark | What It Measures | Year | Current Status
Turing Test | Language imitation: can a machine fool a human? | 1950 | Easily gamed by modern LLMs; not a meaningful bar
Winograd Schema | Common-sense reasoning via pronoun resolution | 2011 | LLMs solved it by 2022; retired as a benchmark
ARC Challenge | Novel visual pattern reasoning | 2019 | Humans average roughly 85%; GPT-4 scored far below that, and later reasoning models have narrowed the gap
MMLU | Knowledge across 57 academic subjects | 2020 | GPT-4 ~87%, human expert ~89%; essentially matched
BIG-Bench Hard | 23 hard multi-step reasoning tasks | 2022 | Frontier reasoning models approaching human performance

In 1980 philosopher John Searle published “Minds, Brains, and Programs”, arguably the most influential and most debated paper in AI philosophy. His thought experiment: you are locked in a room. Through a slot, people pass slips of paper with Chinese characters. You have a rulebook: “When you see symbol sequence X followed by Y, write Z and pass it back.” You follow the rules perfectly. From outside, your responses are indistinguishable from those of a fluent Chinese speaker. But you understand nothing. You are just manipulating symbols according to formal rules.

Searle’s conclusion: syntax is not sufficient for semantics. A program that processes symbols according to rules, no matter how sophisticated, does not thereby understand those symbols. It has the form of language processing without the content. By analogy: AI programs process language according to learned statistical rules. They produce outputs indistinguishable from understanding. But this doesn’t mean they understand.

Searle's Chinese Room: Syntax Without Semantics
[Figure: inside the room, an English speaker who understands nothing looks up incoming symbols in a rulebook (“If 你 → reply 好”, “If 吃 → reply 饭”). The input 你好 arrives through a slot and an output is passed back. Outside, a Chinese speaker concludes “The room knows Chinese!” Correct output ≠ understanding · Syntax ≠ semantics.]
🏛️

Systems Reply

It’s not the person who understands; it’s the whole system (person + rules + room). Similarly, a single neuron doesn’t understand, but the brain does. You must evaluate the system, not its components in isolation.

🤖

Robot Reply

Put the room in a robot with sensors and actuators. Now symbols are connected to real-world referents: “fire” is associated with heat sensors. Grounding in the physical world might be what produces genuine understanding.

🧠

Brain Simulator Reply

What if the rulebook simulated every neuron in a Chinese speaker’s brain exactly? Would that produce understanding? If Searle says no, he owes an account of what biological neurons contribute beyond the formal structure being simulated.

🔄

Searle’s Counter-Reply

All these replies move the lack of understanding around without eliminating it. The systems that result still only process symbols formally, so there is no genuine intentionality. Intentionality, Searle argues, depends on the causal powers of biological brains.

The Chinese Room is not a solved argument. GPT-4 is, in a very real sense, an enormously complex Chinese Room. Whether “understanding” requires something beyond symbol manipulation remains genuinely open, and the answer determines how seriously we should take AI wellbeing and alignment.

Philosopher David Chalmers (1995) distinguished easy problems of consciousness (explaining cognitive functions like attention, memory, and behaviour) from the Hard Problem: why does all this processing feel like something from the inside? Why isn’t cognition just computation happening “in the dark”, without any inner experience? This question may be permanently beyond empirical investigation, because any physical description of a brain state leaves open why there is subjective experience associated with it.

🌌

The Hard Problem

Why does it “feel like something” to be conscious? Even a perfect physical description of brain activity doesn’t explain qualia: the redness of red, the painfulness of pain. Chalmers argues this may permanently resist scientific explanation.

⚙️

Functionalism

Mental states are defined by their functional role, not their physical substrate. If silicon performs the same functional operations as a brain, it thereby has mental states. This view is most sympathetic to the possibility of AI consciousness.

Φ

Integrated Information Theory

Consciousness = integrated information Φ (phi). High Φ = rich inner experience. A simple logic gate: Φ ≈ 0. A human brain: high Φ. Simple feedforward neural networks have very low Φ; richly recurrent systems could have more.

Whether current AI systems are conscious is not a scientific question we can currently answer. It is prudent to neither confidently assert nor confidently deny machine sentience. The honest answer is: we don’t know, and we don’t yet have the tools to find out.

Stevan Harnad (1990) identified a fundamental problem with purely symbolic AI: symbols in a dictionary are defined in terms of other symbols. “Cat: a small domesticated carnivorous mammal with soft fur…” leads to circular definitions all the way down. For humans, symbols are grounded in sensorimotor experience. You know “red” because you have seen red. You know “hot” because you have felt heat. For large language models, symbols are grounded only in other symbols: the statistical contexts in which words appear across trillions of tokens of text.

Grounded Symbols (Human)
  • “hot” = direct thermal sensation experienced via skin receptors
  • “red” = specific wavelength of light experienced visually
  • “rough” = tactile perception from physical touch
  • Meaning anchored in sensorimotor interaction with the world
Ungrounded Symbols (LLM)
  • “hot” = statistical co-occurrence with “fire”, “burn”, “temperature”
  • “red” = co-occurrence with “apple”, “stop sign”, “blood”
  • “rough” = co-occurrence with “sandpaper”, “texture”, “jagged”
  • Meaning is pattern of word co-occurrence in training corpus only

This is why multimodal models (CLIP, GPT-4V, Gemini) and embodied robotics are active research frontiers: they attempt to ground language in perceptual or physical experience. Whether statistical grounding in text is “sufficient” for understanding is precisely the empirical question the field is now testing at scale.
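To make "meaning as co-occurrence" concrete, here is a deliberately crude sketch that counts which words share a sentence with a target word. The four-sentence corpus is invented for the example, and real language models learn far richer representations than raw counts. The point is only that nothing in the procedure touches heat or colour.

```python
from collections import Counter

# Invented toy corpus; every sentence is an assumption for illustration.
CORPUS = [
    "the fire is hot and can burn",
    "hot temperature from the fire",
    "the red apple and the red stop sign",
    "blood is red",
]


def cooccurrence(target, sentences):
    """Count the words that appear in the same sentence as target."""
    counts = Counter()
    for sentence in sentences:
        words = sentence.split()
        if target in words:
            counts.update(w for w in words if w != target)
    return counts


print(cooccurrence("hot", CORPUS).most_common(3))
print(cooccurrence("red", CORPUS)["apple"])
```

Everything the procedure "knows" about "hot" is a table of neighbouring strings, which is exactly the situation Harnad's argument describes.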

🔧

Weak AI (Narrow AI)

Searle’s term: AI that simulates intelligence for specific tasks without genuine understanding. The system behaves as if it understands, but has no intentionality. All current AI is Weak AI: GPT-4, AlphaGo, image classifiers. The useful and productive engineering view: build systems that work, regardless of philosophical status.

🌟

Strong AI

A system that genuinely understands, not just simulates understanding. Would have beliefs, desires, and genuine intentionality in the philosophical sense. Searle argued this is impossible without biological substrate. Most AI researchers bypass this distinction entirely and focus on capabilities.

The practical view: for engineering purposes, the distinction doesn’t matter. Build systems that work. The question of whether they “truly understand” is a philosophical question that doesn’t affect whether your spam filter catches spam. It does affect how we think about alignment, rights, and long-term AI governance.

🧩

Multiple Intelligences (Gardner)

Eight distinct intelligences: linguistic, logical-mathematical, spatial, musical, bodily-kinaesthetic, interpersonal, intrapersonal, naturalistic. AI today is strongest in the first two and largely absent from the last five.

🤸

Embodied Cognition

Intelligence is shaped by having a body that interacts with the world. Physical AI (robotics + LLMs) is the active frontier precisely because disembodied language models lack grounding in physical causality.

🏛️

Cognitive Architectures

ACT-R (Anderson) and SOAR (Laird, Newell) are computational models of human cognition with procedural memory, declarative memory, and attention modules, making testable predictions that are checked against human reaction-time data.

📋 Chapter 1.4: Key Takeaways
  • The Turing Test measures behavioural imitation, not intelligence. Modern LLMs routinely pass it, rendering it obsolete as a benchmark
  • Chinese Room: syntax ≠ semantics. Symbol manipulation without understanding may not be intelligence; the argument remains unresolved
  • The Hard Problem: why is there subjective experience at all? Science cannot currently answer this for biological or artificial systems
  • Symbol grounding: LLMs know words from statistical co-occurrence patterns, not sensorimotor experience; multimodal AI attempts to address this
  • Strong AI (genuine understanding) vs Weak AI (behavioural simulation): all current AI is Weak AI by Searle’s definition
  • These philosophical questions directly motivate alignment research: if AI can have goals and understanding, its objectives must be aligned with human values
1.5
Chapter 1.5 · AIMA Framework
Problem Solving & Rational Agents

PEAS, environment types, and the agent taxonomy: the conceptual framework that connects classical AI to modern LLM agents. Russell & Norvig's Artificial Intelligence: A Modern Approach (AIMA) defines AI as the study of agents that perceive their environment and act to maximise their performance measure. This framework is the conceptual backbone of Domain 8 (Agentic AI), so understanding it now is essential.

In Russell & Norvig's Artificial Intelligence: A Modern Approach, the dominant textbook in AI education, an agent is "anything that perceives its environment through sensors and acts upon that environment through actuators." The word "agent" is deliberately broad: it covers thermostats, chess programs, autonomous vehicles, and GPT-4 equally.

What makes an agent rational? For each possible percept sequence, a rational agent should select an action expected to maximise its performance measure, given the evidence provided by the percept sequence and whatever built-in knowledge the agent has. Rationality ≠ omniscience. Rationality ≠ perfection. Rationality = expected utility maximisation given available information.

Why does this abstraction matter? Because it unifies everything: a thermostat perceives temperature and acts by switching heating, GPT-4 perceives your text and acts by generating tokens, a self-driving car perceives the road and acts by steering. Every AI system in this documentation can be analysed through this lens.

The Agent-Environment Loop: the foundational AI abstraction
[Figure: the environment sends percepts through sensors (percepts → data) to the agent (state · model · goals · utility · learning), which acts on the environment through actuators (decisions → actions); a feedback loop lets the agent learn from outcomes. Legend: P = Performance · E = Environment · A = Actuators · S = Sensors.]
Definition: A rational agent selects actions expected to maximise its performance measure, given the evidence in its percept sequence and built-in knowledge.

Rationality ≠ omniscience (knowing all outcomes). Rationality ≠ perfection (always choosing optimally). Rationality = expected utility maximisation given available information.
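Expected utility maximisation is a short computation once outcomes, probabilities, and utilities are written down. The sketch below uses an invented umbrella decision. The probabilities and utility values are assumptions chosen for illustration, as are the function names.

```python
def expected_utility(outcomes):
    """outcomes: iterable of (probability, utility) pairs for one action."""
    return sum(p * u for p, u in outcomes)


def rational_choice(actions):
    """Pick the action with the highest expected utility."""
    return max(actions, key=lambda name: expected_utility(actions[name]))


# Assumed numbers: 30% chance of rain; utilities on an arbitrary 0-10 scale.
ACTIONS = {
    "take_umbrella": [(0.3, 8), (0.7, 6)],    # dry but encumbered either way
    "leave_umbrella": [(0.3, 0), (0.7, 10)],  # soaked if it rains, free otherwise
}

for name, outcomes in ACTIONS.items():
    print(name, round(expected_utility(outcomes), 2))
print(rational_choice(ACTIONS))
```

With these numbers the rational choice is to leave the umbrella, and 30% of the time that choice ends badly. The decision was still rational given the information available, which is the sense in which rationality is neither omniscience nor perfection.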

PEAS stands for Performance measure, Environment, Actuators, Sensors. It is the standard tool for formally specifying any AI agent task. Before building any AI system, PEAS forces you to answer four questions: What counts as success? What will the agent interact with? How can it act? What can it perceive?

PEAS Framework: specifying any AI agent task
[Figure: an AI agent (perceive · reason · act) surrounded by the four PEAS questions. P, performance measure: what does success look like? E, environment: what does the agent interact with? A, actuators: how does the agent act? S, sensors: what does the agent perceive?]
Agent | Performance Measure | Environment | Actuators | Sensors
Self-driving car | Safe arrival, comfort, legal compliance, efficiency | Roads, traffic, pedestrians, weather, other vehicles | Steering wheel, brakes, accelerator, horn | Cameras, LiDAR, GPS, radar, odometer
Medical diagnosis AI | Correct diagnosis, patient safety, cost efficiency | Patient data, records, lab results, imaging | Treatment recommendation, flag urgent cases | EMR systems, lab APIs, imaging APIs, notes
ChatGPT / LLM Agent | Helpful, harmless, honest; task completion | Human conversation, web, code environment | Text output, API calls, code execution, file writes | Text input, tool outputs, user feedback
Chess-playing AI | Win games, minimise blunders | Chess board, opponent's moves | Move selection (display update) | Board state (current position)
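A PEAS specification is just structured data, so one lightweight way to enforce the four questions is to make them required fields. The `PEAS` dataclass and `missing_fields` helper below are hypothetical conveniences, not part of AIMA; the chess entries are taken from the table above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PEAS:
    performance: tuple
    environment: tuple
    actuators: tuple
    sensors: tuple


def missing_fields(spec):
    """Names of any PEAS components that were left empty."""
    return [name for name, value in vars(spec).items() if not value]


chess_agent = PEAS(
    performance=("win games", "minimise blunders"),
    environment=("chess board", "opponent's moves"),
    actuators=("move selection",),
    sensors=("board state",),
)

# A draft spec that forgot to say what success means.
draft = PEAS(performance=(), environment=("web",), actuators=("text",), sensors=("text",))

print(missing_fields(chess_agent))
print(missing_fields(draft))
```

An empty performance measure is the most common gap in practice: it is easy to list what an agent can see and do, and much harder to say what counts as doing it well.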

Not all AI environments are created equal. The type of environment an agent operates in determines which algorithms are suitable, how much memory the agent needs, and how complex its decision-making must be. AIMA identifies six key dimensions along which environments vary.

Property | Definition | Example of first type | Example of second type | Impact on Agent Design
Fully vs partially observable | Can the agent sense the complete state? | Chess (full board visible) | Poker (opponent cards hidden) | Partial → needs memory and belief states
Deterministic vs stochastic | Are outcomes predictable? | Chess (moves are certain) | Self-driving (pedestrians behave unpredictably) | Stochastic → needs probabilistic reasoning
Episodic vs sequential | Do past actions affect the future? | Image classification (each input independent) | Chess / conversation (history matters) | Sequential → needs memory
Static vs dynamic | Does the environment change while the agent thinks? | Crossword puzzle | Real-time trading | Dynamic → needs fast decisions
Discrete vs continuous | Finite or infinite states/actions? | Chess (countable positions) | Driving (continuous steering) | Continuous → needs function approximation
Single vs multi-agent | One or many agents? | Sudoku solver | Multiplayer games, stock market | Multi-agent → must model other agents
⚠️

Most Challenging: Partial + Stochastic

Real-world AI tasks are almost always both. A self-driving car can't see around corners (partial) and pedestrians behave unpredictably (stochastic). This combination is why AI for physical environments is so hard.

🧠

Most Impactful: Sequential

If the past matters for the future, which it does in almost all useful tasks, the agent needs memory. This is why LLMs have context windows and why agents need long-term memory stores.

🤝

Emerging: Multi-agent

Modern AI increasingly involves multiple agents. LLM orchestration (LangGraph, AutoGen, CrewAI) is multi-agent by design. Critical for Domain 8: Agentic AI.

Why environment type matters for design: A fully observable, deterministic, static, discrete environment (like chess) can in principle be solved with a lookup table. A partially observable, stochastic, sequential, continuous, multi-agent environment (like autonomous driving) requires probabilistic reasoning, memory, planning, and robustness to uncertainty, all simultaneously. The gap in engineering complexity is enormous.

Russell & Norvig define five progressively sophisticated agent architectures. Each adds a new capability layer. This hierarchy maps cleanly onto the spectrum from a smoke detector to a fully autonomous AI assistant.

Agent Sophistication Spectrum: From Reflex to Learning
[Figure: five agent types ordered from simple to sophisticated, with increasing autonomy, flexibility and complexity. Simple reflex (IF → THEN rules; thermostat) → model-based (+ internal state; robot vacuum) → goal-based (+ search & planning; route planner) → utility-based (+ preferences; RL agent) → learning agent (+ self-improvement; GPT, Claude, LLM agents).]
🔦

① Simple Reflex Agent

Condition-action rules mapping current percepts directly to actions. No memory. No history. Fast and auditable but completely brittle: fails whenever the current percept doesn't capture all relevant state.

  • Thermostat, smoke detector, basic spam filter
  • Limitation: unreliable in partially observable environments
🗺️

② Model-Based Reflex Agent

Maintains an internal state: a model of the unobservable parts of the world. Updates state based on percepts over time. Can handle partial observability that pure reflex agents cannot.

  • SLAM robot, Roomba tracking its path
  • Limitation: rules still don't reason about goals
🎯

③ Goal-Based Agent

Has an explicit goal and searches for action sequences to achieve it. Uses search algorithms (A*, BFS) to plan multiple steps ahead. More flexible: multiple paths to the same goal.

  • GPS navigation, STRIPS planner, theorem provers
  • Limitation: all goals are binary, so it doesn't handle tradeoffs
📊

④ Utility-Based Agent

Uses a utility function: graded preferences between states, not just goal/no-goal. Maximises expected utility (probability × value). Handles stochastic outcomes naturally.

  • Recommendation systems, portfolio managers, RL agents
  • Foundation of decision theory and modern RLHF for LLMs
🎓

⑤ Learning Agent

Any agent type augmented with a learning component that modifies behaviour based on experience. Four internal components: performance element (selects actions), critic (evaluates against a standard), learning element (improves performance), problem generator (suggests exploratory actions). Most modern AI systems fit this description, at least during training: GPT learned from text, AlphaGo Zero learned from self-play.
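The step from the first agent type to the second can be shown in a two-cell vacuum world, a common textbook setting. The cell names "A" and "B", the action names, and the `ModelBasedAgent` class are assumptions for this sketch. Each percept is a `(location, is_dirty)` pair, so the agent never sees the other cell directly.

```python
def simple_reflex_agent(percept):
    """Maps the current percept straight to an action; remembers nothing."""
    location, dirty = percept
    if dirty:
        return "Suck"
    return "Right" if location == "A" else "Left"


class ModelBasedAgent:
    """Same rules, plus an internal record of which cells are known to be clean."""

    def __init__(self):
        self.known_clean = set()

    def act(self, percept):
        location, dirty = percept
        self.known_clean.add(location)  # clean now, or clean once Suck completes
        if dirty:
            return "Suck"
        if self.known_clean >= {"A", "B"}:
            return "NoOp"  # the model says there is nothing left to do
        return "Right" if location == "A" else "Left"


agent = ModelBasedAgent()
percepts = [("A", False), ("B", True), ("B", False)]
print([simple_reflex_agent(p) for p in percepts])
print([agent.act(p) for p in percepts])
```

On the final percept the reflex agent heads back to cell A and will shuttle between clean cells forever, while the model-based agent stops. The only difference is one remembered set, which is what "internal state" means here. The sketch assumes cells stay clean once cleaned.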

Before an agent can search for a solution, it must formally define the problem. AIMA's five-component problem formulation is the standard method. It forces precision: what exactly is a "state"? What exactly is an "action"? What exactly is the "goal"?

  • Define Initial State: starting configuration
  • Define Actions: available transitions
  • Transition Model: state after action
  • Goal Test: is this a goal state?
  • Path Cost: cost to reach state
  • Search Algorithm: find solution
State Space Search: find the path from start to goal
[Figure: state-space graph with start node S, intermediate nodes A to E and goal G, with edge costs. Optimal path: S→A→C→G.]
Element | Definition | Example: 8-Puzzle | Example: GPS Navigation
Initial state | Starting configuration | Random tile arrangement | Current location
Actions | Set of possible moves from each state | Slide tile left, right, up, down | Turn left/right, go straight
Transition model | Result of taking each action in each state | New tile arrangement after sliding | New position after movement
Goal test | Determines if current state is goal | Tiles in order 1–8, blank at bottom-right | Current location = destination
Path cost | Numeric cost of a path | Number of moves taken | Distance or estimated travel time
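The five components in the table translate almost line for line into code for the 8-puzzle. The sketch below is one possible encoding, not the only one: states are 9-tuples read row by row with 0 for the blank, every move costs 1, and breadth-first search serves as the search algorithm. All names are illustrative.

```python
from collections import deque

GOAL = (1, 2, 3, 4, 5, 6, 7, 8, 0)  # goal test target: tiles in order, blank last


def successors(state):
    """Actions + transition model: slide a neighbouring tile into the blank."""
    i = state.index(0)
    row, col = divmod(i, 3)
    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        r, c = row + dr, col + dc
        if 0 <= r < 3 and 0 <= c < 3:
            j = r * 3 + c
            nxt = list(state)
            nxt[i], nxt[j] = nxt[j], nxt[i]
            yield tuple(nxt)


def solve(start):
    """BFS from the initial state; returns the path cost in moves, or None."""
    frontier, depth = deque([start]), {start: 0}
    while frontier:
        state = frontier.popleft()
        if state == GOAL:
            return depth[state]
        for nxt in successors(state):
            if nxt not in depth:
                depth[nxt] = depth[state] + 1
                frontier.append(nxt)
    return None


def count_reachable(start):
    """Size of the state space reachable from start."""
    seen, frontier = {start}, deque([start])
    while frontier:
        for nxt in successors(frontier.popleft()):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return len(seen)


print(solve((1, 2, 3, 4, 5, 6, 7, 0, 8)))
print(count_reachable(GOAL))
```

The reachable count comes to 181,440, which is 9!/2, because only half of all tile arrangements can be reached from any given one. Exhaustive search is comfortable at this size; the same code on the 15-puzzle would run out of memory long before finishing.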
Why search-based AI was replaced: The 8-puzzle has ~181,000 reachable states. The 15-puzzle has ~1012. Chess: ~1046. Go: ~10170. Real-world AI problems are combinatorially intractable for exhaustive search โ€” which is precisely why machine learning (pattern recognition from data) displaced search as the dominant paradigm for perception, language, and unstructured reasoning tasks.
๐Ÿ“‹ Chapter 1.5 โ€” Key Takeaways
  • Agent = anything that perceives and acts. PEAS (Performance, Environment, Actuators, Sensors) formally specifies any agent task
  • Environments are: observable/partial, deterministic/stochastic, episodic/sequential, static/dynamic, discrete/continuous, single/multi-agent
  • Partially observable + stochastic = most real-world tasks. This is why AI for physical environments is so hard.
  • Five agent types: Reflex โ†’ Model-based โ†’ Goal-based โ†’ Utility-based โ†’ Learning โ€” each adds a capability layer
  • State-space search: define initial state, actions, transitions, goal test, path cost โ€” then search. Combinatorial explosion limits this approach at scale.
  • This framework directly maps to modern LLM agents (Domain 8): percepts = tool outputs + messages, actions = tool calls + text generation
1.6
Chapter 1.6 ยท Schools of Thought
Key Paradigms & Schools of Thought

AI is not one field โ€” it is a collection of competing intellectual traditions with different assumptions about what intelligence is and how to build it. Understanding the camps explains why researchers from different traditions argue past each other, and why hybrid approaches are gaining traction.

Connectionism is the paradigm that won. Inspired by the structure of the biological brain, it proposes that intelligence emerges from networks of simple connected units, and that knowledge is stored not in explicit rules but in the strengths of the connections between units. A single neuron knows nothing; billions of them, connected with learned weights, produce language, vision, and reasoning.

The lineage runs from McCulloch & Pitts (1943), the first mathematical neuron, through Rosenblatt's Perceptron (1957), through backpropagation as popularised by Rumelhart, Hinton & Williams (1986), which made multi-layer training practical, through LeCun's LeNet (1998) and AlexNet (2012), all the way to the Transformer (2017) and the LLM era. Each step applied the same core insight at larger scale, with more data and compute.


Key Properties

  • Parallel distributed processing: thousands of units compute simultaneously
  • Graceful degradation: partial damage reduces performance gradually, not catastrophically
  • Learning from examples: weights adapt through exposure, not programming
  • Statistical regularities: finds patterns in data regardless of whether humans can articulate them
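"Learning from examples" can be made concrete with the classic perceptron update rule. The sketch below trains a single unit on the logical AND function; the dataset, the learning rate of 1.0, and the function names are illustrative choices, not part of the original material.

```python
def train_perceptron(samples, epochs=20, lr=1.0):
    """Perceptron learning rule: nudge weights toward each misclassified example."""
    n_inputs = len(samples[0][0])
    weights = [0.0] * n_inputs
    bias = 0.0
    for _ in range(epochs):
        for inputs, target in samples:
            activation = sum(w * x for w, x in zip(weights, inputs)) + bias
            prediction = 1 if activation > 0 else 0
            error = target - prediction            # -1, 0, or +1
            weights = [w + lr * error * x for w, x in zip(weights, inputs)]
            bias += lr * error
    return weights, bias

def predict(weights, bias, inputs):
    """Threshold unit: fire if the weighted sum exceeds zero."""
    return 1 if sum(w * x for w, x in zip(weights, inputs)) + bias > 0 else 0

AND_DATA = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
weights, bias = train_perceptron(AND_DATA)
print([predict(weights, bias, x) for x, _ in AND_DATA])
```

Nothing in the code states the rule "output 1 only when both inputs are 1". After training, that knowledge lives entirely in two weights and a bias.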

The Connectionist Timeline

  • 1943: McCulloch-Pitts neuron, the first mathematical model
  • 1957: Rosenblatt Perceptron, the first learning machine
  • 1986: Rumelhart, Hinton & Williams popularise backpropagation for multi-layer nets
  • 1998: LeCun's LeNet, CNNs on handwritten digits
  • 2012: AlexNet, a GPU-trained deep CNN, wins ImageNet
  • 2017: Transformer, attention replaces recurrence entirely
  • 2022+: LLMs, connectionist systems with hundreds of billions of parameters and (reportedly) more
[Figure: Connectionist network (intelligence in weights, not rules). Three layers: input (x₁, x₂, x₃, x₄), hidden, output (y₁, y₂). Sample connection weights shown: 0.8, -0.3, 0.6, 0.1. Legend: highlighted path, other connections.]

For three decades the battle between symbolicism and connectionism was not a polite academic disagreement. It was personal, bitter, and high-stakes. Minsky and Papert's critique in Perceptrons (1969) helped choke off neural network funding for more than a decade. Connectionists returned fire in 1986 with backpropagation. The symbolicist camp called neural networks "black box statistics". The connectionist camp called expert systems "fragile toy programs". Funding, careers, and the direction of the entire field were at stake.

Symbolicism (GOFAI)
  • Intelligence = symbol manipulation according to rules
  • Knowledge = explicit, human-readable rules and facts
  • Interpretable by design: you can read the logic
  • Works with zero data: rules are programmed in
  • Generalises by logical deduction from axioms
  • Can solve novel problems if rules cover them
  • Brittle: fails hard outside the defined domain

Connectionism (Neural Networks)
  • Intelligence = statistical pattern matching in weights
  • Knowledge = distributed across millions of parameters
  • Black box: learned weights are not human-readable
  • Requires large labelled datasets to train
  • Generalises by interpolation within training distribution
  • Struggles with truly novel, out-of-distribution problems
  • Robust: degrades gracefully under noise and variation
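The symbolic side of the comparison can be illustrated with a tiny forward-chaining rule engine. This is a toy sketch with made-up facts and rules (`has_fur`, `barks`, and so on are hypothetical), meant only to show what "readable rules, zero data, brittle outside the domain" looks like in practice.

```python
def forward_chain(facts, rules):
    """Apply if-then rules repeatedly until no new fact can be derived."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if premises <= derived and conclusion not in derived:
                derived.add(conclusion)
                changed = True
    return derived

# Each rule: (set of premises, conclusion). Fully human-readable.
RULES = [
    (frozenset({"has_fur", "barks"}), "is_dog"),
    (frozenset({"is_dog"}), "is_mammal"),
]

known = forward_chain({"has_fur", "barks"}, RULES)
unknown = forward_chain({"has_fur", "howls"}, RULES)

print(sorted(known))    # conclusions chained through both rules
print(sorted(unknown))  # nothing derived: no rule mentions "howls"
```

The first query works without any training data. The second fails completely on an input that is only slightly outside the rule set, rather than degrading gracefully the way a learned model would.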
๐Ÿ›๏ธ

Symbolicism's Best Argument

"Neural networks can't do compositional reasoning. They can't understand that 'the dog bit the man' is fundamentally different from 'the man bit the dog' โ€” they just see statistical co-occurrences. Language has recursive structure that requires rules, not statistics. Neural networks that can't count reliably can't be the basis of real intelligence."

Connectionism's Best Argument

"Where are the rules for understanding a face? For recognising speech across different accents? For generating coherent prose? The rules are impossibly complex to write, so the only viable path is to learn from data. The brain itself is a neural network, not a rule engine. Evolution didn't write logic programs; it tuned connection weights."

The debate was largely resolved by the empirical success of deep learning in the 2010s. But symbolicism's core insight, that structured, compositional reasoning matters, has returned in the form of neuro-symbolic AI, chain-of-thought prompting, and structured reasoning models like o1 and DeepSeek-R1.

Dimension | Symbolicism | Connectionism | Current Status (2026)
Core claim | Intelligence = symbol manipulation by rules | Intelligence emerges from connected simple units | Connectionism dominant; hybrid gaining
Key figures | McCarthy, Minsky, Newell, Simon | Rosenblatt, Rumelhart, Hinton, LeCun, Bengio | Hinton, LeCun, Bengio: 2018 Turing Award
Strengths | Interpretable, logically consistent, structured | Learns from data, robust, generalises well | Both needed; neither sufficient alone
Weaknesses | Brittle, doesn't scale, knowledge bottleneck | Black box, data-hungry, fails OOD | Interpretability & robustness unsolved
Modern form | Knowledge graphs, formal verification, SMT solvers | LLMs, diffusion models, transformers | Neuro-symbolic = active frontier

The third major paradigm takes a different starting point: the world is fundamentally uncertain, and any intelligent system must represent and reason about that uncertainty explicitly. Where symbolicism asks "what is true?", probabilistic AI asks "how confident am I, and how should new evidence update that confidence?" This is Bayes' theorem as the engine of intelligence: P(hypothesis | evidence) ∝ P(evidence | hypothesis) × P(hypothesis).

Bayesian AI

Models uncertainty explicitly as probability distributions over hypotheses. Bayes' theorem: P(H|E) ∝ P(E|H) × P(H). New evidence updates your prior belief to a posterior. Principled, but computing exact posteriors is often intractable, requiring approximations: MCMC, variational inference, Laplace approximation.

Applications: spam filtering, medical diagnosis, sensor fusion, A/B testing, causal inference
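A worked example makes the prior-to-posterior update tangible. The numbers below (1% prevalence, 95% sensitivity, 5% false-positive rate) are invented for illustration, and `posterior` is a name chosen for this sketch.

```python
def posterior(prior, likelihood, false_positive_rate):
    """Bayes' theorem for a binary hypothesis H given positive evidence E.

    prior               = P(H)
    likelihood          = P(E | H)
    false_positive_rate = P(E | not H)
    """
    evidence = likelihood * prior + false_positive_rate * (1 - prior)  # P(E)
    return likelihood * prior / evidence

# Hypothetical diagnostic test for a condition with 1% prevalence.
p_disease = posterior(prior=0.01, likelihood=0.95, false_positive_rate=0.05)
print(f"P(disease | positive test) = {p_disease:.3f}")
```

Even with a fairly accurate test the posterior stays well below 50%, because the low prior dominates. This is the kind of reasoning the Bayesian paradigm builds in from the start.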

Probabilistic Graphical Models

Bayesian networks: directed acyclic graphs encoding conditional dependencies, with exact inference via variable elimination (efficient when the graph is sparse). Hidden Markov Models (HMMs): hidden states with observable outputs; dominated speech recognition from the 1980s to the early 2010s. Gaussian Processes: non-parametric Bayesian learning with uncertainty estimates.

Still used: robotics (SLAM), bioinformatics, probabilistic programming (Stan, PyMC, Pyro)

[Figure: Bayesian network with explicit probabilistic dependencies. Season P(S) → Rain P(R|S) and Sprinkler P(Sp|S); Rain and Sprinkler → Wet Grass P(W|R,Sp); Wet Grass → Slip on Path P(Sl|W).]
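Inference in a network of this shape can be sketched by brute-force enumeration over the joint distribution. All probabilities below are invented for illustration, the Season variable is simplified to a boolean `summer`, and the "Slip on Path" node is left out to keep the example short. Real systems would use variable elimination or a library rather than enumeration.

```python
from itertools import product

# Hypothetical conditional probability tables.
P_SUMMER = 0.5
P_RAIN = {True: 0.1, False: 0.5}          # P(rain | summer)
P_SPRINKLER = {True: 0.6, False: 0.1}     # P(sprinkler | summer)
P_WET = {(True, True): 0.99, (True, False): 0.90,
         (False, True): 0.80, (False, False): 0.00}  # P(wet | rain, sprinkler)

def bernoulli(p_true, value):
    """Probability of a boolean value given P(value is True)."""
    return p_true if value else 1.0 - p_true

def joint(summer, rain, sprinkler, wet):
    """Joint probability factorised along the edges of the graph."""
    return (bernoulli(P_SUMMER, summer)
            * bernoulli(P_RAIN[summer], rain)
            * bernoulli(P_SPRINKLER[summer], sprinkler)
            * bernoulli(P_WET[(rain, sprinkler)], wet))

def prob_rain_given_wet():
    """P(rain | wet grass), summing out the unobserved variables."""
    numerator = denominator = 0.0
    for summer, rain, sprinkler in product((True, False), repeat=3):
        p = joint(summer, rain, sprinkler, wet=True)
        denominator += p
        if rain:
            numerator += p
    return numerator / denominator

print(f"P(rain | wet grass) = {prob_rain_given_wet():.3f}")
```

The graph structure is what keeps the model small: four local tables replace a full joint table over every combination of variables.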

Probabilistic AI never dominated the way symbolicism or connectionism did, but it remains indispensable in specific contexts. Robotics relies heavily on Bayesian state estimation (Kalman filters, particle filters). A/B testing and causal inference are frequently framed in Bayesian terms. Modern LLM calibration, meaning how well a model's confidence scores match its actual accuracy, is a probabilistic question. RLHF reward models are fit with a probabilistic preference model (typically Bradley-Terry). The probabilistic paradigm's contribution is the rigorous treatment of uncertainty that pure neural approaches often lack.


How It Works

  • Maintain a population of candidate solutions
  • Evaluate each against a fitness function
  • Select the fittest for reproduction
  • Apply crossover (recombination) and mutation
  • Repeat until the budget runs out; no gradients required
  • Powerful where gradient descent fails: discontinuous objectives, black-box functions, combinatorial spaces
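The loop above can be sketched as a small genetic algorithm. The objective here is the toy "OneMax" problem (maximise the number of 1-bits), and tournament selection, one-point crossover, elitism, and all parameter values are illustrative choices rather than a recommended configuration.

```python
import random

def fitness(individual):
    """OneMax toy objective: the number of 1-bits (higher is better)."""
    return sum(individual)

def tournament(population, rng):
    """Selection: return the fitter of two randomly drawn individuals."""
    a, b = rng.sample(population, 2)
    return a if fitness(a) >= fitness(b) else b

def evolve(bits=20, pop_size=30, generations=60, mutation_rate=0.02, seed=0):
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(bits)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        next_gen = [max(population, key=fitness)]       # elitism: keep the best
        while len(next_gen) < pop_size:
            mum = tournament(population, rng)
            dad = tournament(population, rng)
            cut = rng.randrange(1, bits)                # one-point crossover
            child = mum[:cut] + dad[cut:]
            child = [1 - gene if rng.random() < mutation_rate else gene
                     for gene in child]                 # bit-flip mutation
            next_gen.append(child)
        population = next_gen
    return max(population, key=fitness)

best = evolve()
print(fitness(best), "of 20 bits set")
```

The fitness function is only ever called, never differentiated, so the same loop works on any black-box objective that can score a candidate.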

Modern AI Applications

  • Genetic Algorithms: solutions as bit-string chromosomes, selection pressure
  • Evolution Strategies (ES): optimise continuous parameters; OpenAI (2017) showed ES can compete with gradient-based RL on standard benchmarks
  • NEAT: neuroevolution that evolves both topology and weights of neural networks
  • Neural Architecture Search (NAS): automated architecture discovery; evolutionary search produced AmoebaNet, while MobileNetV3 and EfficientNet came out of RL-based search
  • Hyperparameter optimisation: black-box search over learning rates, depths, widths

Rodney Brooks & the Core Thesis

  • MIT, late 1980s: "Intelligence without Representation" (1991)
  • Explicit world models are unnecessary, and may be counterproductive
  • Subsumption architecture: layered behaviours, each suppressing lower ones
  • Behaviours: avoid obstacles → wander → follow walls → seek light
  • Result: surprisingly robust behaviour with no search, no world model
  • Contrast: GOFAI built maps of the world; Brooks robots just acted in it
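The flavour of behaviour-based control can be shown with a fixed-priority arbiter. This is a loose simplification inspired by subsumption, not Brooks' original wiring of suppression and inhibition links. The percept keys, thresholds, command strings, and the priority order (obstacle avoidance first) are all assumptions made for the example.

```python
def avoid_obstacle(percept):
    """Reflex layer: fire only when something is dangerously close."""
    if percept["obstacle_distance"] < 0.3:
        return "turn_away"
    return None

def seek_light(percept):
    if percept["light_visible"]:
        return "steer_to_light"
    return None

def follow_wall(percept):
    if percept["wall_nearby"]:
        return "track_wall"
    return None

def wander(percept):
    """Default layer: always has something to do."""
    return "random_walk"

# Assumed priority order: earlier behaviours override later ones.
LAYERS = [avoid_obstacle, seek_light, follow_wall, wander]

def act(percept):
    """Return the command of the first behaviour that fires."""
    for behaviour in LAYERS:
        command = behaviour(percept)
        if command is not None:
            return command

print(act({"obstacle_distance": 2.0, "light_visible": False, "wall_nearby": True}))
```

There is no map, no planner, and no stored model of the world. Each behaviour reacts directly to the current percept, which is the point of the contrast with GOFAI.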

Why It Matters for Modern AI

  • Intelligence emerges from physical interaction with the world, not just computation
  • RT-2 (Google DeepMind): LLM reasoning + embodied robot actions
  • π₀ (Physical Intelligence): foundation model for physical manipulation
  • Symbol grounding (Ch 1.4) may require embodiment to solve
  • An AI that has felt heat and grasped objects has symbols grounded in experience
  • Pure text-trained LLMs lack this physical grounding entirely

The frontier of AI research in 2026 is increasingly defined by the attempt to combine the strengths of both paradigms: neural perception and fluency with symbolic precision and compositionality. Neither pure approach is sufficient. Neural networks hallucinate and fail at systematic reasoning; symbolic systems can't perceive raw inputs or handle uncertainty. The hybrid is not a compromise. It is a new paradigm that can do things neither can do alone.


Why Pure Neural Struggles

  • Hallucination: generates confident falsehoods with no symbolic grounding
  • Arithmetic errors: counting and algebra are unreliable in raw LLMs
  • Logical inconsistency: contradicts itself across long contexts
  • Rule-following failures: can't reliably follow strict constraints
  • OOD brittleness: fails on distributions far from training data

Why Neuro-Symbolic Is the Frontier

  • Tool-using LLMs: symbolic precision (calculator, code) + neural fluency
  • Chain-of-thought: structured reasoning steps in natural language
  • AlphaGo / AlphaZero: neural value/policy network + MCTS symbolic search
  • AlphaGeometry (2024): neural language model proposing auxiliary constructions + symbolic deduction engine; solves IMO-level geometry problems
  • Knowledge Graph + LLM: structured facts + generative reasoning (RAG)
[Figure: The neuro-symbolic spectrum; most frontier AI sits in the middle. Left, pure symbolic: Prolog, expert systems, ontologies, FOL. Middle, neuro-symbolic: AlphaGo, chain-of-thought, tool-using LLMs, AlphaFold, RAG. Right, pure neural: vanilla GPT, image classifiers, speech recognition, CNNs. Also placed on the spectrum: probabilistic graphical models, deep RL (learned + planned). Axis: more interpretable and structured on the left, more flexible and data-driven on the right.]

The five most important neuro-symbolic integrations in production AI today:

  • ① Tool use: LLMs call Python interpreters, calculators, and search APIs for symbolic precision. Neural fluency + symbolic correctness.
  • ② Chain-of-thought prompting: eliciting explicit intermediate reasoning steps mimics symbolic deduction chains. Each step can be inspected and checked.
  • ③ AlphaGo / AlphaZero: the neural value network supplies learned intuition; MCTS provides principled symbolic tree search. Neither alone reaches superhuman level.
  • ④ RAG (Retrieval-Augmented Generation): a knowledge graph or vector store (symbolic) is queried by a neural LLM. Grounds generation in verifiable facts.
  • ⑤ Reasoning models (o1, DeepSeek-R1): extended internal "thinking" with intermediate steps before producing output. The most direct implementation of neuro-symbolic reasoning in frontier LLMs.
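The tool-use pattern in the first item can be sketched from the symbolic side: a small, exact calculator that a model could delegate arithmetic to. The tool-call format (`{"tool": ..., "input": ...}`) and the names `calculator`, `TOOLS`, and `run_tool_call` are hypothetical; real LLM APIs define their own schemas. The evaluator walks Python's `ast` instead of calling `eval()` so that only arithmetic is accepted.

```python
import ast
import operator

_OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
        ast.Mult: operator.mul, ast.Div: operator.truediv,
        ast.Pow: operator.pow, ast.USub: operator.neg}

def calculator(expression: str):
    """Evaluate +, -, *, /, ** over numeric literals, and nothing else."""
    def walk(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.operand))
        raise ValueError("unsupported expression")
    return walk(ast.parse(expression, mode="eval").body)

TOOLS = {"calculator": calculator}

def run_tool_call(call: dict):
    """Dispatch a (hypothetical) model-emitted tool call to its implementation."""
    return TOOLS[call["tool"]](call["input"])

# The kind of multiplication raw LLMs often get wrong is exact here.
result = run_tool_call({"tool": "calculator", "input": "123456789 * 987654321"})
print(result)
```

The division of labour is the design point: the neural model decides when a calculation is needed and how to phrase it, while the symbolic tool guarantees the answer. A production version would also cap exponent sizes and input length.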
Chapter 1.6: Key Takeaways
  • Connectionism: knowledge stored in weights, learned from data. Robust and scalable, but opaque. The dominant paradigm since 2012.
  • Symbolicism: knowledge in explicit rules. Interpretable and compositional, but brittle and unable to scale to real-world complexity
  • Probabilistic AI: uncertainty as a first-class citizen. Bayesian networks and HMMs are still essential in robotics and calibration
  • Evolutionary computation: selection + mutation, used in NAS, hyperparameter optimisation, and black-box RL
  • Embodied AI: intelligence emerges from physical interaction; motivates the robotics frontier and addresses symbol grounding
  • Modern frontier is neuro-symbolic: neural perception + symbolic reasoning (chain-of-thought, tool use, AlphaGo, AlphaGeometry)
1.7
Chapter 1.7 · Current Landscape
The AI Landscape Today & Tomorrow

This chapter connects Domain 1's history and theory to the present, and gives you a map of where each subsequent domain fits. AI in 2026 is defined by foundation models as the new default paradigm, but with hard limitations that motivate everything that follows in this curriculum.

The most important structural shift in AI since 2020 is the emergence of Foundation Models: large models pre-trained on broad, internet-scale data that can be adapted to almost any downstream task via prompting, fine-tuning, or RLHF. The old paradigm: train a separate model for each task (one for translation, one for summarisation, one for classification). The new paradigm: one model does all of them, often better than the specialised predecessors.

This shift was driven by neural scaling laws (Kaplan et al., 2020): test loss falls predictably as a power-law function of compute, parameters, and data, and the trend held across many orders of magnitude with little sign of breaking down until very recently. The implication: more scale reliably buys more capability, making the economics of frontier model training self-reinforcing.
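The power-law form can be made concrete with a short calculation. The sketch below uses the parameter-count law L(N) = (N_c / N)^α_N with the approximate constants reported by Kaplan et al. (α_N ≈ 0.076, N_c ≈ 8.8 × 10^13 non-embedding parameters). Treat the constants as illustrative: they were fit to one family of models and should not be read as universal.

```python
ALPHA_N = 0.076   # approximate exponent for parameter count (Kaplan et al., 2020)
N_C = 8.8e13      # approximate critical parameter count from the same fit

def loss_from_params(n_params: float) -> float:
    """Predicted test loss when parameters are the only bottleneck."""
    return (N_C / n_params) ** ALPHA_N

for n in (1e8, 1e9, 1e10, 1e11):
    print(f"{n:.0e} params -> predicted loss {loss_from_params(n):.3f}")

# Each doubling of parameters multiplies the loss by the same constant factor.
doubling_factor = loss_from_params(2e9) / loss_from_params(1e9)
print(f"loss ratio per doubling: {doubling_factor:.4f}")
```

Under these constants each doubling of model size lowers the predicted loss by about 5%. The improvement is reliable but small per step, which is why frontier progress has demanded order-of-magnitude jumps in scale.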

[Figure: The foundation model paradigm (one model, many capabilities). A foundation model pre-trained on text, code, images, audio, and video (trillions of tokens) feeds four task families. Language tasks: QA, summarisation, translation, coding (GPT-4o, Claude, Gemini). Vision tasks: classification, VQA, detection, generation (GPT-4V, DALL-E, Gemini). Specialised domains: medicine, law, science, finance, education (AlphaFold, Med-PaLM, Harvey). Agentic tasks: tool use, planning, multi-step reasoning (AutoGPT, Claude Agents, o3).]
Scale

Billions to trillions of parameters. Trained on internet-scale data across months on thousands of GPUs. GPT-4 is estimated to have cost $100M+ to train; next-generation frontier models are estimated at $1B+. Compute requirements for frontier training runs roughly double every 6–12 months.

Emergence

Capabilities that weren't explicitly trained appear at scale: arithmetic, code generation, multi-step reasoning, in-context learning. Below a capability threshold, measured performance sits near chance; above it, ability appears suddenly and surprisingly. Emergent capabilities are scale's most striking phenomenon.

Adaptability

One model adapts to hundreds of tasks via prompting alone. No task-specific training required for most applications. Fine-tuning (LoRA, QLoRA) and RLHF allow further specialisation when needed. The pre-train → adapt paradigm replaced train-from-scratch.

AI capability is not uniform across tasks. Understanding where AI already surpasses humans, where it approaches parity, and where it still falls short is essential for calibrating realistic expectations, and for knowing which research problems remain open.

[Figure: AI capability map (where AI excels, approaches, and falls short), ordered from basic to advanced. Perception (computer vision, speech recognition, sensor fusion, document understanding): ✅ surpassed. Language (text generation, translation, summarisation, QA, dialogue, code): ✅ surpassed. Reasoning (logical inference, mathematical reasoning, planning, causal reasoning): ≈ approaching. Generation (image/video synthesis, code generation, music, 3D objects, multimodal): ≈ approaching. Action (tool use, web browsing, code execution, physical robot control, long-horizon tasks): ○ behind. Legend: ✅ AI surpassed human, ≈ approaching parity, ○ still behind.]
Capability | Benchmark | Human Score | Best AI (2026) | Status
Image Classification | ImageNet Top-5 | ~95% | 99%+ | ✅ Surpassed
Speech Recognition | LibriSpeech WER | ~5% WER | 2–4% WER | ✅ Surpassed
Reading Comprehension | SQuAD 2.0 | 89.5 F1 | 93+ F1 | ✅ Surpassed
Professional Knowledge | MMLU | 89% | 87–90% | ≈ At parity
Code Generation | HumanEval | ~85% (experts) | 90%+ | ≈ At parity
Mathematical Reasoning | MATH (competition) | ~60% (AMC) | 70–85% | ≈ Approaching
Common-Sense (physical) | PIQA | ~95% | 85–90% | ⚠ Still behind
Long-horizon Planning | Novel open tasks | High | Low–medium | ⚠ Still behind
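Of the metrics in the table, word error rate (WER) is the least self-explanatory: it is the word-level edit distance between the reference transcript and the system output, divided by the reference length. A minimal sketch follows, assuming whitespace tokenisation and a non-empty reference; production scoring tools also normalise case and punctuation first.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref, hyp = reference.split(), hypothesis.split()
    # dist[i][j] = edits needed to turn the first i reference words
    # into the first j hypothesis words (Levenshtein distance on words).
    dist = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dist[i][0] = i
    for j in range(len(hyp) + 1):
        dist[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            substitution = 0 if ref[i - 1] == hyp[j - 1] else 1
            dist[i][j] = min(dist[i - 1][j] + 1,               # deletion
                             dist[i][j - 1] + 1,               # insertion
                             dist[i - 1][j - 1] + substitution)
    return dist[-1][-1] / len(ref)

wer = word_error_rate("the cat sat on the mat", "the cat sat on mat")
print(f"WER = {wer:.1%}")
```

Lower is better, so a 2–4% system that "surpasses" a ~5% human baseline makes fewer word-level mistakes per hundred reference words.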
OpenAI

GPT-4o, o1, o3 series. Dominant commercial position. ChatGPT reported 200M+ weekly active users in 2024. $13B Microsoft partnership and deep Azure integration. Reasoning models (o1/o3) made extended chain-of-thought a first-class capability. Controversial governance restructuring in 2024.


Anthropic

Claude 3.5 Sonnet / Opus. Safety-focused, Constitutional AI alignment. $4B Amazon investment + AWS deployment. Long context (200K tokens). Claude 3.7 introduced extended thinking. Preferred by enterprises with compliance requirements.

Google DeepMind

Gemini 1.5 Pro / Ultra. Deep integration with Google Search and Workspace. TPU hardware advantage. AlphaFold 2 reached near-experimental accuracy on protein structure prediction. AlphaCode, robotics models (RT-2). Gemini 2.0 Flash: fast multimodal at scale.


Meta AI

LLaMA 3 (open weights: 8B, 70B, 405B). Democratising AI development. The LLaMA ecosystem powers most open-source fine-tuning, quantisation, and deployment. Fuelled the open-source explosion that now competes with closed models.


Mistral AI

European AI champion. Mistral Large, Codestral, Mixtral (mixture-of-experts). Efficient models with open weights. Positioned as privacy-first alternative. GDPR-aligned for European enterprise deployment.


Open Source Community

Hugging Face ecosystem (models, datasets, Spaces). GGUF/llama.cpp local inference. LoRA/QLoRA fine-tuning. DeepSeek-R1 (China, open weights). Qwen (Alibaba). Phi-3 (Microsoft). Democratised deployment without API dependency.

Dimension | Closed Source | Open Source
Examples | GPT-4o, Claude 3.5, Gemini 1.5 | LLaMA 3, Mistral, DeepSeek-R1, Qwen
Performance | Highest capability at frontier | Increasingly competitive (within 10–20%)
Cost | Per-token pricing ($0.001–$0.10/1K tokens) | Infrastructure cost only (near zero at scale)
Privacy | Data sent to vendor servers | On-premise possible, with full data control
Customisation | Limited (fine-tune API, system prompts) | Full weight access: fine-tune anything
Safety | Vendor-managed RLHF alignment | Community-managed, variable quality
Transparency | Minimal (model cards, no architecture details) | Weights + often training details available

Benchmarks are the field's primary mechanism for measuring progress, and its primary mechanism for self-deception. The benchmark lifecycle is predictable: a new benchmark is created to test a genuine capability gap → models improve rapidly → benchmark saturates → researchers create a harder benchmark. The cycle repeats every 12–24 months.

Goodhart's Law in AI: "When a measure becomes a target, it ceases to be a good measure." AI benchmarks are routinely gamed through training data contamination, fine-tuning on test distributions, or optimising prompts for specific test formats. Benchmark saturation does not mean a capability is solved. ARC-AGI was designed by François Chollet specifically to resist shortcut-learning, and it remains among the hardest general reasoning evaluations as of 2026.

Benchmark | Domain | What It Tests | Human Baseline | Status
ImageNet | Vision | 1000-class image classification | ~95% | Saturated: AI >99%
GLUE / SuperGLUE | NLP | Language understanding tasks | 87% / 89% | Saturated: replaced
MMLU | Knowledge | 57 academic subjects (multiple choice) | 89% | At parity, gameable
HumanEval | Coding | Python function generation from docstrings | 85% (experts) | Near parity (90%+)
MATH | Mathematics | Competition math (AMC/AIME level) | 60% (AMC students) | AI ~70–85%
ARC-AGI | Reasoning | Novel visual pattern abstraction | 85% | AI ~40%, hardest eval
BIG-Bench Hard | Diverse | 23 reasoning-heavy tasks | ~65% | AI 75%+, approaching
GPQA (Diamond) | Science | PhD-level biology, chemistry, physics | 65–80% | AI 60–70%, approaching

How Benchmarks Get Gamed

  • Data contamination: test set data appears in model training corpus
  • Distribution fine-tuning: model fine-tuned on the benchmark's own training split or near-duplicate data, so test scores reflect the benchmark's format more than general capability
  • Prompt engineering: optimising prompts for specific test formats inflates scores
  • Memorisation: model recalls answers rather than reasoning to them
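Data contamination, the first item above, is commonly screened for with n-gram overlap between test items and the training corpus. The sketch below is a simplified version of that idea. The window size, whitespace tokenisation, and function names are assumptions, and real pipelines use longer n-grams, normalisation, and hashing to run at corpus scale.

```python
def ngrams(text: str, n: int) -> set:
    """All n-word windows in a text, lower-cased and whitespace-tokenised."""
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def contamination_rate(test_items, training_corpus, n=8) -> float:
    """Fraction of test items sharing at least one n-gram with the corpus."""
    corpus_grams = set()
    for document in training_corpus:
        corpus_grams |= ngrams(document, n)
    flagged = [item for item in test_items if ngrams(item, n) & corpus_grams]
    return len(flagged) / len(test_items)

# Tiny made-up example with a short window so the overlap is visible.
corpus = ["yesterday the quick brown fox ran away"]
tests = ["the quick brown fox jumps", "completely unseen question here"]
print(contamination_rate(tests, corpus, n=3))
```

A non-zero rate does not prove memorisation, but it flags items whose scores should be reported separately or dropped.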

What Makes a Good Benchmark

  • Novel problems: not solvable by pattern-matching training data
  • Human-verifiable: ground truth is unambiguous
  • Diverse: many task types so no single trick wins
  • Dynamic: can be updated as old tasks saturate (e.g. ARC-AGI 2)

Despite the extraordinary progress of the last decade, fundamental problems remain unsolved. These are not engineering challenges awaiting more compute. They are conceptual gaps that may require new paradigms to address. Understanding them prevents over-hyping current capabilities and points toward where research is most needed.

Hallucination & Factual Reliability

LLMs generate confident falsehoods. Retrieval-augmented approaches reduce but don't solve the problem. Root cause: models are trained to produce fluent, plausible text, not verified facts. A model that reliably says "I don't know" when it doesn't know would be enormously valuable. Calibrated uncertainty is an open research problem.
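Calibration can be measured. One common summary is expected calibration error (ECE): group predictions into confidence bins and compare each bin's average confidence with its actual accuracy. The sketch below uses equal-width bins and made-up data; bin count and binning scheme are choices that affect the number.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Weighted average gap between stated confidence and observed accuracy."""
    total = len(confidences)
    bins = [[] for _ in range(n_bins)]
    for conf, hit in zip(confidences, correct):
        index = min(int(conf * n_bins), n_bins - 1)   # conf == 1.0 goes in last bin
        bins[index].append((conf, hit))
    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(1 for _, h in members if h) / len(members)
        ece += len(members) / total * abs(avg_conf - accuracy)
    return ece

# Hypothetical model that always claims 90% confidence.
well_calibrated = expected_calibration_error([0.9] * 10, [True] * 9 + [False])
overconfident = expected_calibration_error([0.9] * 10, [True] * 5 + [False] * 5)
print(f"calibrated: {well_calibrated:.2f}, overconfident: {overconfident:.2f}")
```

A model that says "90% sure" and is right 90% of the time scores near zero. The same stated confidence with 50% accuracy scores 0.4, which is the numerical signature of confident falsehoods.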

Common Sense & Physical Reasoning

A 2-year-old knows a dropped ball falls. AI still fails simple physical scenarios. LLMs absorbed much commonsense from text but miss the causal world model behind it. The Winogrande benchmark reveals systematic errors on pronoun resolution that humans find trivial. Physical reasoning requires causal world models, not statistical text patterns.

Long-Horizon Planning

Current agents handle ~10–50 step tasks reliably. Real-world projects span hundreds of interdependent steps over days or weeks. Errors compound: one wrong action early invalidates all downstream planning. Agents also lack persistent memory across sessions and suffer context window limits. Autonomous multi-day task completion remains out of reach.
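The compounding effect is easy to quantify under a simplifying assumption: every step succeeds independently with the same probability, and a single failure sinks the task. Real agents can sometimes detect and recover from errors, so this is a pessimistic back-of-envelope model, not a measurement.

```python
def task_success_probability(step_success: float, steps: int) -> float:
    """Chance that every one of `steps` independent steps succeeds."""
    return step_success ** steps

for steps in (10, 50, 500):
    p = task_success_probability(0.99, steps)
    print(f"{steps:>3} steps at 99% per-step reliability -> {p:.1%} overall")
```

At 99% per-step reliability, a 50-step task completes about 61% of the time and a 500-step task less than 1% of the time. Small per-step gains therefore matter enormously for long-horizon work.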

Causal Inference

Current AI is fundamentally correlational. Given the observation "countries with more hospitals have more disease", a purely correlational model concludes that hospitals cause disease. Judea Pearl's causal hierarchy (association → intervention → counterfactual) identifies exactly what's missing. Moving from correlation to causation is essential for reliable decision-making in medicine, policy, and science.

Sample Efficiency

A child learns "dog" from ~10 examples. LLMs such as GPT-4 are trained on hundreds of billions to trillions of tokens to achieve comparable breadth. Humans generalise from far fewer examples via strong inductive biases built by evolution. Few-shot meta-learning addresses this, but current models still require vastly more data than humans for comparable generalisation.

Robustness & Distribution Shift

Models trained on one distribution fail when deployed conditions differ. A self-driving model trained in California may fail catastrophically in snow. Adversarial examples (imperceptible perturbations) can fool classifiers into wrong answers with >99% confidence. Consistent performance across distribution shift remains unsolved and is critical for safety-critical deployment.

[Figure: Open problems in AI, the frontier challenges of 2024–2026. Very hard (●●●): hallucination (confident falsehoods from fluency-trained models); common sense (physical reasoning without causal world models); long-horizon planning (multi-step tasks over hours or days). Hard (●●): causal inference (correlation ≠ causation; Pearl's do-calculus needed); sample efficiency (AI needs 1000× more data than humans to learn); robustness (fails on distribution shift; adversarial examples). Figure note: problems toward which current research is actively converging.]
Chapter 1.7: Key Takeaways
  • Foundation Models are the new paradigm: one model pre-trained at scale, adapted to many tasks via prompting or fine-tuning
  • Neural scaling laws (Kaplan et al., 2020): performance improves predictably with more compute, parameters, and data; no ceiling yet found
  • AI already surpasses humans in perception and language tasks; approaches parity in reasoning and code; still behind in physical commonsense and long-horizon planning
  • The AI ecosystem is bifurcating: closed frontier (GPT-4o, Claude, Gemini) vs open weights (LLaMA 3, Mistral, DeepSeek), with both converging in capability
  • Benchmarks are routinely gamed; saturation ≠ capability solved. ARC-AGI remains the hardest cheat-resistant general reasoning eval
  • Hard open problems (hallucination, common-sense reasoning, long-horizon planning, causal inference, sample efficiency) are conceptual gaps, not just engineering ones
Domain 1 Complete: Foundations of AI
  • Ch 1.1: AI = optimisation toward a goal. Four capabilities: Perceive, Reason, Learn, Act. All today's AI is ANI: narrow, task-specific, impressive in domain, brittle outside it.
  • Ch 1.2: Two AI winters from data/compute/algorithm gaps. AlexNet (2012) + Transformer (2017) are the two true inflection points. ChatGPT (2022) = 100M users in 60 days.
  • Ch 1.3: GOFAI = symbol manipulation. Expert systems worked commercially (XCON, MYCIN) but hit the knowledge bottleneck. GOFAI failure proved intelligence must be learned, not encoded.
  • Ch 1.4: Chinese Room: syntax ≠ semantics. Symbol grounding problem motivates multimodal and embodied AI. Hard Problem of consciousness remains unresolved for biological and artificial systems.
  • Ch 1.5: PEAS framework describes any agent. Five agent types: Reflex → Model-based → Goal-based → Utility → Learning. This maps directly to modern LLM agents in Domain 8.
  • Ch 1.6: Connectionism won empirically. The frontier is neuro-symbolic: neural perception + symbolic reasoning (chain-of-thought, tool use, AlphaGo, AlphaGeometry).
  • Ch 1.7: Foundation models = the new paradigm. One model, many tasks. Open problems: hallucination, common-sense, long-horizon planning, causal inference remain unsolved.

Domain 1 gave you the vocabulary and mental models. Every subsequent domain takes one part of this picture and goes deep. Domain 2 gives you the mathematics. Domain 3 the algorithms. Domain 4 the architectures. The pieces will connect. Keep going.