Foundations of Artificial Intelligence
What AI actually is, where it came from, why symbolic AI failed, the philosophical debates that matter today, the rational agent framework, competing paradigms, and the state of the field.
Most AI discourse fails at the first step: no crisp definition. AI is not magic, not a single algorithm, and not science fiction. It is the science and engineering of building systems that exhibit behaviour we would call intelligent if a human exhibited it. That definition is deliberately provisional, because intelligence itself resists definition.
Biological Intelligence
- Embodied: tied to a body with survival pressures
- Developed through evolution over millions of years
- Energy-efficient: 20 watts powers the human brain
- Generalises from very few examples (few-shot by default)
- Handles open-ended, ambiguous, novel situations naturally
Machine Intelligence
- Disembodied: no physical survival pressure
- Built through training on human-generated data
- Energy-hungry: GPT-4 training cost ~$100M in compute
- Requires millions to billions of examples to learn patterns
- Brittle outside training distribution; exceptional at defined tasks
Hallmarks of Intelligence
- Ability to learn from experience
- Ability to solve novel problems
- Ability to understand and generate language
- Ability to reason under uncertainty
| Term | Definition | Subset of | Example |
|---|---|---|---|
| Artificial Intelligence | Any technique enabling machines to mimic aspects of human cognition | None (broadest) | Chess engines, expert systems, LLMs, robotics |
| Machine Learning | AI that improves by learning from data rather than explicit rules | AI | Spam filter, fraud detection, recommendation systems |
| Deep Learning | ML using multi-layer neural networks to learn hierarchical representations | ML | GPT-4, image classifiers, speech recognition |
| Generative AI | Models that generate new content (text, images, audio, code) plausibly like training data | Deep Learning | ChatGPT, DALL-E, Stable Diffusion, Sora |
| Data Science | Interdisciplinary field combining statistics, ML, and domain expertise to extract insights from data | Overlaps AI & Statistics | Business dashboards, A/B testing, churn analysis |
Common misuse: "We use AI" usually means "we use ML", and often specifically "we use a trained model." Precision matters, especially when evaluating vendor claims. Ask: Is it rule-based? Trained? On what data? Measured by what metric?
ANI – Narrow AI
Excels at one specific task. All commercially deployed AI today is ANI, including GPT-4. Examples: chess engines, image classifiers, recommendation algorithms, language models.
STATUS: Present reality (2026)
AGI – General AI
Human-level cognitive ability across any intellectual domain. Can learn any task a human can, without domain-specific training. No AGI exists. Timelines are debated: leading researchers estimate anywhere from 5 to 50+ years.
STATUS: Research goal
ASI – Super AI
Surpasses human intelligence in every domain: creativity, scientific discovery, social intelligence. Purely theoretical. Motivates alignment research and existential risk discourse (Domain 10).
STATUS: Theoretical
| Myth | Reality |
|---|---|
| AI "understands" things | AI processes statistical patterns in training data. Whether this constitutes understanding is a genuine philosophical debate (Ch 1.4), but current systems don't understand in the human sense. |
| AI is deterministic | Most modern AI uses probabilistic sampling. The same prompt produces different outputs. Temperature and sampling parameters control randomness. |
| AI develops goals on its own | AI optimises for its training objective. It doesn't spontaneously form desires. Alignment failures happen when the objective doesn't match human intent, not through "waking up." |
| More data always = better AI | Data quality, labelling accuracy, and relevance matter more than volume. 1B clean samples often outperform 10B noisy ones. |
| AI replaces human intelligence wholesale | Current AI replaces specific tasks within jobs, not entire jobs at once. Displacement patterns are complex and highly uneven across domains. |
AI history is a story of alternating euphoria and collapse. Understanding why each era gave way to the next, not just when, is the best inoculation against misreading hype today. The pattern: overpromise → underfund → winter → unexpected breakthrough → repeat.
Leibniz & Boole (1600s–1800s)
Leibniz dreamed of a calculus ratiocinator, a machine that could calculate truth from symbols. Boole formalised logic as algebra (1854), creating the mathematical foundation for all computation that followed.
Ada Lovelace (1843)
Writing notes on Babbage's Analytical Engine, Lovelace described how the machine could be programmed to compose music, arguably the first vision of general-purpose computing. She also articulated its limits: it can only do what we tell it. The tension between capability and instruction persists today.
Early Mechanical Calculators
Pascal's Pascaline (1642) and Babbage's Difference Engine (1822) were mechanical attempts to automate calculation. They mechanised arithmetic but not reasoning. The distinction matters: calculation ≠ intelligence.
McCulloch & Pitts (1943)
Proposed the first mathematical model of a neuron: a binary threshold unit that fires when inputs exceed a threshold. Showed that networks of these units could compute any logical proposition. Direct ancestor of all modern neural networks.
Alan Turing (1950)
Proposed the Imitation Game as a pragmatic test for machine intelligence. Anticipated and answered nine objections to machine thought. Introduced the "child machine", a machine that learns rather than being pre-programmed: the first articulation of what we now call machine learning.
Dartmouth Workshop (1956)
The proposal: "Every aspect of learning or any other feature of intelligence can in principle be so precisely described that a machine can be made to simulate it." McCarthy coined "Artificial Intelligence." Optimism proved premature, but the research agenda set here dominated the field for 20 years.
Early Successes
- Logic Theorist (1955): proved 38 of 52 theorems from Principia Mathematica
- GPS, the General Problem Solver (1957): domain-agnostic search; separated problem description from problem-solving method
- ELIZA (1966): Weizenbaum's chatbot simulated a Rogerian therapist. Users formed genuine emotional connections, a warning about anthropomorphism that still applies to LLMs today.
- SHRDLU (1970): natural language system that manipulated blocks in a simulated world. Impressive in its toy domain; completely brittle outside it.
Seeds of the First Winter
- Minsky & Papert (1969) proved single-layer perceptrons can't represent XOR, stalling neural network research for a decade
- ALPAC report (1966): machine translation was "twice as expensive and twice as slow" as human translation
- Combinatorial explosion: real-world problems have exponentially more states than toy problems
- The "Potemkin village" problem: early demos were cherry-picked; the gap between demos and general capability was enormous
| Winter | Trigger | What Collapsed | Root Cause | What Survived |
|---|---|---|---|---|
| First (1974–80) | Lighthill Report (UK, 1973); DARPA cuts | General-purpose AI, machine translation, symbolic reasoning | Combinatorial explosion; overpromised timelines to funders | Specialised systems; early expert system research; Prolog |
| Second (1987–93) | Expert system maintenance failures; Lisp machine market collapse | Commercial expert systems; DARPA Strategic Computing Program | Expert systems too brittle and expensive to maintain at real-world scale | Backpropagation (1986 rediscovery); statistical ML; RL foundations |
Why Expert Systems Failed
- Knowledge acquisition bottleneck: experts couldn't articulate their own tacit knowledge
- Brittleness: outside their narrow domain, systems failed completely, with no graceful degradation
- Maintenance cost: real-world rule sets grew to thousands of rules that became unmaintainable
- No learning: every change required manual update by knowledge engineers
The Lighthill Report (1973)
Sir James Lighthill's report to the UK Science Research Council concluded that AI had failed to live up to its "grandiose objectives" due to the combinatorial explosion problem. The UK effectively abandoned AI funding for a decade. The US followed.
Lesson: Overpromising specific timelines to funders is more damaging than the research being wrong.
Backprop Rediscovered (1986)
Rumelhart, Hinton & Williams published the backpropagation algorithm. Earlier work by Werbos (1974) had gone unnoticed. Training multi-layer networks became practical. The door to deep learning opened, but GPUs weren't ready yet.
Statistical ML Rise (1990s)
SVMs (Vapnik, 1995) provided strong theoretical guarantees. Hidden Markov Models dominated speech recognition. Probabilistic graphical models matured. ML became rigorous and mathematical, focused on generalisation theory rather than AI folklore.
Key Milestones
- Deep Blue beats Kasparov (1997)
- LSTM networks (Hochreiter & Schmidhuber, 1997)
- LeNet-5 for digit recognition (LeCun, 1998)
- Netflix Prize: collaborative filtering (2006–09)
- ImageNet dataset created (Fei-Fei Li, 2009)
Three factors converged in 2012: massive datasets (ImageNet: 1.2M labelled images), GPU computing (CUDA parallelism), and algorithmic improvements (ReLU activations, dropout). None alone was sufficient; all three together were transformative.
AlexNet (2012)
Won ImageNet with 15.3% top-5 error vs. 26.2% for the runner-up, an 11-point gap that shocked the computer vision community. Trained on 2 GTX 580 GPUs in 5 days. Every major research lab pivoted to neural networks within 18 months.
Transformers (2017)
The Transformer architecture replaces recurrent networks for sequence modelling. Self-attention enables full context access without sequential bottlenecks. Direct ancestor of BERT, GPT, T5, LLaMA, and every modern LLM.
ChatGPT (2022)
100 million users in 60 days, the fastest consumer technology adoption in history at the time. Demonstrated that RLHF (Reinforcement Learning from Human Feedback) could align large language models to be genuinely helpful and safe. Moved AI from research into mainstream public discourse permanently.
| Year | Event | Why It Matters |
|---|---|---|
| 2012 | AlexNet wins ImageNet | Deep learning proven at scale; GPU compute validated as the path forward |
| 2014 | GANs (Goodfellow) | Generative models can produce realistic images; generative AI era begins |
| 2016 | AlphaGo defeats Lee Sedol | RL + deep learning = superhuman Go; 10 years ahead of schedule |
| 2017 | Transformer architecture | Replaces RNNs; enables parallelism; foundation of all modern LLMs |
| 2018 | BERT & GPT-1 | Pre-train → fine-tune paradigm established for NLP |
| 2020 | GPT-3 (175B params) | Few-shot learning at scale; foundation model era begins |
| 2021 | DALL-E, Codex, AlphaFold 2 | Multimodal generation; protein structure solved after 50-year open problem |
| 2022 | ChatGPT; Stable Diffusion | AI goes mainstream; open-source generative models democratise access |
| 2023–24 | GPT-4, Claude 3, Gemini, LLaMA | Multimodal capability; open weights; commercial proliferation |
| 2025–26 | Agentic AI; reasoning models | o1, o3, DeepSeek-R1, Claude 3.7; agents handle full multi-step workflows |
Symbolic AI is commonly omitted from modern curricula. This is a mistake. Understanding why symbolic AI dominated for 30 years, and exactly why it failed to scale, explains why neural networks won. More importantly, it reveals what neural networks still cannot do.
| Dimension | Symbolic (GOFAI) | Sub-Symbolic (Connectionist) |
|---|---|---|
| Knowledge representation | Explicit symbols, rules, logic: human-readable | Distributed across billions of numerical weights: not human-readable |
| Reasoning | Formal inference (deduction, induction, abduction) | Pattern interpolation from training data |
| Interpretability | Fully interpretable: you can read and audit the rules | Black box: weights are not meaningfully inspectable |
| Learning | Difficult to learn from raw data; requires manual knowledge encoding | Learns directly from raw data; scales with data and compute |
| Generalisation | Brittle outside defined rule coverage | Generalises within training distribution; fails on distribution shift |
| Current form | Knowledge graphs, ontologies, formal methods, SAT solvers | LLMs, CNNs, diffusion models, transformers |
Nodes = concepts · Edges = relationships · Inheritance flows along is-a links
Semantic Networks
Graph-based: nodes are concepts, edges are relationships. Socrates → is-a → Human → is-a → Mortal. Allows inheritance: Socrates automatically inherits all human properties. Precursor to modern knowledge graphs (Wikidata, Google Knowledge Graph).
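The inheritance mechanism can be sketched in a few lines of Python; the graph and property names below are invented for illustration, not from any specific system:

```python
# A minimal semantic network: properties inherit upward along is-a edges.

IS_A = {"Socrates": "Human", "Human": "Mortal"}            # is-a edges
PROPERTIES = {"Human": {"can_speak"}, "Mortal": {"dies"}}  # local properties

def inherited_properties(node):
    """Collect the properties of a node and of every ancestor on its is-a chain."""
    props = set()
    while node is not None:
        props |= PROPERTIES.get(node, set())
        node = IS_A.get(node)  # follow the is-a edge one level up
    return props

print(sorted(inherited_properties("Socrates")))  # ['can_speak', 'dies']
```

Nothing is stored on the Socrates node itself; both properties arrive by inheritance, which is exactly the economy semantic networks were designed for.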
Frames (Minsky, 1974)
Structured representations with named slots and default values. A "Restaurant" frame has slots: name, type, price_range, menu, hours. When visiting a new restaurant, you fill known slots and use defaults for unknown ones. Directly influenced object-oriented programming.
Ontologies
Formal specification of concepts and relationships within a domain. The Cyc project (Lenat, 1984–present) attempted to encode all human common-sense knowledge: 25M+ rules over 40 years. Still operational but not competitive with neural approaches on most tasks.
Description Logics
Formal languages for ontologies with decidable reasoning. Basis of OWL (Web Ontology Language), the standard for the semantic web and knowledge graphs. Allows automated consistency checking and inference over structured knowledge.
MYCIN (Stanford, 1972)
Diagnosed bacterial infections and recommended antibiotics using 600 production rules and certainty factors. Outperformed medical students and matched specialists in controlled tests. Never deployed clinically: not due to technical failure, but to liability concerns.
```
IF   organism-stain = gram-negative
AND  organism-morphology = rod
AND  patient-compromised-host = true
THEN organism = pseudomonas [CF: 0.6]
```
DENDRAL (Stanford, 1965)
Identified organic molecules from mass spectrometry data. First expert system with real scientific impact β published findings human chemists hadn't noticed. Proved that narrow domain expertise could be encoded computationally and productively.
Forward chaining (data-driven): start from known facts, apply rules to derive new facts until goal is reached. Used in production rule systems. Backward chaining (goal-driven): start from the goal, work backwards to find rules that could prove it. Used in Prolog and MYCIN. Backward chaining is more efficient when the goal is specific and the search space is large.
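The data-driven direction can be shown with a minimal forward-chaining loop; the weather rules below are made up for illustration:

```python
# Forward chaining over propositional Horn rules: start from known facts
# and fire rules until no new fact can be derived (a fixpoint).

RULES = [
    ({"rain"}, "wet_ground"),         # IF rain THEN wet_ground
    ({"wet_ground", "cold"}, "icy"),  # IF wet_ground AND cold THEN icy
]

def forward_chain(facts):
    facts = set(facts)
    changed = True
    while changed:                    # keep sweeping until nothing new fires
        changed = False
        for premises, conclusion in RULES:
            if premises <= facts and conclusion not in facts:
                facts.add(conclusion)
                changed = True
    return facts

print(sorted(forward_chain({"rain", "cold"})))  # ['cold', 'icy', 'rain', 'wet_ground']
```

Backward chaining would instead start from the query "icy" and recurse into the premises of any rule that concludes it, touching only the rules relevant to that goal.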
Propositional Logic
Atomic propositions connected by ∧ (AND), ∨ (OR), ¬ (NOT), → (implies). Truth-functional: the truth of a compound depends only on the truth of its components. Decidable but lacks expressive power: no variables, quantifiers, or relations between objects.
```
P → Q    // if it rains, the ground is wet
P        // it rains (given)
∴ Q      // modus ponens: the ground is wet
```
First-Order Logic (FOL)
Extends propositional logic with objects, predicates, and quantifiers (∀ "for all", ∃ "there exists"). Can represent most mathematical knowledge. Semi-decidable: you can prove theorems but not always disprove them in finite time.
```
∀x (Human(x) → Mortal(x))   // all humans are mortal
Human(Socrates)             // Socrates is human
∴ Mortal(Socrates)          // universal instantiation + modus ponens
```
| Algorithm | Strategy | Complete? | Optimal? | Time | Space |
|---|---|---|---|---|---|
| BFS | Expand shallowest nodes first | ✓ Yes | ✓ Yes (unit cost) | O(b^d) | O(b^d) |
| DFS | Expand deepest nodes first | ✗ No | ✗ No | O(b^m) | O(bm) |
| IDDFS | DFS with increasing depth limit | ✓ Yes | ✓ Yes (unit cost) | O(b^d) | O(bd) |
| Uniform Cost | Expand lowest-cost node first | ✓ Yes | ✓ Yes (positive step costs) | O(b^(1+C*/ε)) | O(b^(1+C*/ε)) |
| Greedy Best-First | Expand node closest to goal (h only) | ✗ No | ✗ No | O(b^m) | O(b^m) |
| A* | f(n) = g(n) + h(n): path cost + heuristic | ✓ Yes | ✓ Yes (admissible h) | O(b^d) | O(b^d) |

(b = branching factor, d = depth of shallowest goal, m = maximum depth, C* = optimal solution cost, ε = minimum step cost)
Admissibility condition: h(n) must never overestimate the actual cost to the goal. If h is admissible, A* is guaranteed to find the optimal path. A better heuristic means fewer nodes expanded: the difference between seconds and hours on large graphs.
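As an illustration of admissibility, here is a small A* sketch on a hypothetical 4-connected grid. Manhattan distance never overestimates on such a grid, so it is admissible and the returned cost is optimal; the grid size and walls are invented:

```python
import heapq

def astar(start, goal, walls, width=5, height=5):
    """A* over a 4-connected grid; returns the optimal path cost, or None."""
    def h(p):  # admissible heuristic: Manhattan distance to the goal
        return abs(p[0] - goal[0]) + abs(p[1] - goal[1])

    frontier = [(h(start), 0, start)]   # priority queue of (f = g + h, g, node)
    best_g = {start: 0}
    while frontier:
        f, g, node = heapq.heappop(frontier)
        if node == goal:
            return g                    # first goal pop is optimal (h admissible)
        x, y = node
        for nxt in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if 0 <= nxt[0] < width and 0 <= nxt[1] < height and nxt not in walls:
                ng = g + 1
                if ng < best_g.get(nxt, float("inf")):  # found a cheaper route
                    best_g[nxt] = ng
                    heapq.heappush(frontier, (ng + h(nxt), ng, nxt))
    return None

# A wall blocks column x=1 except at y=4, forcing a detour around the top.
print(astar((0, 0), (4, 4), walls={(1, 0), (1, 1), (1, 2), (1, 3)}))  # 8
```

With h(n) = 0 everywhere this degenerates into Uniform Cost Search: still optimal, but it expands far more nodes.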
Minimax & Alpha-Beta: Two-player zero-sum games use Minimax (maximise your score; the opponent minimises it). Alpha-Beta pruning eliminates branches that cannot affect the final decision, reducing the effective branching factor from b to approximately √b, roughly doubling the searchable depth for the same compute.
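A sketch of Minimax with Alpha-Beta pruning on a tiny hand-built tree; the leaf values are the classic textbook example, not from any real game:

```python
# Leaves are ints (static evaluations); internal nodes are lists of children.

def alphabeta(node, maximizing=True, alpha=float("-inf"), beta=float("inf")):
    if isinstance(node, int):          # leaf: return its evaluation
        return node
    if maximizing:
        value = float("-inf")
        for child in node:
            value = max(value, alphabeta(child, False, alpha, beta))
            alpha = max(alpha, value)
            if alpha >= beta:          # opponent would never allow this branch
                break                  # prune the remaining children
        return value
    value = float("inf")
    for child in node:
        value = min(value, alphabeta(child, True, alpha, beta))
        beta = min(beta, value)
        if alpha >= beta:
            break
    return value

tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]  # MAX over three MIN nodes
print(alphabeta(tree))  # 3: max of (min 3, min 2, min 2)
```

In the second subtree, once the 2 is seen the remaining leaves (4, 6) are pruned: MIN already guarantees ≤ 2 there, and MAX already has 3 from the first subtree.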
These aren't abstract puzzles. The Turing Test, Chinese Room, and symbol grounding problem are actively debated in the context of LLMs right now. They directly motivate alignment research, interpretability, and legal questions around AI. Don't treat this chapter as optional theory.
Three participants: a human interrogator (C), a human (A), and a machine (B). The interrogator communicates only via text. The machine's goal is to convince the interrogator it is human. Turing proposed this not as a definition of intelligence, but as a pragmatic operational test: if a machine can consistently fool a human interrogator, we have sufficient reason to call it intelligent, whatever "intelligent" means.
| Critique | Why It Matters Today |
|---|---|
| Tests behaviour, not cognition: a system can pass without understanding | GPT-4 informally passes casual Turing tests. The understanding question remains fully open. |
| Anthropocentric: measures human-likeness, not intelligence per se | A genuinely intelligent alien system might fail. The test conflates intelligence with human-mimicry. |
| Humans can fail it too: adversarial settings trip humans up | CAPTCHA systems exploit this. Humans fail certain adversarial Turing variants more than LLMs do. |
| Tests only conversational fluency, not reasoning or knowledge | LLMs excel at fluency but fail structured reasoning and factual tests at the same time. |
Imagine a person locked in a room receiving Chinese characters through a slot. They have a rulebook (in English) telling them how to manipulate the symbols and which characters to return. From outside, the room appears to understand Chinese; it passes any Turing test for Chinese comprehension. But the person inside understands nothing. They're just manipulating symbols according to rules.
Searle's conclusion: programs manipulate syntax (formal symbol structures). Understanding requires semantics (meaning). Syntax alone is not sufficient for semantics. Therefore no program, regardless of performance, can have genuine understanding.
| Counter-Argument | Searle's Reply | Current Status |
|---|---|---|
| Systems Reply: The room as a whole understands Chinese, even if no individual part does | Imagine the person memorises the entire rulebook: they still understand nothing. Systems don't have understanding any more than their parts. | Debated: many find the systems reply compelling |
| Brain Simulator Reply: If the program simulates individual neurons, would it understand? | Simulating neurons is not the same as having neurons. A simulated storm doesn't make you wet. | Debated: depends on theory of consciousness |
| Robot Reply: Connect the system to sensors and actuators, and grounded meaning would emerge | Adding I/O is just more symbol manipulation. The grounding problem remains. | Partial concession: embodiment may matter |
The Hard Problem of Consciousness
Chalmers (1995) distinguishes "easy problems" (explaining cognitive functions such as attention, memory, and behaviour, all tractable in principle) from the "hard problem": why is there subjective experience at all? Why does information processing feel like something from the inside?
Even a complete functional account of the brain wouldn't explain why there's a "what it's like to be" that brain. This is the central unsolved problem in philosophy of mind, and it applies directly to AI sentience claims.
Integrated Information Theory (IIT)
Tononi (2004): consciousness corresponds to integrated information (Φ, "phi"). A system is conscious to the degree its parts share information in an irreducibly integrated way. Higher Φ = more conscious.
Implication: feedforward networks (including transformers, which have no recurrent loops) have Φ ≈ 0. If IIT is correct, current LLMs are not conscious, not even slightly.
Multiple Intelligences (Gardner)
Eight distinct intelligences: linguistic, logical-mathematical, spatial, musical, bodily-kinaesthetic, interpersonal, intrapersonal, naturalistic. AI today dominates the first two and is largely absent from the last five.
Embodied Cognition
Intelligence is shaped by having a body that interacts with the world. Physical AI (robotics + LLMs) is the active frontier precisely because disembodied language models lack grounding in physical causality.
Cognitive Architectures
ACT-R (Anderson) and SOAR (Laird, Newell) are computational models of human cognition with procedural memory, declarative memory, and attention modules, making testable predictions verified against human reaction time data.
How do symbols get their meaning? In a dictionary, words are defined by other words: circular. A child grounds symbols in perceptual experience: "red" is grounded in actually seeing red things. Harnad argued that purely symbolic AI can never escape this circularity: symbols need to be grounded in the world, not just in other symbols.
How Neural Nets Partially Address This
Multimodal models (CLIP, GPT-4V, Gemini) ground language in images: "red" is statistically associated with red pixel patterns. This is partial grounding. But the model never sees red in the physical sense; it sees statistical co-occurrence in training data. Whether this constitutes genuine grounding is disputed.
Implications for LLMs
Text-only LLMs have no sensory grounding. Their concept of "pain" is the statistical distribution of the word "pain" in training data. This may explain why LLMs discuss concepts fluently but fail tasks requiring genuine understanding of physical causality, embodied experience, or common-sense spatial reasoning.
Russell & Norvig's Artificial Intelligence: A Modern Approach (AIMA) defines AI as the study of agents that perceive their environment and act to maximise their performance measure. This framework is the conceptual backbone of Domain 8 (Agentic AI); understanding it now is essential.
Rationality ≠ omniscience (knowing all outcomes). Rationality ≠ perfection (always choosing optimally). Rationality = expected utility maximisation given available information.
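Expected utility maximisation can be made concrete with a toy decision; the actions, outcome probabilities, and utility numbers below are invented purely for illustration:

```python
# Each action maps to a list of (probability, utility) outcome pairs.
# 30% chance of rain; utilities reflect comfort in each case.

ACTIONS = {
    "take_umbrella":  [(0.3, 60), (0.7, 80)],   # mildly annoying either way
    "leave_umbrella": [(0.3, 0),  (0.7, 100)],  # great if dry, awful if wet
}

def expected_utility(outcomes):
    """Sum of utility weighted by outcome probability."""
    return sum(p * u for p, u in outcomes)

# The rational agent picks the action with the highest expected utility.
best = max(ACTIONS, key=lambda a: expected_utility(ACTIONS[a]))
print(best, round(expected_utility(ACTIONS[best]), 1))  # take_umbrella 74.0
```

Note the agent is not omniscient (it doesn't know whether it will rain) and not perfect (on a dry day "leave_umbrella" would have scored higher); it is rational because 74 > 70 in expectation given what it knows.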
| PEAS | Definition | Self-Driving Car | LLM Agent |
|---|---|---|---|
| Performance Measure | What "doing well" means: the objective | Safety, speed, comfort, law compliance | Task completion, accuracy, user satisfaction |
| Environment | Everything the agent interacts with | Roads, vehicles, pedestrians, weather | Web pages, APIs, files, databases, other agents |
| Actuators | Mechanisms for taking action | Steering, brakes, accelerator | Code execution, web search, API calls, text output |
| Sensors | Mechanisms for perceiving the environment | Cameras, LiDAR, GPS, radar | Context window, tool outputs, memory retrieval |
| Property | Type A | Type B | Example (A) | Example (B) |
|---|---|---|---|---|
| Observability | Fully observable | Partially observable | Chess (full board visible) | Poker (opponent's cards hidden) |
| Determinism | Deterministic | Stochastic | Chess (outcomes fully determined) | Autonomous driving (weather, pedestrians) |
| Episodicity | Episodic | Sequential | Image classification (each independent) | Chess (moves affect future states) |
| Dynamics | Static | Dynamic | Crossword puzzle | Stock trading, real-time robotics |
| Continuity | Discrete | Continuous | Chess (finite legal moves) | Robot arm control (infinite positions) |
| Agents | Single-agent | Multi-agent | Sudoku solver | Multiplayer games, multi-agent AI systems |
Simple Reflex Agent
IF-THEN rules that map current percepts directly to actions. No memory. No history. Example: thermostat (IF temp < setpoint THEN heat on). Fast and auditable but completely brittle: fails whenever the current percept doesn't capture all relevant state.
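The thermostat rule, written as a pure percept-to-action function with no internal state (a minimal sketch, not a production controller):

```python
# Simple reflex agent: the action depends only on the current percept.
# No memory, no model of the world, no goal beyond the hard-wired rule.

def thermostat(percept_temp, setpoint=20.0):
    # IF temp < setpoint THEN heat on, ELSE heat off
    return "heat_on" if percept_temp < setpoint else "heat_off"

print(thermostat(18.0))  # heat_on
print(thermostat(22.0))  # heat_off
```

The brittleness is visible in the signature: if "doing well" ever depends on anything besides the current temperature (say, whether a window is open), this agent has no way to represent it.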
Model-Based Reflex Agent
Maintains an internal state that tracks the world, so it knows what it can't currently perceive. Can handle partial observability. Example: a robot vacuum that tracks which areas have been cleaned even when not currently there.
Goal-Based Agent
Has an explicit goal and searches for action sequences to achieve it. Uses search algorithms (A*, BFS) to plan. More flexible than reflex agents: multiple paths to the same goal. Requires a model of how actions change the world.
Utility-Based Agent
Uses a utility function, a graded preference ordering over states rather than just goal/no-goal. Chooses actions maximising expected utility. Handles stochastic outcomes naturally. Foundation of decision theory. Modern RL agents and LLM agents with reward models are utility-based.
Learning Agent
Any agent type augmented with a learning component that modifies behaviour based on experience. Composed of: learning element (improves performance), performance element (selects actions), critic (evaluates against a standard), problem generator (suggests exploratory actions). All modern AI systems are learning agents.
| Element | Definition | Example: 8-Puzzle |
|---|---|---|
| Initial state | Starting configuration | Random tile arrangement |
| Actions | Set of possible moves from each state | Slide tile left, right, up, or down |
| Transition model | Result of taking each action in each state | New tile arrangement after sliding |
| Goal test | Determines if current state is goal | Tiles in order 1-2-3-4-5-6-7-8-blank |
| Path cost | Numeric cost of a path | Number of moves taken |
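The five elements in the table can be sketched for the 8-puzzle. The 9-tuple encoding below (0 for the blank, row-major order) is one illustrative choice, not the only one:

```python
# 8-puzzle problem formulation: states, actions, transition model, goal test.
# Path cost is implicit: each move costs 1, so cost = number of moves.

GOAL = (1, 2, 3, 4, 5, 6, 7, 8, 0)          # goal test target

def actions(state):
    """Legal blank moves, as index offsets on the 3x3 row-major grid."""
    i = state.index(0)                       # position of the blank
    moves = []
    if i % 3 > 0: moves.append(-1)           # slide blank left
    if i % 3 < 2: moves.append(+1)           # slide blank right
    if i >= 3:    moves.append(-3)           # slide blank up
    if i <= 5:    moves.append(+3)           # slide blank down
    return moves

def result(state, move):
    """Transition model: swap the blank with the neighbouring tile."""
    i = state.index(0)
    s = list(state)
    s[i], s[i + move] = s[i + move], s[i]
    return tuple(s)

def goal_test(state):
    return state == GOAL

start = (1, 2, 3, 4, 5, 6, 7, 0, 8)          # initial state: one move from goal
print(goal_test(result(start, +1)))          # True: slide the blank right
```

Any uninformed or informed search from the table above (BFS, IDDFS, A*) can now be run against these four functions without knowing anything puzzle-specific.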
AI is not one field; it is a collection of competing intellectual traditions with different assumptions about what intelligence is and how to build it. Understanding the camps explains why researchers from different traditions argue past each other, and why hybrid approaches are gaining traction.
| Dimension | Symbolicism | Connectionism | Current Status |
|---|---|---|---|
| Core claim | Intelligence = symbol manipulation under logical rules | Intelligence emerges from densely connected simple units | Connectionism dominant (2012–present) |
| Key figures | McCarthy, Minsky, Newell, Simon | Rosenblatt, Rumelhart, Hinton, LeCun, Bengio | Hinton, LeCun, Bengio won 2018 Turing Award |
| Strengths | Interpretable, logically consistent, handles explicit structured knowledge | Learns from data, generalises to new inputs, handles noise | Both needed; neither sufficient alone |
| Weaknesses | Brittle, doesn't scale, knowledge acquisition bottleneck | Black box, data-hungry, fails on distribution shift | Interpretability and robustness remain unsolved |
| Modern form | Knowledge graphs, formal verification, constraint solvers | Large language models, diffusion models, transformers | Hybrid (neuro-symbolic) is active research frontier |
Systems combining neural pattern recognition with symbolic reasoning. AlphaGeometry (2024) uses a neural model to generate proof steps and a symbolic geometry solver to verify them, solving IMO-level geometry problems. Program synthesis: neural nets generate code candidates; symbolic testing verifies correctness. This is the current research direction for systematic generalisation beyond interpolation.
Bayesian AI
Models uncertainty explicitly using probability distributions. Bayes' theorem updates beliefs when new evidence arrives: P(H|E) ∝ P(E|H) × P(H). Principled framework for reasoning under uncertainty, but computing exact posteriors is often intractable, requiring approximations (MCMC, variational inference).
Applications: spam filtering, medical diagnosis, sensor fusion in robotics, weather forecasting
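A single Bayesian update for the spam-filter case makes the theorem concrete; the prior and likelihoods below are invented numbers for illustration:

```python
# One application of Bayes' theorem with normalisation:
# P(H|E) = P(E|H) * P(H) / [P(E|H) * P(H) + P(E|not H) * P(not H)]

def posterior(prior, likelihood, likelihood_given_not):
    """P(hypothesis | evidence) for a binary hypothesis."""
    evidence = likelihood * prior + likelihood_given_not * (1 - prior)
    return likelihood * prior / evidence

# Assumed numbers: P(spam) = 0.2, P("free" | spam) = 0.6,
# P("free" | not spam) = 0.05.
p = posterior(prior=0.2, likelihood=0.6, likelihood_given_not=0.05)
print(round(p, 3))  # 0.75: seeing "free" raises P(spam) from 0.2 to 0.75
```

A real spam filter repeats this update per word (naive Bayes), treating each word's likelihood as independent given the class; the intractability mentioned above appears when hypotheses and evidence are no longer this small and discrete.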
Probabilistic Graphical Models
Bayesian networks: directed acyclic graphs where nodes are random variables and edges are conditional dependencies. Efficient inference in structured domains. Hidden Markov Models (HMMs): sequences of hidden states with observable outputs. Dominated speech recognition from the 1970s to the 2010s, until deep learning.
Still widely used in robotics, bioinformatics, and probabilistic programming (Stan, Pyro)
Inspired by biological evolution. Genetic algorithms maintain a population of candidate solutions, select for fitness, recombine (crossover) and mutate. Used for optimisation where the landscape is rugged and gradient-based methods fail. Active in neural architecture search (NAS), hyperparameter optimisation, and hardware co-design.
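A minimal genetic algorithm on the OneMax problem (maximise the number of 1-bits in a string) shows the select-crossover-mutate loop; the population size, mutation scheme, and generation count are arbitrary illustrative choices:

```python
import random

def onemax(bits):                           # fitness: count of 1-bits
    return sum(bits)

def evolve(length=20, pop_size=30, generations=60, seed=0):
    rng = random.Random(seed)               # seeded for reproducibility
    pop = [[rng.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=onemax, reverse=True)
        parents = pop[: pop_size // 2]      # truncation selection: keep top half
        children = []
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, length)  # single-point crossover
            child = a[:cut] + b[cut:]
            i = rng.randrange(length)       # point mutation: flip one bit
            child[i] ^= 1
            children.append(child)
        pop = children
    return max(onemax(ind) for ind in pop)  # best fitness in final population

print(evolve())  # typically at or near the optimum of 20
```

No gradient is ever computed: fitness is treated as a black box, which is exactly why this family of methods survives on rugged landscapes where gradient descent fails.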
Rodney Brooks (MIT) argued in the 1980s that traditional AI was fundamentally misguided: intelligence doesn't require symbolic internal representations. His subsumption architecture layered simple reactive behaviours to produce surprisingly complex robot behaviour. "Intelligence without representation." Now re-emerging: physical AI labs (Figure, Boston Dynamics, 1X) combine embodied robotics with LLM reasoning.
ACT-R (Anderson): models human cognition with procedural memory, declarative memory, and attention modules; makes testable predictions verified against reaction time data. SOAR (Newell, Laird, Rosenbloom): unified theory of cognition with problem spaces, operators, and impasse-driven learning. Both influenced the architecture of modern AI agents in Domain 8. OpenCog: open-source AGI architecture combining probabilistic logic networks with deep learning.
This chapter connects Domain 1's history and theory to the present, and gives you a map of where each subsequent domain fits. AI in 2026 is defined by foundation models as the new default paradigm, but with hard limitations that motivate everything that follows in this curriculum.
Foundation Models: The New Paradigm
Large models pre-trained on internet-scale data, adaptable to almost any task. The shift: from task-specific models trained from scratch to general-purpose models fine-tuned for tasks. One model (GPT-4o, Claude 3.7, Gemini 1.5) outperforms hundreds of specialised predecessors.
- Open weights: LLaMA 3, Mistral, Qwen, DeepSeek-V3, Phi-3
- Closed API: GPT-4o, Claude 3.7, Gemini 1.5 Pro, Command R+
- Specialised: Codex/StarCoder (code), Whisper (audio), SAM (segmentation)
Open vs. Closed Source
| Access model | Trade-offs |
|---|---|
| Closed API | Better safety filtering, proprietary advantage, pay-per-use, rate limits |
| Open weights | Full control, privacy, fine-tuning, no per-call API fees; safety risk if misused |
Compute scaling trends: training cost for frontier models grows ~4× per year. GPT-4: ~$100M. Next-generation frontier: estimated $1B+.
| Capability | What It Means | State of the Art (2026) | Domain Coverage |
|---|---|---|---|
| Perception | Interpreting sensory data (vision, audio, text) | Superhuman on ImageNet; near-human speech; strong OCR | Domain 6 (CV), Domain 5 (NLP) |
| Reasoning | Drawing valid inferences from information | Good at formal logic; poor at common-sense causality | Domain 5 (LLMs), Domain 1 (logic) |
| Planning | Generating action sequences toward goals | Improving with o1/o3 reasoning models; still brittle at scale | Domain 8 (Agents), Domain 7 (RL) |
| Generation | Creating novel text, images, audio, code, video | Human-competitive in text/image; superhuman in code assist | Domain 5 (LLMs), Domain 6 (CV) |
| Action | Taking actions in physical or digital environments | Early stage; computer-use agents emerging; physical robots | Domain 8 (Agents), Domain 7 (RL) |
| Benchmark | Domain | Current Status | Key Concern |
|---|---|---|---|
| ImageNet (ILSVRC) | Computer Vision | Models exceed human ~97%+ | Real-world robustness far lower; adversarial examples break all models |
| GLUE / SuperGLUE | NLP | Saturated: models exceed the human baseline | Saturated within 2 years of creation; replaced by harder benchmarks |
| MMLU | Knowledge (57 subjects) | GPT-4 ~87%; Claude 90%+ | Multiple-choice is gameable; doesn't test applied or procedural knowledge |
| BIG-Bench Hard | Reasoning | Frontier models pass most tasks | Being saturated; successor benchmarks in development |
| HumanEval / SWE-bench | Code | ~90% HumanEval; ~50% SWE-bench | SWE-bench is more realistic but still a limited test suite of GitHub issues |
| ARC-AGI | Abstract Reasoning | ~75–80% best models (2025) | Designed to resist pattern matching; remains the hardest general reasoning eval |
Common-Sense Reasoning
Humans know ice cream melts in the sun, dropped objects fall, and elephants don't fit in cars, without being told. LLMs absorbed much of this from text but fail unpredictably on novel physical scenarios. The Winogrande benchmark reveals systematic errors humans find trivial. Common sense is the gap between pattern matching and genuine world understanding.
Causal Inference
Current AI is fundamentally correlational. Correlation ≠ causation. "Countries with more hospitals have more disease": a correlational model would conclude hospitals cause disease. Judea Pearl's causal hierarchy (association → intervention → counterfactual) identifies exactly what's missing. No current neural system reliably reasons causally from observational data.
Long-Horizon Planning
Current agents reliably handle ~10–50 step tasks. Real-world projects span hundreds of interdependent steps over days or weeks. Errors compound: one wrong action early invalidates downstream planning. Agents also suffer context window limitations (memory) and lack persistent state across sessions.
Sample Efficiency & Robustness
Humans learn from a handful of examples; LLMs require billions. Sample efficiency (learning more from less data) is largely unsolved. Robustness (consistent performance under distribution shift) is equally unsolved. Both are critical for safety-critical deployment (medicine, autonomous vehicles, infrastructure).
- AI = systems that perceive, reason, learn, and act. Not a single technology, but a collection of competing approaches.
- All current AI is ANI (Narrow). AGI timelines are genuinely debated; ASI is theoretical. Precision about which level you mean matters enormously.
- AI history is a cycle of overpromise → collapse → breakthrough. The current wave is real, but understanding the winters prevents misreading today's hype.
- AI winters were caused by combinatorial explosion + knowledge acquisition bottleneck + overpromised timelines, not bad science
- Symbolic AI failed to scale but wasn't wrong: it lives on in knowledge graphs, formal methods, and neuro-symbolic hybrids
- The Turing Test, Chinese Room, and symbol grounding problem are actively debated in the context of LLMs; they motivate alignment, interpretability, and AI rights discourse
- The PEAS framework (Performance, Environment, Actuators, Sensors) formally describes any agent β including modern LLM-based agents in Domain 8
- Connectionism dominates today; Bayesian, evolutionary, and embodied approaches remain active and complementary
- Benchmarks are routinely gamed; benchmark saturation does not mean a capability is solved. Always interrogate what the benchmark actually measures.
- Hard open problems: common-sense reasoning, causal inference, long-horizon planning, sample efficiency, robustness