AI Foundation · Domain 12

Emerging Technologies

What’s next in AI research — from foundation models and world models to neuromorphic computing, quantum AI, embodied intelligence, AGI concepts, and the open frontiers that will define the next decade.

12.1
Chapter 12.1
Foundation Models — Evolution & Scaling

Foundation models — large pre-trained models adapted to downstream tasks — have become the dominant paradigm in AI. A single model trained once on internet-scale data now powers chatbots, code assistants, image generators, and scientific tools simultaneously.

The term “foundation model” was coined by Stanford HAI in 2021 to describe models like GPT-3, BERT, and CLIP that serve as a base for many applications. The key insight: scaling compute, data, and parameters together yields predictable capability gains, a relationship quantified by the Chinchilla scaling laws (Hoffmann et al., 2022).

Foundation Model Timeline — from BERT to Frontier Models
BERT (2018 · 340M) → GPT-2 (2019 · 1.5B) → GPT-3 (2020 · 175B) → Chinchilla (2022 · 70B) → GPT-4 (2023 · ~1.8T?) → Llama 3 (2024 · 405B) → Frontier (2025+ · ???)
Scaling: 1000× parameter increase in 5 years, but diminishing returns are emerging.
Chinchilla Scaling Law: $L(N, D) \approx E + \dfrac{A}{N^{\alpha}} + \dfrac{B}{D^{\beta}}$
L = loss, N = parameters, D = training tokens, E = irreducible entropy. Optimal: tokens ≈ 20 × parameters.
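The scaling law above is easy to evaluate numerically. The sketch below is illustrative only: the constants are the approximate fitted values reported by Hoffmann et al. (2022), the function names are hypothetical, and the `6 * N * D` FLOP estimate is a common rule of thumb that is not stated in this chapter.

```python
# Approximate fitted constants from Hoffmann et al. (2022); treat as indicative.
E, A, B = 1.69, 406.4, 410.7
ALPHA, BETA = 0.34, 0.28


def chinchilla_loss(n_params: float, n_tokens: float) -> float:
    """Predicted loss L(N, D) = E + A / N^alpha + B / D^beta."""
    return E + A / n_params**ALPHA + B / n_tokens**BETA


def optimal_tokens(n_params: float, tokens_per_param: float = 20.0) -> float:
    """Rule of thumb from the text: tokens ~ 20 x parameters."""
    return tokens_per_param * n_params


def training_flops(n_params: float, n_tokens: float) -> float:
    """Common approximation (an assumption here): ~6 FLOPs per parameter per token."""
    return 6.0 * n_params * n_tokens


if __name__ == "__main__":
    n = 70e9                      # a Chinchilla-sized model
    d = optimal_tokens(n)         # 1.4e12 tokens
    print(f"tokens: {d:.2e}")
    print(f"predicted loss: {chinchilla_loss(n, d):.3f}")
    print(f"training FLOPs: {training_flops(n, d):.2e}")
```

Doubling the token count at fixed model size lowers the predicted loss, but by less each time, which is one way to see the diminishing returns mentioned in the timeline.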
Model | Params | Training Tokens | Modalities | Open? | Key Innovation
GPT-4o | ~1.8T (MoE, rumoured) | ~13T (rumoured) | Text, Image, Audio | Closed | Native multimodal I/O
Claude 3.5 Sonnet | Undisclosed | Undisclosed | Text, Image | Closed | 200K context, tool use
Gemini 1.5 Pro | Undisclosed (MoE) | Undisclosed | Text, Image, Video, Audio | Closed | 1M token context
Llama 3.1 405B | 405B | 15T | Text | Open weights | Competitive with closed
Mistral Large | ~123B | Undisclosed | Text | Open weights | Efficient dense model
DeepSeek-V3 | 671B (MoE) | 14.8T | Text | Open weights | $5.5M training cost

The frontier has shifted from text-only to natively multimodal models that process text, images, audio, and video in a single architecture. Equally important: post-training techniques (RLHF, DPO, constitutional AI) now matter as much as pre-training scale.

🌐
Multimodal Fusion

Early: separate encoders (CLIP text + ViT image). Now: unified tokenisation across modalities. GPT-4o processes audio natively — no speech-to-text pipeline. Gemini handles video as first-class input.

🧠
Post-Training Revolution

RLHF (InstructGPT). DPO: simpler, no reward model. Constitutional AI (Anthropic): self-critique. RLAIF: AI-generated feedback. These techniques add instruction-following, safety, and reasoning on top of raw pre-training.
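To make “DPO: simpler, no reward model” concrete, here is a minimal sketch of the DPO loss for a single preference pair. It assumes the summed log-probabilities of the chosen and rejected responses have already been computed under both the policy and a frozen reference model; the function names and the default `beta=0.1` are illustrative choices, not values from this chapter.

```python
import math


def _neg_log_sigmoid(x: float) -> float:
    """Numerically stable -log(sigmoid(x)), i.e. softplus(-x)."""
    return max(-x, 0.0) + math.log1p(math.exp(-abs(x)))


def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """DPO loss for one (chosen, rejected) pair.

    The implicit reward of a response is beta * (log pi(y|x) - log pi_ref(y|x)),
    so no separate reward model is trained.
    """
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    return _neg_log_sigmoid(beta * (chosen_margin - rejected_margin))


if __name__ == "__main__":
    # Policy identical to the reference: loss is log(2).
    print(dpo_loss(-10.0, -10.0, -10.0, -10.0))
    # Policy has shifted probability towards the chosen response: lower loss.
    print(dpo_loss(-5.0, -15.0, -10.0, -10.0))
```

In practice the log-probabilities come from a forward pass of both models over a batch of preference pairs, and the loss is averaged and minimised with gradient descent.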

💡
Test-Time Compute

o1/o3 (OpenAI): spend more compute at inference for harder problems. Chain-of-thought at scale. “Thinking” tokens. DeepSeek-R1: open replication. Shifts the scaling frontier from training to inference.

Key Trend — 2025

The debate has shifted from “how big?” to “how smart per dollar?”: DeepSeek-V3 reportedly matched GPT-4-level performance at roughly 1/20th of the training cost. Efficiency, not raw scale, is the new frontier.

∑ Chapter 12.1 — Key Takeaways

  • Foundation models: single pre-trained model → many downstream tasks
  • Chinchilla scaling laws: optimal tokens ≈ 20× parameters
  • Multimodal is the new default — text, image, audio, video in one model
  • Post-training (RLHF, DPO) and test-time compute now matter as much as pre-training scale
  • Open models (Llama, DeepSeek) are closing the gap with closed frontier models
12.2
Chapter 12.2
World Models & Simulation

Current LLMs are pattern matchers on text. World models aim to build AI that actually understands the physical world — predicting what happens next, simulating physics, and reasoning about 3D space and causality.

A world model is an internal representation that allows an AI to predict the consequences of actions without executing them. Humans do this constantly — you can imagine what happens if you push a glass off a table without actually doing it. AI world models aim to learn this predictive capability from data.

World Model Architecture — observe, predict, plan
Observation (sensor / video input) → Encoder (latent state z_t) → World Model (predict z_{t+1} from (z_t, a_t)) → Decoder (predicted future) → Plan
The agent “imagines” outcomes before acting, reducing costly real-world exploration.
Key: prediction happens in latent space, not pixel space (orders of magnitude faster).
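The observe, predict, plan loop can be sketched as a toy program. Everything here is a hypothetical stand-in: `encode` and `predict_next` would be learned networks in a real system, the latent state is a single number, and planning is a brute-force search over short action sequences.

```python
from itertools import product


def encode(observation: float) -> float:
    """Hypothetical encoder: raw observation -> latent state z_t."""
    return observation / 10.0


def predict_next(z: float, action: float) -> float:
    """Hypothetical learned dynamics: z_{t+1} = f(z_t, a_t)."""
    return z + 0.1 * action


def plan(observation: float, goal_z: float,
         actions=(-1.0, 0.0, 1.0), horizon: int = 3):
    """Return the action sequence whose imagined final state is closest to goal_z."""
    z0 = encode(observation)
    best_seq, best_err = None, float("inf")
    for seq in product(actions, repeat=horizon):
        z = z0
        for a in seq:
            z = predict_next(z, a)  # rollout happens purely in latent space
        err = abs(z - goal_z)
        if err < best_err:
            best_seq, best_err = seq, err
    return best_seq


if __name__ == "__main__":
    # No action is executed in the "real" environment while planning.
    print(plan(observation=0.0, goal_z=0.3))
```

The point of the design is that all 27 candidate futures are evaluated in imagination; only the winning sequence would ever be executed.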
🎬
Sora (OpenAI, 2024)

Text-to-video diffusion model generating 60-second coherent videos. Emergent 3D consistency and physics simulation. OpenAI described it as a “world simulator” — but it still makes physics errors (objects passing through each other).

🧠
JEPA (Yann LeCun)

Joint Embedding Predictive Architecture. Predicts in latent space, not pixel space — avoiding the curse of pixel-level prediction. LeCun’s proposed path to human-level AI: learn world models through self-supervised prediction.

🎮
Genie 2 (DeepMind)

Interactive world model for 3D environments. Given a single image, generates a playable 3D world. Used for training embodied AI agents. Generates consistent physics, lighting, and object interactions from imagination.

∑ Chapter 12.2 — Key Takeaways

  • World models predict consequences of actions without executing them — “imagination” for AI
  • Sora: impressive video generation but still fails at physics — not a true world simulator yet
  • JEPA (LeCun): predict in latent space, not pixel space — orders of magnitude more efficient
  • Genie 2: interactive 3D world generation from a single image — huge for embodied AI training
12.3
Chapter 12.3
Neuromorphic Computing

The human brain runs on about 20 watts. Training GPT-4 is estimated to have consumed around 50 GWh over several months, an average draw in the tens of megawatts. Neuromorphic computing aims to build hardware that processes information the way biological brains do: event-driven, massively parallel, and extraordinarily energy-efficient.

Conventional chips (GPUs, TPUs) use the von Neumann architecture: separate memory and compute connected by a data bus. The brain has no such separation — neurons are both memory and compute. Neuromorphic chips replicate this: artificial neurons and synapses co-located on silicon, communicating via spikes (binary events) rather than continuous values.
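Spike-based, event-driven computation can be illustrated with a leaky integrate-and-fire (LIF) neuron, a standard simplified neuron model. This is a sketch under stated assumptions: the threshold, leak factor, and discrete time step are arbitrary illustrative values, and real neuromorphic chips implement richer neuron models in hardware.

```python
def simulate_lif(input_current, threshold=1.0, leak=0.9, reset=0.0):
    """Simulate one leaky integrate-and-fire neuron over discrete time steps.

    Returns a list of binary spikes (1 = fired, 0 = silent).
    """
    v = reset  # membrane potential: the neuron's local "memory"
    spikes = []
    for current in input_current:
        v = leak * v + current  # leak a little, then integrate the input
        if v >= threshold:
            spikes.append(1)    # emit a binary event
            v = reset           # potential resets after firing
        else:
            spikes.append(0)
    return spikes


if __name__ == "__main__":
    print(simulate_lif([0.6, 0.6, 0.0, 0.0]))  # fires once the input accumulates
    print(simulate_lif([0.0] * 5))             # no input, no spikes, no work
```

With no input the neuron never fires, which is why sparse workloads cost so little energy on event-driven hardware.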

🧠
Intel Loihi 2

1 million neurons, 120 million synapses per chip. Event-driven: neurons only fire when needed — 100× more energy efficient than GPU for sparse workloads. On-chip learning rules. Used for robotics, anomaly detection, optimisation.

IBM NorthPole (2023)

256 cores, 22 billion transistors. Not a spiking chip but brain-inspired: integrates memory and compute on every core. 25× energy efficiency of GPU for inference. Designed for edge deployment where power matters.

🔌
SynSense / BrainChip Akida

Commercial neuromorphic processors for edge AI. Always-on sensing at microwatts. Target: smart sensors, wearables, IoT. Keyword spotting, gesture recognition, vibration monitoring without cloud connectivity.

Chip | Approach | Neurons | Power | Best For | Status
Intel Loihi 2 | Spiking neural network | 1M per chip | ~1W | Robotics, optimisation | Research
IBM NorthPole | Compute-in-memory | N/A (digital cores) | ~12W (inference) | Edge inference | Research
BrainChip Akida | Spiking + event-driven | 1.2M | <1W | IoT, wearables | Commercial
SpiNNaker 2 | Digital ARM-based | 10M+ | ~10W | Brain simulation | Research
Reality Check

Neuromorphic computing excels at sparse, event-driven tasks (sensor processing, robotics). It is not competitive with GPUs for dense matrix operations that dominate current LLM training/inference. The ecosystems (tools, frameworks, libraries) are years behind CUDA/PyTorch.

∑ Chapter 12.3 — Key Takeaways

  • Neuromorphic chips: brain-inspired, event-driven, co-located memory and compute
  • Intel Loihi 2: 100× more energy efficient than GPU for sparse workloads
  • IBM NorthPole: 25× energy efficiency for inference tasks
  • Best suited for edge AI and robotics — not a GPU replacement for LLM training
  • Software ecosystem is the bottleneck — no PyTorch equivalent yet
12.4
Chapter 12.4
Quantum AI

Quantum computing promises exponential speedups for certain problems. The question for AI is whether any of those problems overlap with machine learning. The honest answer: not yet in practice, but the theoretical potential is enormous.

Classical computers use bits (0 or 1). Quantum computers use qubits, which can exist in a superposition of 0 and 1 and can be entangled with each other. A register of n qubits is described by 2^n amplitudes, and quantum algorithms exploit interference between them. This yields a speedup only for problems with the right mathematical structure; it is not a generic ability to try every answer in parallel.
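Superposition and entanglement can be made concrete with a tiny state-vector simulation. The sketch below prepares a two-qubit Bell state with a Hadamard gate followed by a CNOT; it is a classical simulation written for illustration, with hypothetical function names, and it also shows why simulating n qubits classically needs 2^n amplitudes.

```python
import math


def apply_hadamard_q0(state):
    """Apply H to qubit 0 of a 2-qubit state [a00, a01, a10, a11] (q0 = left bit)."""
    s = 1 / math.sqrt(2)
    a00, a01, a10, a11 = state
    return [s * (a00 + a10), s * (a01 + a11), s * (a00 - a10), s * (a01 - a11)]


def apply_cnot(state):
    """CNOT with control q0 and target q1: swaps the |10> and |11> amplitudes."""
    a00, a01, a10, a11 = state
    return [a00, a01, a11, a10]


def probabilities(state):
    """Measurement probabilities are the squared magnitudes of the amplitudes."""
    return [abs(a) ** 2 for a in state]


def n_amplitudes(n_qubits: int) -> int:
    """Size of the state vector a classical simulator must store."""
    return 2 ** n_qubits


if __name__ == "__main__":
    bell = apply_cnot(apply_hadamard_q0([1.0, 0.0, 0.0, 0.0]))
    print(probabilities(bell))  # only |00> and |11> have non-zero probability
    print(n_amplitudes(50))     # state-vector size for 50 qubits
```

The two qubits are perfectly correlated: measuring one as 0 guarantees the other is 0, and likewise for 1.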

⚛️
Quantum ML Algorithms

Quantum SVM: exponential speedup for kernel methods (theoretical). Quantum PCA: faster dimensionality reduction. Variational Quantum Eigensolver (VQE): hybrid quantum-classical for chemistry. Quantum Boltzmann machines.

💻
Current Hardware

IBM: 1,121 qubits (Condor, 2023). Google Willow: 105 qubits, the first chip to demonstrate error correction below the surface-code threshold, meaning errors fall as more qubits are added (2024). IonQ: trapped-ion approach. Quantinuum: highest gate fidelity. All still in NISQ era (Noisy Intermediate-Scale Quantum).

🔬
Where Quantum Helps AI

Drug discovery: simulating molecular interactions. Optimisation: portfolio optimisation, logistics. Sampling: generative models. Feature maps: quantum kernels for hard-to-separate data. Not general LLM training.

Quantum AI Landscape — what works vs. what’s hype
Quantum advantage spectrum:
  • Proven (narrow): random circuit sampling, factoring (small)
  • Promising (5–10 years): chemistry simulation, optimisation
  • Speculative (10+ years): quantum neural networks, QML at scale
Error correction is the key bottleneck: need ~1,000 physical qubits per logical qubit. No quantum computer has demonstrated practical ML advantage as of 2025.

∑ Chapter 12.4 — Key Takeaways

  • Quantum computing: qubits in superposition + entanglement = exponential state space
  • NISQ era: noisy, 1000+ qubits but not error-corrected — practical ML advantage not yet demonstrated
  • Most promising for: chemistry simulation, optimisation, sampling — not LLM training
  • Timeline: 5–10 years for first practical quantum ML applications, if error correction is solved
12.5
Chapter 12.5
Embodied AI & Physical Intelligence

Intelligence without a body is like learning to swim by reading a book. Embodied AI places agents in physical or simulated environments where they must perceive, act, and learn from the consequences — the way humans and animals do.

The convergence of foundation models + robotics is the defining emerging trend. Previously, every robot task required task-specific training. Now, vision-language-action (VLA) models enable robots to understand natural language commands and generalise across environments.

🤖
RT-2 (Google, 2023)

Vision-Language-Action model: takes camera input + text instruction, outputs robot actions directly. Trained on web data + robot demonstrations. Can follow novel instructions never seen in robot training (“move the banana to the plate”).

🏭
Sim-to-Real Transfer

Train in simulation (Isaac Sim, MuJoCo), deploy on real robots. Domain randomisation: vary physics, textures, lighting to make policies robust. NVIDIA Isaac: GPU-accelerated simulation of thousands of environments in parallel.

🧍
Humanoid Race

Tesla Optimus, Figure 02, Boston Dynamics Atlas (electric). LLM-powered task planning + learned locomotion. Target: general-purpose humanoid that can do any physical task humans do. Timeline: prototype demos now, production 2026–2028.

Embodied AI Stack — from language to physical action
Language (“Pick up cup”) → VLM Planning (task decomposition) → VLA Model (vision → actions) → Low-Level Control (motor commands) → Physical World (real consequences)
Key challenge: the real world is non-differentiable; you can’t backpropagate through a broken cup.

∑ Chapter 12.5 — Key Takeaways

  • VLA models (RT-2): language instruction → robot action — generalise to novel tasks
  • Sim-to-real: train millions of episodes in simulation, deploy on real robot
  • Humanoid race: Tesla, Figure, Boston Dynamics — LLM planning + learned locomotion
  • Key challenge: the physical world is non-differentiable — can’t backprop through reality
12.6
Chapter 12.6
AGI — Concepts, Paths & Debate

Artificial General Intelligence — AI that can perform any intellectual task a human can — is the most debated concept in the field. Some say we are 3 years away. Others say 30. Others say the concept itself is incoherent. Understanding the debate requires understanding what AGI actually means.

There is no agreed definition of AGI. Different labs use different definitions, which makes timeline predictions almost meaningless without specifying what you mean.

Framework | Definition of AGI | Levels | Where Are We? (2025)
Google DeepMind | Performance + generality matrix | L0 (no AI) → L5 (superhuman, general) | L1–L2 (emerging to competent)
OpenAI | AI that outperforms humans at most economically valuable work | L1 Chatbot → L5 Organisation | L2 (Reasoner): o1/o3
Anthropic | Not a single threshold but a spectrum of capabilities | No formal levels | Narrow superhuman in some tasks
Turing Test | Can fool a human judge | Binary pass/fail | Arguably passed (but test is flawed)
Chollet (ARC) | Novel problem solving: skill-acquisition efficiency | ARC benchmark | LLMs fail at ARC-AGI
📈
Scale Is All You Need

Proponents: OpenAI, some at Google. Argument: keep scaling transformers + data + compute and emergent capabilities will continue. Evidence: GPT-2 → GPT-4 capabilities were largely unpredicted. Counter: scaling returns may be diminishing.

🧩
New Architectures Needed

Proponents: Yann LeCun, Gary Marcus. Argument: transformers lack causal reasoning, planning, persistent memory. Need: world models (JEPA), neuro-symbolic integration, new learning paradigms beyond next-token prediction.

🧬
Hybrid / Embodied Path

Proponents: Embodied cognition researchers. Argument: intelligence requires grounding in physical world. Need: embodied agents that learn from physical interaction. Inspired by: developmental psychology, enactivism.

Optimistic Timeline

Sam Altman: “AGI is closer than people think” (2024)

Dario Amodei: “Powerful AI systems in 2–3 years”

Ray Kurzweil: Human-level AI by 2029

Evidence: rapid capability gains 2022–2025

Skeptical Timeline

Yann LeCun: “We are far from human-level AI”

Gary Marcus: “LLMs will never reach AGI”

Rodney Brooks: “Decades away, minimum”

Evidence: LLMs fail at novel reasoning (ARC)

Critical Thinking Required

Most AGI timeline predictions come from people with financial incentives to hype or downplay. Lab CEOs predict nearness (attracts investment). Academics predict distance (justifies research funding). Judge the arguments, not the authority.

∑ Chapter 12.6 — Key Takeaways

  • No agreed definition of AGI — timelines are meaningless without specifying what you mean
  • DeepMind levels: we are at L1–L2 (emerging to competent); L5 is superhuman + general
  • Three paths debated: scale alone, new architectures, or embodied hybrid
  • Optimists: 2–5 years. Skeptics: decades. Both sides have financial incentives
  • ARC benchmark: LLMs fail at novel reasoning — Chollet’s measure of true intelligence
12.7
Chapter 12.7
Mixture of Experts

How do you build a model with a trillion parameters but only use 10% of them for any given input? Mixture of Experts (MoE) is the answer — and it has become the dominant architecture for frontier models.

In a dense model (like the original GPT-3), every parameter activates for every token. In MoE, each transformer layer has multiple expert sub-networks, and a learned router selects a small top-k subset of experts to activate per token (typically 1–2, though DeepSeek-V3 uses 8). Result: total parameters are huge (model capacity), but compute per token is small (efficiency).
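The routing step can be sketched in plain Python. This is a toy illustration under stated assumptions: the experts are scalar functions rather than feed-forward networks, the router logits are passed in directly instead of being computed by a learned linear layer, and the gate weights are a softmax over the selected experts only, which is one common choice.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def moe_layer(token, router_logits, experts, top_k=2):
    """Route one token to its top-k experts and mix their outputs.

    Returns (output, indices_of_selected_experts).
    """
    ranked = sorted(range(len(experts)),
                    key=lambda i: router_logits[i], reverse=True)
    selected = ranked[:top_k]
    gates = softmax([router_logits[i] for i in selected])  # renormalise over top-k
    # Only the selected experts are evaluated: this is the compute saving.
    output = sum(g * experts[i](token) for g, i in zip(gates, selected))
    return output, selected


if __name__ == "__main__":
    experts = [lambda x: x, lambda x: 2 * x, lambda x: 3 * x, lambda x: 4 * x]
    out, chosen = moe_layer(1.0, [0.0, 5.0, 0.0, 5.0], experts, top_k=2)
    print(out, chosen)  # experts 1 and 3 share the token equally
```

Production systems add a load-balancing term to training so that the router does not send most tokens to the same few experts.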

Mixture of Experts — sparse activation per token
Token → Router → Expert 1 ✓ · Expert 2 · Expert 3 · Expert 4 ✓ → Combine → Output
Only 2 of N experts activate per token: total params huge, active params small.
Model | Total Params | Active Params | Experts | Top-K | Key Result
Switch Transformer | 1.6T | ~100B | 128 | 1 | First trillion-param model (Google, 2022)
Mixtral 8x7B | 46.7B | 12.9B | 8 | 2 | Matched Llama 2 70B at 3× less compute
GPT-4 | ~1.8T (rumoured) | ~220B | ~16 | 2 | Frontier performance with MoE efficiency
DeepSeek-V3 | 671B | 37B | 256 | 8 | GPT-4 level at $5.5M training cost
Gemini 1.5 | Undisclosed MoE | Undisclosed | MoE | Top-2 | 1M token context window
Why MoE Won

MoE solves the core scaling dilemma: more capacity without proportionally more compute. A 671B MoE model that activates 37B per token needs roughly the same compute per token as a 37B dense model, yet holds about 18× more parameters in which to store knowledge. The trade-off is memory: all 671B parameters must still be loaded to serve the model.

∑ Chapter 12.7 — Key Takeaways

  • MoE: many experts with sparse activation; only a few (top-k, often 1–2) fire per token
  • Mixtral 8x7B: matched Llama 2 70B at 3× less inference compute
  • DeepSeek-V3 (671B MoE, 37B active): GPT-4 level at $5.5M training cost
  • MoE is now the dominant architecture for frontier models (GPT-4, Gemini, DeepSeek)
  • Challenge: load balancing across experts — some experts get overloaded, others idle
12.8
Chapter 12.8
Neuro-Symbolic AI

Neural networks excel at pattern recognition. Symbolic AI excels at logical reasoning. Neuro-symbolic AI combines both — giving neural networks the ability to reason, and symbolic systems the ability to learn from data.

Pure neural approaches (LLMs) hallucinate, can’t guarantee logical consistency, and struggle with multi-step reasoning. Pure symbolic approaches are brittle, require hand-crafted rules, and don’t generalise. The hybrid combines neural perception + symbolic reasoning.

🧩
AlphaGeometry (DeepMind)

Solved IMO-level geometry problems (2024). Architecture: neural language model proposes construction steps + symbolic deduction engine verifies proofs. Neither component alone could solve the problems — the combination is key.

🔍
LLM + Code Execution

Simplest neuro-symbolic pattern: LLM generates code, interpreter executes it. Chain-of-Code (Google): interleave reasoning and computation. The interpreter computes exactly, so arithmetic and other mechanical steps are reliable where pure LLM reasoning fails, provided the generated code itself is correct.
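The generate-then-execute pattern can be sketched without calling a real model. In this illustration `fake_llm` is a hypothetical stand-in that returns a fixed arithmetic expression, and `safe_eval` is a deliberately restricted evaluator built on Python's `ast` module; a real deployment would call an actual model and run its code in a proper sandbox.

```python
import ast
import operator

# Only plain arithmetic is allowed; anything else is rejected.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def safe_eval(expr: str):
    """Evaluate an arithmetic expression exactly, refusing everything else."""
    def _eval(node):
        if isinstance(node, ast.Expression):
            return _eval(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -_eval(node.operand)
        raise ValueError("unsupported expression")
    return _eval(ast.parse(expr, mode="eval"))


def fake_llm(question: str) -> str:
    """Hypothetical model call: returns code to run, not a final answer."""
    return "123456789 * 987654321"


def answer_with_tool(question: str):
    """Neural step proposes code; symbolic step computes the result."""
    return safe_eval(fake_llm(question))


if __name__ == "__main__":
    print(answer_with_tool("What is 123456789 times 987654321?"))
```

The division of labour is the point: the model translates the question into code, and the interpreter does the part that must be exact.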

📚
Knowledge Graphs + LLMs

LLMs generate natural language; knowledge graphs provide structured facts. GraphRAG (Microsoft): LLM uses graph-structured knowledge for grounded reasoning. Reduces hallucination by anchoring claims to verifiable facts.

Neural Strengths

• Pattern recognition in noisy data

• Generalisation from examples

• Natural language understanding

• Perception (vision, audio)

Symbolic Strengths

• Logical consistency & guarantees

• Compositionality & abstraction

• Explainability & auditability

• Exact arithmetic & verification

∑ Chapter 12.8 — Key Takeaways

  • Neuro-symbolic: neural perception + symbolic reasoning — best of both worlds
  • AlphaGeometry: IMO-level proofs via neural proposals + symbolic verification
  • LLM + code execution: simplest and most practical neuro-symbolic pattern today
  • Knowledge graphs + LLMs (GraphRAG): reduces hallucination with structured grounding
12.9
Chapter 12.9
Federated Learning & Edge AI

Not all AI can live in the cloud. Edge AI runs models on devices — phones, cars, sensors, drones — where latency, privacy, and connectivity matter. Federated learning trains models across devices without centralising data.

Edge AI processes data locally on the device rather than sending it to cloud servers. Benefits: lower latency (<10ms vs 100ms+ cloud round-trip), works offline, preserves privacy. Federated learning trains a global model across distributed devices — each device trains locally, sends only model updates (not data) to a central server.

📱
On-Device LLMs

Apple Intelligence: on-device 3B param model. Google Gemini Nano: runs on Pixel phones. Qualcomm: LLM inference on Snapdragon. Quantisation (4-bit, GGUF) makes 7B models fit in 4GB RAM. Privacy: data never leaves device.

🔒
Federated Learning

Pioneered by Google for keyboard prediction (2017). Each phone trains locally on user data. Server aggregates model updates (FedAvg). Data never transmitted. Used by: Apple (Siri), Google (Gboard), hospitals (medical imaging across sites).
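The server-side aggregation step of FedAvg can be sketched as a weighted average. This is a minimal illustration with assumptions stated plainly: each client's model is a flat list of floats, clients are weighted by their local dataset size, and the local training that produces those weights is omitted.

```python
def fed_avg(client_weights, client_sizes):
    """Average client models, weighting each by its number of local examples.

    client_weights: list of equal-length lists of floats (one per client).
    client_sizes:   number of training examples held by each client.
    """
    total = sum(client_sizes)
    n_params = len(client_weights[0])
    return [
        sum(w[j] * n for w, n in zip(client_weights, client_sizes)) / total
        for j in range(n_params)
    ]


if __name__ == "__main__":
    # Two phones: the second has three times as much local data.
    global_model = fed_avg([[1.0, 2.0], [3.0, 4.0]], client_sizes=[1, 3])
    print(global_model)  # the server never sees the raw data, only the weights
```

One round of federated learning repeats this: the server broadcasts the averaged model, clients train locally, and the new weights are averaged again.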

🎯
Model Compression

Quantisation: FP16 → INT4 (4× smaller). Pruning: remove redundant weights. Distillation: train small model to mimic large one. TensorRT, ONNX Runtime, Core ML for optimised inference. Key: minimal accuracy loss at 4–8× compression.
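Quantisation and its memory arithmetic can be sketched directly. The example below uses simple symmetric per-tensor quantisation, which is one of several schemes and not necessarily what GGUF or any specific runtime uses; the function names are illustrative, and the size estimate ignores scales, activations, and the KV cache.

```python
def quantize(weights, bits=8):
    """Symmetric per-tensor quantisation to signed integers."""
    qmax = 2 ** (bits - 1) - 1  # 127 for INT8, 7 for INT4
    scale = max(abs(w) for w in weights) / qmax or 1.0  # avoid a zero scale
    q = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Map integers back to approximate floating-point weights."""
    return [v * scale for v in q]


def model_size_gb(n_params, bits):
    """Weight storage only, in gigabytes (10^9 bytes)."""
    return n_params * bits / 8 / 1e9


if __name__ == "__main__":
    w = [0.5, -1.0, 0.25, 0.8]
    q, scale = quantize(w, bits=8)
    print(q, dequantize(q, scale))
    print(model_size_gb(7e9, 16))  # 7B model in FP16
    print(model_size_gb(7e9, 4))   # same model in INT4
```

The rounding error per weight is at most half the scale, which is why lower bit widths cost more accuracy: fewer levels mean a larger scale.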

Technique | Compression | Accuracy Loss | Speed Gain | Best For
INT8 Quantisation | (not given) | <1% | (not given) | Server inference
INT4 Quantisation | (not given) | 1–3% | 3–4× | Mobile / edge
Pruning (structured) | 2–5× | 1–5% | 2–3× | Inference-only
Distillation | 10–100× | 3–10% | 10–50× | Deploy anywhere

∑ Chapter 12.9 — Key Takeaways

  • Edge AI: <10ms latency, works offline, preserves privacy
  • On-device LLMs: 3–7B models on phones via quantisation (INT4, GGUF)
  • Federated learning: train across devices without centralising data
  • Model compression (quantisation + pruning + distillation): 4–100× smaller with minimal accuracy loss
12.10
Chapter 12.10
AI Hardware & Compute

AI progress is inseparable from hardware progress. Every frontier model is ultimately a bet on silicon — and the supply chain, geopolitics, and economics of AI chips are as important as the algorithms running on them.

Chip | Company | FP16 TFLOPS | Memory | Training Use | Price
H100 | NVIDIA | 990 | 80GB HBM3 | GPT-4, Llama 3 | ~$30K
B200 | NVIDIA | 2,250 | 192GB HBM3e | Next-gen frontier | ~$35K+
TPU v5p | Google | ~460 | 95GB HBM2e | Gemini | Cloud only
Trainium 2 | AWS | TBD | 96GB HBM | AWS models | Cloud only
Gaudi 3 | Intel | ~1,835 | 128GB HBM2e | Limited adoption | ~$15K
Groq LPU | Groq | Inference-only | SRAM-based | Fastest inference | Cloud API
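Peak TFLOPS figures like those in the table translate into training time through a back-of-envelope calculation. The sketch below rests on assumptions not stated in this chapter: the `6 * N * D` FLOP rule of thumb, a hypothetical cluster of 16,000 chips, and a 40% hardware utilisation. It is an order-of-magnitude estimate, not a reconstruction of any real training run.

```python
def training_days(n_params, n_tokens, n_chips, peak_tflops, utilisation=0.4):
    """Rough wall-clock training time in days.

    Assumes ~6 FLOPs per parameter per token and a fixed fraction of
    peak throughput actually achieved (utilisation).
    """
    total_flops = 6.0 * n_params * n_tokens
    sustained_flops_per_s = n_chips * peak_tflops * 1e12 * utilisation
    return total_flops / sustained_flops_per_s / 86_400  # seconds per day


if __name__ == "__main__":
    # A 405B-parameter model on 15T tokens, using the H100 figure from the table.
    days = training_days(405e9, 15e12, n_chips=16_000, peak_tflops=990)
    print(f"{days:.0f} days")
```

Halving utilisation doubles the time, which is why interconnect and software efficiency matter as much as the headline TFLOPS number.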
🌍
Geopolitics of AI Chips

TSMC (Taiwan): fabricates 90%+ of advanced AI chips. US export controls: restricted A100/H100 exports to China (2022), then tightened the rules to cover cut-down variants (2023). China developing domestic alternatives (Huawei Ascend). AI chip supply is now a national security issue.

💰
Economics of Training

GPT-4 training: estimated $100M+. Llama 3.1 405B: ~$30M. DeepSeek-V3: $5.5M. Trend: costs per capability are dropping fast via efficiency gains (MoE, better data, longer training). But frontier is still $100M+ as ambitions grow.

∑ Chapter 12.10 — Key Takeaways

  • NVIDIA dominates: H100/B200 power virtually all frontier training
  • Alternatives emerging: TPU v5p (Google), Trainium (AWS), Groq LPU (inference)
  • TSMC fabricates 90%+ of advanced AI chips — geopolitical concentration risk
  • Training costs: $100M+ for frontier, but efficiency gains are lowering the cost per capability
  • DeepSeek-V3 at $5.5M shows algorithmic efficiency can substitute for raw compute
12.11
Chapter 12.11
AI Consciousness & Philosophy

When an LLM says “I feel curious about this topic,” is it conscious? Almost certainly not. But the question of whether AI could become conscious is one of the deepest open problems at the intersection of AI, neuroscience, and philosophy.

The Hard Problem of Consciousness (David Chalmers, 1995): why does subjective experience exist at all? We can explain which neurons fire when you see red, but not why it feels like something to see red. This problem applies to AI: even a perfect brain simulation might be a “zombie” — functionally identical but with no inner experience.

🧐
Functionalism

Consciousness arises from what a system does, not what it’s made of. If this is true, a sufficiently complex AI could be conscious. Most AI researchers who believe in AI consciousness hold some form of functionalism.

💠
Integrated Information Theory

IIT (Giulio Tononi): consciousness = integrated information (Φ). Feedforward networks have Φ ≈ 0. Current LLMs are feedforward at inference — IIT predicts they are not conscious regardless of behaviour.

🤔
Chinese Room Argument

John Searle (1980): a person who follows rules for manipulating Chinese symbols can produce fluent answers without understanding any Chinese, so symbol manipulation alone is not understanding. Applied to AI: an LLM manipulates tokens without understanding meaning. Counter (the “systems reply”): the system as a whole might understand even if individual components don’t.

Why This Matters Practically

If future AI systems could be conscious, we face unprecedented moral obligations. If they cannot, we risk anthropomorphising tools. Either error is dangerous: moral patients deserve protection; moral panics about tools waste resources. The honest answer: we do not know how to test for consciousness, and no current AI system provides evidence of it.

∑ Chapter 12.11 — Key Takeaways

  • Hard Problem: why does subjective experience exist? — unsolved for brains and AI
  • Functionalism: consciousness from function → AI could be conscious
  • IIT: consciousness from integrated information → current LLMs are not conscious
  • Chinese Room: symbol manipulation ≠ understanding — debate continues since 1980
  • Practical: we have no test for AI consciousness — extraordinary claims require extraordinary evidence
12.12
Chapter 12.12
The Road Ahead — Open Problems & Predictions

AI moves fast enough that any prediction risks being outdated before the ink dries. But certain open problems are deep enough to outlast any single model generation. These are the problems that will define AI for the next decade.

🔍
Reasoning & Planning

LLMs can simulate reasoning via chain-of-thought but don’t truly plan. ARC benchmark exposes this. True reasoning requires: causal models, counterfactual thinking, and systematic generalisation beyond training distribution.

🌍
Grounding & Embodiment

Text-only models lack physical grounding. “Heavy” is a word to an LLM, not a felt experience. Solving this may require embodied learning — or it may require better simulation. Open question: is grounding necessary for intelligence?

🔒
Alignment at Scale

Current alignment: RLHF, constitutional AI. Works for today’s models. But as models become more capable, how do you align something smarter than you? Scalable oversight, interpretability, and formal verification are active research areas.

Energy & Sustainability

AI training and inference are energy-intensive. GPT-4 training: ~50 GWh (estimated). Data centres consume an estimated 2–3% of global electricity, and the share is growing fast. Need: more efficient architectures, renewable-powered data centres, or fundamentally different compute paradigms.
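An energy figure in GWh is easier to compare with the brain's roughly 20 watts (Chapter 12.3) once it is converted to average power. The sketch below assumes a 90-day training run, which is an illustrative duration and not a figure from this domain.

```python
def average_power_mw(energy_gwh: float, days: float) -> float:
    """Average power in megawatts for a given energy spent over a given duration."""
    hours = days * 24
    return energy_gwh * 1000 / hours  # GWh -> MWh, then MWh per hour = MW


def ratio_to_brain(power_mw: float, brain_watts: float = 20.0) -> float:
    """How many 20 W brains draw the same power."""
    return power_mw * 1e6 / brain_watts


if __name__ == "__main__":
    mw = average_power_mw(50, days=90)  # ~50 GWh over an assumed 90 days
    print(f"average draw: {mw:.1f} MW")
    print(f"equivalent to about {ratio_to_brain(mw):,.0f} human brains")
```

Under these assumptions the average draw is a little over 23 MW, roughly a million times the power budget of a human brain.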

⚖️
Governance Gap

Technology moves faster than regulation. EU AI Act (2024) is the first comprehensive framework. US has executive orders but no legislation. China has its own rules. No global coordination on frontier AI risks.

💼
Economic Disruption

McKinsey: 30% of work hours could be automated by 2030. Not job elimination but task transformation. New jobs created but transition is uneven: knowledge workers affected first (inverse of previous automation waves).

High Confidence (2–3 years)

• AI agents handling multi-step workflows autonomously

• On-device LLMs on every smartphone

• AI-generated video indistinguishable from real

• Coding AI writing 50%+ of production code

• Multimodal models as default (text-only obsolete)

Uncertain (5–10+ years)

• AGI by any meaningful definition

• Practical quantum advantage for ML

• Neuromorphic chips competing with GPUs

• Solving alignment for superhuman AI

• Autonomous scientific discovery at scale

∑ Chapter 12.12 — Key Takeaways

  • Open problems: reasoning, grounding, alignment at scale, energy, governance
  • Confident near-term: AI agents, on-device LLMs, AI-generated video, code AI
  • Uncertain: AGI, quantum ML, neuromorphic at scale, superhuman alignment
  • Economic disruption: 30% of work hours automatable by 2030 — knowledge workers first
  • Governance gap: technology moves faster than regulation — no global coordination

🎓 Domain 12 Complete — Emerging Technologies

  • Ch 12.1: Foundation models: scaling laws, multimodal fusion, post-training and test-time compute are the new frontiers.
  • Ch 12.2: World models: AI that predicts consequences — Sora, JEPA, Genie 2. Latent-space prediction is key.
  • Ch 12.3: Neuromorphic: brain-inspired chips, 100× energy efficiency for sparse tasks. Not a GPU replacement.
  • Ch 12.4: Quantum AI: theoretically promising, practically premature. Error correction is the bottleneck.
  • Ch 12.5: Embodied AI: VLA models (RT-2) + sim-to-real + humanoid race. Physical intelligence is the next frontier.
  • Ch 12.6: AGI: no agreed definition. Three competing paths. Timeline debate is financially motivated.
  • Ch 12.7: MoE: dominant frontier architecture. DeepSeek-V3 matched GPT-4 at $5.5M.
  • Ch 12.8: Neuro-symbolic: neural perception + symbolic reasoning. AlphaGeometry proved the concept.
  • Ch 12.9: Edge AI: on-device LLMs, federated learning, model compression. Privacy + latency advantages.
  • Ch 12.10: AI hardware: NVIDIA dominance, TSMC concentration risk, geopolitics of chips.
  • Ch 12.11: Consciousness: hard problem unsolved. No evidence current AI is conscious. Multiple competing theories.
  • Ch 12.12: Road ahead: reasoning, grounding, alignment, energy, governance — the problems that define the next decade.

Domain 12 is where the known meets the unknown. The technologies here range from production-ready (MoE, edge AI) to speculative (quantum ML, consciousness). The honest takeaway: nobody knows which of these will matter most in 10 years — but understanding all of them gives you the vocabulary to evaluate claims, spot hype, and recognise genuine breakthroughs when they arrive.