Emerging Technologies
What’s next in AI research — from foundation models and world models to neuromorphic computing, quantum AI, embodied intelligence, AGI concepts, and the open frontiers that will define the next decade.
Foundation models — large pre-trained models adapted to downstream tasks — have become the dominant paradigm in AI. A single model trained once on internet-scale data now powers chatbots, code assistants, image generators, and scientific tools simultaneously.
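The term “foundation model” was coined in 2021 by Stanford's Center for Research on Foundation Models (part of Stanford HAI) to describe models like GPT-3, BERT, and CLIP that serve as a base for many applications. The key insight: scaling compute, data, and parameters together yields predictable capability gains. Kaplan et al. (2020) first quantified this, and the Chinchilla scaling laws (Hoffmann et al., 2022) refined it into a compute-optimal recipe of roughly 20 training tokens per parameter.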
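The Chinchilla rule of thumb can be turned into a small calculation. The sketch below assumes the common approximation that training cost is about 6 FLOPs per parameter per token (C ≈ 6·N·D), which is not stated in this chapter, together with the 20-tokens-per-parameter ratio. The function names are illustrative.

```python
import math

TOKENS_PER_PARAM = 20       # Chinchilla rule of thumb: D ≈ 20 * N
FLOPS_PER_PARAM_TOKEN = 6   # assumed approximation: C ≈ 6 * N * D


def training_flops(params: float, tokens: float) -> float:
    """Approximate training compute for a dense transformer."""
    return FLOPS_PER_PARAM_TOKEN * params * tokens


def compute_optimal(flops_budget: float) -> tuple[float, float]:
    """Return (params, tokens) that spend the budget with D = 20 * N."""
    # C = 6 * N * D and D = 20 * N  =>  C = 120 * N**2
    params = math.sqrt(flops_budget / (FLOPS_PER_PARAM_TOKEN * TOKENS_PER_PARAM))
    tokens = TOKENS_PER_PARAM * params
    return params, tokens


# Chinchilla itself: 70B parameters trained on 1.4T tokens.
budget = training_flops(70e9, 1.4e12)
params, tokens = compute_optimal(budget)
print(f"{params / 1e9:.0f}B params, {tokens / 1e12:.1f}T tokens")
```

By the same ratio, Llama 3.1 405B at 15T tokens sits at about 37 tokens per parameter, so it is trained well past the compute-optimal point. That is a deliberate trade: more training compute for a model that is cheaper to serve at a given quality.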
| Model | Params | Training Tokens | Modalities | Open? | Key Innovation |
|---|---|---|---|---|---|
| GPT-4o | Undisclosed (GPT-4 rumoured at ~1.8T, MoE) | Undisclosed (GPT-4 rumoured at ~13T) | Text, Image, Audio | Closed | Native multimodal I/O |
| Claude 3.5 Sonnet | Undisclosed | Undisclosed | Text, Image | Closed | 200K context, tool use |
| Gemini 1.5 Pro | Undisclosed (MoE) | Undisclosed | Text, Image, Video, Audio | Closed | 1M token context |
| Llama 3.1 405B | 405B | 15T | Text | Open weights | Competitive with closed |
| Mistral Large 2 | 123B (dense) | Undisclosed | Text | Open weights (research licence) | Strong dense model, 128K context |
| DeepSeek-V3 | 671B (MoE) | 14.8T | Text | Open weights | $5.5M training cost |
The frontier has shifted from text-only to natively multimodal models that process text, images, audio, and video in a single architecture. Equally important: post-training techniques (RLHF, DPO, constitutional AI) now matter as much as pre-training scale.
Multimodal fusion. Early systems paired separate encoders (a CLIP text encoder plus a ViT image encoder). Newer models use unified tokenisation across modalities: GPT-4o processes audio natively, with no separate speech-to-text pipeline, and Gemini handles video as first-class input.
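Post-training. RLHF (InstructGPT): optimise the model against a reward model trained on human preferences. DPO: simpler, trains directly on preference pairs with no separate reward model. Constitutional AI (Anthropic): the model critiques and revises its own outputs against written principles. RLAIF: AI-generated feedback replaces human labels. These techniques add instruction-following, safety, and reasoning on top of raw pre-training.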
Test-time compute. o1/o3 (OpenAI): spend more compute at inference on harder problems, using chain-of-thought at scale through “thinking” tokens. DeepSeek-R1: an open-weights reasoning model built on the same idea. This shifts the scaling frontier from training to inference.
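To make “DPO: no reward model” concrete, here is a minimal sketch of the DPO loss for a single preference pair. It assumes you already have the summed log-probabilities of the chosen and rejected responses under both the policy being trained and a frozen reference model. The argument names and the default `beta` are illustrative, not taken from any particular library.

```python
import math


def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """DPO loss for one (chosen, rejected) pair of sequence log-probabilities."""
    # How much more the policy favours each response than the reference does.
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_margin - rejected_margin)
    # -log(sigmoid(logits)), written in a numerically stable form.
    if logits >= 0:
        return math.log1p(math.exp(-logits))
    return -logits + math.log1p(math.exp(logits))


# Policy identical to the reference: loss is log(2), i.e. no preference learned yet.
print(dpo_loss(-10.0, -10.0, -10.0, -10.0))
# Policy has shifted probability toward the chosen response: loss drops.
print(dpo_loss(-8.0, -12.0, -10.0, -10.0))
```

The preference signal enters only through the difference of log-probability ratios, which is why no separately trained reward model is needed.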
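The debate has shifted from “how big?” to “how smart per dollar?” DeepSeek-V3 reportedly matched GPT-4-class performance at roughly 1/20th of GPT-4's estimated training cost (a reported $5.5M for the final training run versus an estimated $100M+). Efficiency, not raw scale, is the new frontier.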
∑ Chapter 12.1 — Key Takeaways
- Foundation models: single pre-trained model → many downstream tasks
- Chinchilla scaling laws: optimal tokens ≈ 20× parameters
- Multimodal is the new default — text, image, audio, video in one model
- Post-training (RLHF, DPO) and test-time compute now matter as much as pre-training scale
- Open models (Llama, DeepSeek) are closing the gap with closed frontier models
Current LLMs are pattern matchers on text. World models aim to build AI that actually understands the physical world — predicting what happens next, simulating physics, and reasoning about 3D space and causality.
A world model is an internal representation that allows an AI to predict the consequences of actions without executing them. Humans do this constantly — you can imagine what happens if you push a glass off a table without actually doing it. AI world models aim to learn this predictive capability from data.
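The idea of predicting consequences without executing actions can be shown with a toy planner. In this sketch, `world_model` is a hand-written stand-in for a learned dynamics model on a 1-D track, and `plan` scores each candidate action by rolling it forward in imagination. Both functions and the track layout are hypothetical illustrations.

```python
def world_model(position: int, action: int) -> int:
    """Hypothetical dynamics model: predict the next position on a 1-D track."""
    return max(0, min(10, position + action))  # walls at 0 and 10


def plan(position: int, goal: int, horizon: int = 3) -> int:
    """Pick the first action whose imagined rollout ends closest to the goal."""
    best_action, best_distance = 0, float("inf")
    for first_action in (-1, 0, 1):
        imagined = world_model(position, first_action)
        # Continue the rollout greedily, still without touching the real world.
        for _ in range(horizon - 1):
            imagined = min(
                (world_model(imagined, a) for a in (-1, 0, 1)),
                key=lambda p: abs(goal - p),
            )
        distance = abs(goal - imagined)
        if distance < best_distance:
            best_action, best_distance = first_action, distance
    return best_action


print(plan(position=2, goal=8))  # chooses +1: move toward the goal
print(plan(position=8, goal=2))  # chooses -1
```

A real world model replaces the hand-written `world_model` with a network trained on observed transitions; the planning loop around it stays conceptually the same.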
Sora (OpenAI). Text-to-video diffusion model generating coherent videos up to 60 seconds long. Shows emergent 3D consistency and rough physics simulation. OpenAI described it as a “world simulator”, but it still makes physics errors (such as objects passing through each other).
JEPA (Yann LeCun, Meta AI). Joint Embedding Predictive Architecture. Predicts in latent space, not pixel space, so the model does not spend capacity on unpredictable pixel-level detail. LeCun’s proposed path to human-level AI: learn world models through self-supervised prediction.
Genie 2 (Google DeepMind). Interactive world model for 3D environments. Given a single image, it generates a playable 3D world that can be used to train embodied AI agents. Physics, lighting, and object interactions stay broadly consistent even though everything is generated from the model's “imagination”.
∑ Chapter 12.2 — Key Takeaways
- World models predict consequences of actions without executing them — “imagination” for AI
- Sora: impressive video generation but still fails at physics — not a true world simulator yet
- JEPA (LeCun): predict in latent space, not pixel space, which avoids modelling irrelevant pixel-level detail
- Genie 2: interactive 3D world generation from a single image — huge for embodied AI training
The human brain runs on about 20 watts. GPT-4 training is estimated to have consumed around 50 GWh, equivalent to tens of megawatts drawn continuously for months. Neuromorphic computing aims to build hardware that processes information the way biological brains do: event-driven, massively parallel, and extraordinarily energy-efficient.
Conventional chips (GPUs, TPUs) use the von Neumann architecture: separate memory and compute connected by a data bus. The brain has no such separation — neurons are both memory and compute. Neuromorphic chips replicate this: artificial neurons and synapses co-located on silicon, communicating via spikes (binary events) rather than continuous values.
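Spike-based communication is easiest to see in a single neuron. The following is a minimal sketch of a leaky integrate-and-fire (LIF) neuron, a standard textbook model of spiking behaviour. The threshold, leak factor, and reset value are illustrative assumptions, not parameters of any specific chip.

```python
def simulate_lif(input_current, threshold=1.0, leak=0.9, reset=0.0):
    """Simulate one leaky integrate-and-fire neuron over discrete time steps."""
    potential = reset
    spikes = []
    for current in input_current:
        # The membrane potential decays (leak) and integrates the new input.
        potential = leak * potential + current
        if potential >= threshold:
            spikes.append(1)    # emit a binary spike event
            potential = reset   # reset after firing
        else:
            spikes.append(0)    # stay silent: no event, no downstream work
    return spikes


print(simulate_lif([0.6, 0.6, 0.0, 0.0]))  # [0, 1, 0, 0]
print(simulate_lif([0.0] * 5))             # [0, 0, 0, 0, 0]
```

With no input the neuron produces no events at all, which is where the energy saving on sparse workloads comes from: silent neurons trigger no computation downstream.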
Intel Loihi 2. 1 million neurons, 120 million synapses per chip. Event-driven: neurons only fire when needed, making it up to 100× more energy efficient than a GPU for sparse workloads. On-chip learning rules. Used for robotics, anomaly detection, optimisation.
IBM NorthPole. 256 cores, 22 billion transistors. Not a spiking chip but brain-inspired: it integrates memory and compute on every core. About 25× the energy efficiency of a comparable GPU for inference. Designed for edge deployment where power matters.
BrainChip Akida. Commercial neuromorphic processors for edge AI. Always-on sensing at microwatts. Target: smart sensors, wearables, IoT. Keyword spotting, gesture recognition, vibration monitoring without cloud connectivity.
| Chip | Approach | Neurons | Power | Best For | Status |
|---|---|---|---|---|---|
| Intel Loihi 2 | Spiking neural network | 1M per chip | ~1W | Robotics, optimisation | Research |
| IBM NorthPole | Compute-in-memory | N/A (digital cores) | ~12W (inference) | Edge inference | Research |
| BrainChip Akida | Spiking + event-driven | 1.2M | <1W | IoT, wearables | Commercial |
| SpiNNaker 2 | Digital ARM-based | 10M+ | ~10W | Brain simulation | Research |
Neuromorphic computing excels at sparse, event-driven tasks (sensor processing, robotics). It is not competitive with GPUs for dense matrix operations that dominate current LLM training/inference. The ecosystems (tools, frameworks, libraries) are years behind CUDA/PyTorch.
∑ Chapter 12.3 — Key Takeaways
- Neuromorphic chips: brain-inspired, event-driven, co-located memory and compute
- Intel Loihi 2: 100× more energy efficient than GPU for sparse workloads
- IBM NorthPole: 25× energy efficiency for inference tasks
- Best suited for edge AI and robotics — not a GPU replacement for LLM training
- Software ecosystem is the bottleneck — no PyTorch equivalent yet
Quantum computing promises exponential speedups for certain problems. The question for AI is whether any of those problems overlap with machine learning. The honest answer: not yet in practice, but the theoretical potential is enormous.
Classical computers use bits (0 or 1). Quantum computers use qubits, which can be in a superposition (a weighted combination of 0 and 1) and entangled with each other. A register of n qubits is described by 2^n amplitudes, but a measurement returns only one outcome, so a speedup is possible only for problems whose mathematical structure lets interference amplify the right answers.
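Superposition and entanglement can be made concrete by simulating two qubits classically. This is a minimal state-vector sketch in plain Python: a Hadamard gate puts the first qubit in superposition and a CNOT entangles it with the second. The helper names are illustrative, and qubit 0 is taken to be the most significant bit of the basis-state index.

```python
import math

S = 1 / math.sqrt(2)
HADAMARD = [[S, S], [S, -S]]


def apply_single_qubit_gate(gate, state, target, n_qubits):
    """Apply a 2x2 gate to one qubit of an n-qubit state vector."""
    new_state = [0.0] * len(state)
    shift = n_qubits - 1 - target
    for index, amplitude in enumerate(state):
        bit = (index >> shift) & 1
        for new_bit in (0, 1):
            new_index = (index & ~(1 << shift)) | (new_bit << shift)
            new_state[new_index] += gate[new_bit][bit] * amplitude
    return new_state


def apply_cnot(state, control, target, n_qubits):
    """Flip the target bit of every basis state whose control bit is 1."""
    new_state = [0.0] * len(state)
    c_shift = n_qubits - 1 - control
    t_shift = n_qubits - 1 - target
    for index, amplitude in enumerate(state):
        if (index >> c_shift) & 1:
            new_state[index ^ (1 << t_shift)] += amplitude
        else:
            new_state[index] += amplitude
    return new_state


def probabilities(state):
    """Measurement probabilities for each basis state."""
    return [abs(a) ** 2 for a in state]


state = [1.0, 0.0, 0.0, 0.0]                                   # |00>
state = apply_single_qubit_gate(HADAMARD, state, 0, 2)         # superposition
bell = apply_cnot(state, control=0, target=1, n_qubits=2)      # entanglement
print(probabilities(bell))  # about 0.5 for |00> and |11>, 0 for |01> and |10>
```

The state vector doubles in length with every added qubit, which is why classical simulation becomes infeasible beyond a few dozen qubits and also why the state space is called exponential.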
Quantum ML algorithms. Quantum SVM: exponential speedup for kernel methods (theoretical). Quantum PCA: faster dimensionality reduction. Variational Quantum Eigensolver (VQE): hybrid quantum-classical algorithm for chemistry. Quantum Boltzmann machines.
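Current hardware. IBM: 1,121 qubits (Condor, 2023). Google Willow: 105 qubits, and the first chip to operate below the error-correction threshold, meaning logical error rates fall as the error-correcting code grows (2024). IonQ: trapped-ion approach. Quantinuum: highest gate fidelity. All are still in the NISQ era (Noisy Intermediate-Scale Quantum).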
Most promising applications. Drug discovery: simulating molecular interactions. Optimisation: portfolio optimisation, logistics. Sampling: generative models. Feature maps: quantum kernels for hard-to-separate data. Not general LLM training.
∑ Chapter 12.4 — Key Takeaways
- Quantum computing: qubits in superposition + entanglement = exponential state space
- NISQ era: noisy, 1000+ qubits but not error-corrected — practical ML advantage not yet demonstrated
- Most promising for: chemistry simulation, optimisation, sampling — not LLM training
- Timeline: 5–10 years for first practical quantum ML applications, if error correction is solved
Intelligence without a body is like learning to swim by reading a book. Embodied AI places agents in physical or simulated environments where they must perceive, act, and learn from the consequences — the way humans and animals do.
The convergence of foundation models + robotics is the defining emerging trend. Previously, every robot task required task-specific training. Now, vision-language-action (VLA) models enable robots to understand natural language commands and generalise across environments.
RT-2 (Google DeepMind). Vision-Language-Action model: takes camera input plus a text instruction and outputs robot actions directly. Trained on web data and robot demonstrations. Can follow novel instructions never seen in robot training (“move the banana to the plate”).
Sim-to-real transfer. Train in simulation (Isaac Sim, MuJoCo), deploy on real robots. Domain randomisation: vary physics, textures, and lighting during training so that policies are robust to the gap between simulation and reality. NVIDIA Isaac: GPU-accelerated simulation of thousands of environments in parallel.
Humanoid robots. Tesla Optimus, Figure 02, Boston Dynamics Atlas (electric). LLM-powered task planning plus learned locomotion. Target: a general-purpose humanoid that can do any physical task humans do. Timeline: prototype demos now, with vendors targeting production in 2026–2028.
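Domain randomisation, as described above, amounts to resampling the simulator's parameters for every training episode. The sketch below shows only that sampling step. The `SimParams` fields and their ranges are illustrative assumptions, not settings from Isaac Sim or MuJoCo.

```python
import random
from dataclasses import dataclass


@dataclass
class SimParams:
    """Hypothetical per-episode simulator settings."""
    friction: float
    mass_kg: float
    light_intensity: float


def sample_sim_params(rng: random.Random) -> SimParams:
    """Draw one randomised environment; the ranges are illustrative."""
    return SimParams(
        friction=rng.uniform(0.5, 1.5),
        mass_kg=rng.uniform(0.8, 1.2),
        light_intensity=rng.uniform(0.3, 1.0),
    )


rng = random.Random(0)  # fixed seed so runs are reproducible
episodes = [sample_sim_params(rng) for _ in range(1000)]
print(episodes[0])
```

A policy trained across all of these variations cannot overfit to one exact friction or lighting value, so the real robot looks like just another sample from the training distribution.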
∑ Chapter 12.5 — Key Takeaways
- VLA models (RT-2): language instruction → robot action — generalise to novel tasks
- Sim-to-real: train millions of episodes in simulation, deploy on real robot
- Humanoid race: Tesla, Figure, Boston Dynamics — LLM planning + learned locomotion
- Key challenge: the physical world is non-differentiable — can’t backprop through reality
Artificial General Intelligence — AI that can perform any intellectual task a human can — is the most debated concept in the field. Some say we are 3 years away. Others say 30. Others say the concept itself is incoherent. Understanding the debate requires understanding what AGI actually means.
There is no agreed definition of AGI. Different labs use different definitions, which makes timeline predictions almost meaningless without specifying what you mean.
| Framework | Definition of AGI | Levels | Where Are We? (2025) |
|---|---|---|---|
| Google DeepMind | Performance + generality matrix | L0 (no AI) → L5 (superhuman, general) | L1–L2 (emerging to competent) |
| OpenAI | AI that outperforms humans at most economically valuable work | L1 Chatbot → L5 Organisation | L2 (Reasoner) — o1/o3 |
| Anthropic | Not a single threshold but a spectrum of capabilities | No formal levels | Narrow superhuman in some tasks |
| Turing Test | Can fool a human judge | Binary pass/fail | Arguably passed (but test is flawed) |
| Chollet (ARC) | Novel problem solving — skill-acquisition efficiency | ARC benchmark | Plain LLMs score poorly on ARC-AGI; reasoning models such as o3 do far better at high inference cost |
Path 1: scale alone. Proponents: OpenAI, some at Google. Argument: keep scaling transformers, data, and compute, and emergent capabilities will continue. Evidence: the capabilities gained from GPT-2 to GPT-4 were largely unpredicted. Counter: scaling returns may be diminishing.
Path 2: new architectures. Proponents: Yann LeCun, Gary Marcus. Argument: transformers lack causal reasoning, planning, and persistent memory. Need: world models (JEPA), neuro-symbolic integration, new learning paradigms beyond next-token prediction.
Path 3: embodied hybrid. Proponents: embodied cognition researchers. Argument: intelligence requires grounding in the physical world. Need: embodied agents that learn from physical interaction. Inspired by: developmental psychology, enactivism.
Optimists:
• Sam Altman: “AGI is closer than people think” (2024)
• Dario Amodei: “Powerful AI systems in 2–3 years”
• Ray Kurzweil: Human-level AI by 2029
• Evidence: rapid capability gains 2022–2025
Skeptics:
• Yann LeCun: “We are far from human-level AI”
• Gary Marcus: “LLMs will never reach AGI”
• Rodney Brooks: “Decades away, minimum”
• Evidence: LLMs fail at novel reasoning (ARC)
Most AGI timeline predictions come from people with financial incentives to hype or downplay. Lab CEOs predict nearness (attracts investment). Academics predict distance (justifies research funding). Judge the arguments, not the authority.
∑ Chapter 12.6 — Key Takeaways
- No agreed definition of AGI — timelines are meaningless without specifying what you mean
- DeepMind levels: we are at L1–L2 (emerging to competent); L5 is superhuman + general
- Three paths debated: scale alone, new architectures, or embodied hybrid
- Optimists: 2–5 years. Skeptics: decades. Both sides have financial incentives
- ARC benchmark: Chollet’s test of skill-acquisition efficiency; plain LLMs score poorly on its novel puzzles, though reasoning models such as o3 do far better at high inference cost
How do you build a model with a trillion parameters but only use 10% of them for any given input? Mixture of Experts (MoE) is the answer — and it has become the dominant architecture for frontier models.
In a dense model (like the original GPT-3), every parameter activates for every token. In MoE, each transformer layer has multiple expert sub-networks, and a learned router selects a small number of them per token (typically 1–2, though DeepSeek-V3 routes each token to 8). Result: total parameters are huge (model capacity), but compute per token is small (efficiency).
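The routing step can be sketched in a few lines. This minimal example assumes the router has already produced one logit per expert for the current token (in a real layer those logits come from a learned linear projection of the token). The experts here are toy scalar functions standing in for feed-forward sub-networks, and all names are illustrative.

```python
import math


def softmax(scores):
    """Convert router logits to probabilities (numerically stable)."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def route_top_k(router_logits, k=2):
    """Return [(expert_index, weight)] for the k highest-scoring experts."""
    probs = softmax(router_logits)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    # Renormalise so the selected experts' weights sum to 1.
    return [(i, probs[i] / norm) for i in top]


def moe_layer(token, experts, router_logits, k=2):
    """Run only the selected experts and mix their outputs by router weight."""
    output = 0.0
    for index, weight in route_top_k(router_logits, k):
        output += weight * experts[index](token)
    return output


# Hypothetical scalar "experts"; real experts are feed-forward sub-networks.
experts = [lambda x, scale=s: scale * x for s in (1.0, 2.0, 3.0, 4.0)]
logits = [0.1, 2.0, -1.0, 2.0]

print(route_top_k(logits, k=2))        # experts 1 and 3, equal weights
print(moe_layer(10.0, experts, logits))
```

Only two of the four experts run for this token, so the compute cost tracks `k` rather than the total number of experts. Production systems add a load-balancing term so the router does not send most tokens to the same few experts.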
| Model | Total Params | Active Params | Experts | Top-K | Key Result |
|---|---|---|---|---|---|
| Switch Transformer (Switch-C) | 1.6T | Small fraction (one expert per MoE layer) | 2,048 | 1 | First trillion-param model (Google, 2021) |
| Mixtral 8x7B | 46.7B | 12.9B | 8 | 2 | Matched Llama 2 70B at 3× less compute |
| GPT-4 | ~1.8T (rumoured) | ~220B | ~16 | 2 | Frontier performance with MoE efficiency |
| DeepSeek-V3 | 671B | 37B | 256 | 8 | GPT-4 level at $5.5M training cost |
| Gemini 1.5 | Undisclosed (MoE) | Undisclosed | Undisclosed | Undisclosed | 1M token context window |
MoE solves the core scaling dilemma: more capacity without proportionally more compute. A 671B MoE model that activates 37B parameters per token needs roughly the same FLOPs per token as a 37B dense model, yet it holds about 18× more parameters (671 / 37 ≈ 18). The catch is memory: all 671B parameters must still be stored and served.
∑ Chapter 12.7 — Key Takeaways
- MoE: many experts with sparse activation, so only a few experts (1–2 in early designs, 8 in DeepSeek-V3) fire per token
- Mixtral 8x7B: matched Llama 2 70B at 3× less inference compute
- DeepSeek-V3 (671B MoE, 37B active): GPT-4 level at $5.5M training cost
- MoE is now the dominant architecture for frontier models (GPT-4, Gemini, DeepSeek)
- Challenge: load balancing across experts — some experts get overloaded, others idle
Neural networks excel at pattern recognition. Symbolic AI excels at logical reasoning. Neuro-symbolic AI combines both — giving neural networks the ability to reason, and symbolic systems the ability to learn from data.
Pure neural approaches (LLMs) hallucinate, can’t guarantee logical consistency, and struggle with multi-step reasoning. Pure symbolic approaches are brittle, require hand-crafted rules, and don’t generalise. The hybrid combines neural perception + symbolic reasoning.
AlphaGeometry (Google DeepMind). Solved IMO-level geometry problems (2024). Architecture: a neural language model proposes construction steps and a symbolic deduction engine verifies the proofs. Neither component alone could solve the problems; the combination is key.
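LLM + code execution. The simplest neuro-symbolic pattern: the LLM generates code and an interpreter executes it. Chain-of-Code (Google): interleave reasoning and computation. The interpreter guarantees that the computation itself is carried out exactly, which is where pure LLM arithmetic often fails; the generated program can still be the wrong program.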
Knowledge graphs + LLMs. LLMs generate natural language; knowledge graphs provide structured facts. GraphRAG (Microsoft): the LLM uses graph-structured knowledge for grounded reasoning. Reduces hallucination by anchoring claims to verifiable facts.
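The LLM-plus-interpreter pattern above can be sketched without any model at all. In this example `fake_llm` is a hypothetical stand-in that returns the expression a model might generate, and `evaluate` is a deliberately restricted evaluator built on Python's `ast` module, so that only plain arithmetic is ever executed.

```python
import ast
import operator

ALLOWED_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.USub: operator.neg,
}


def evaluate(expression: str):
    """Exactly evaluate an arithmetic expression, rejecting anything else."""
    def walk(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in ALLOWED_OPS:
            return ALLOWED_OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in ALLOWED_OPS:
            return ALLOWED_OPS[type(node.op)](walk(node.operand))
        raise ValueError("unsupported expression")
    return walk(ast.parse(expression, mode="eval").body)


def fake_llm(question: str) -> str:
    """Hypothetical stand-in for a model call that returns an expression."""
    return "123456789 * 987654321"


answer = evaluate(fake_llm("What is 123456789 times 987654321?"))
print(answer)
```

The division of labour is the point: the neural side translates the question into a formal expression, and the symbolic side computes it exactly. Restricting the evaluator to a whitelist of node types is a design choice to avoid running arbitrary generated code.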
Neural strengths:
• Pattern recognition in noisy data
• Generalisation from examples
• Natural language understanding
• Perception (vision, audio)
Symbolic strengths:
• Logical consistency & guarantees
• Compositionality & abstraction
• Explainability & auditability
• Exact arithmetic & verification
∑ Chapter 12.8 — Key Takeaways
- Neuro-symbolic: neural perception + symbolic reasoning — best of both worlds
- AlphaGeometry: IMO-level proofs via neural proposals + symbolic verification
- LLM + code execution: simplest and most practical neuro-symbolic pattern today
- Knowledge graphs + LLMs (GraphRAG): reduces hallucination with structured grounding
Not all AI can live in the cloud. Edge AI runs models on devices — phones, cars, sensors, drones — where latency, privacy, and connectivity matter. Federated learning trains models across devices without centralising data.
Edge AI processes data locally on the device rather than sending it to cloud servers. Benefits: lower latency (<10ms vs 100ms+ cloud round-trip), works offline, preserves privacy. Federated learning trains a global model across distributed devices — each device trains locally, sends only model updates (not data) to a central server.
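The server-side aggregation step described above is usually federated averaging (FedAvg): a weighted mean of the clients' model weights, with each client weighted by how many local examples it trained on. The sketch below shows only that aggregation. The flat list-of-floats weight format and the function name are simplifying assumptions.

```python
def fed_avg(client_updates):
    """Average client weight vectors, weighted by local example counts.

    client_updates: list of (weights, num_examples) pairs, where weights
    is a flat list of floats of the same length for every client.
    """
    total_examples = sum(n for _, n in client_updates)
    dim = len(client_updates[0][0])
    global_weights = [0.0] * dim
    for weights, n in client_updates:
        for i, w in enumerate(weights):
            global_weights[i] += w * n / total_examples
    return global_weights


# Two devices: the second saw three times as much data, so it counts 3x.
updates = [([1.0, 2.0], 100), ([3.0, 4.0], 300)]
global_model = fed_avg(updates)
print(global_model)  # [2.5, 3.5]
```

Only the weight vectors and example counts reach the server; the raw training data stays on each device.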
On-device LLMs. Apple Intelligence: on-device 3B param model. Google Gemini Nano: runs on Pixel phones. Qualcomm: LLM inference on Snapdragon. 4-bit quantisation (for example in the GGUF file format) makes 7B models fit in about 4GB of RAM. Privacy: data never leaves the device.
Federated learning. Pioneered by Google for keyboard prediction (2017). Each phone trains locally on user data. The server aggregates model updates (FedAvg). Raw data is never transmitted. Used by: Apple (Siri), Google (Gboard), hospitals (medical imaging across sites).
Model compression. Quantisation: FP16 → INT4 (4× smaller). Pruning: remove redundant weights. Distillation: train a small model to mimic a large one. TensorRT, ONNX Runtime, and Core ML provide optimised inference. Key: minimal accuracy loss at 4–8× compression.
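The quantisation arithmetic above can be checked directly. The sketch below shows a simple symmetric per-tensor INT4 scheme and the memory calculation behind “a 7B model in about 4GB”. Real formats typically quantise in small blocks with per-block scales, so treat this as an illustration of the principle rather than how any specific runtime does it.

```python
def quantise_int4(weights):
    """Symmetric per-tensor quantisation to signed 4-bit codes in [-8, 7]."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 7 if max_abs > 0 else 1.0
    codes = [max(-8, min(7, round(w / scale))) for w in weights]
    return codes, scale


def dequantise(codes, scale):
    """Map integer codes back to approximate float weights."""
    return [c * scale for c in codes]


def model_size_gb(num_params, bits_per_weight):
    """Weight storage only; ignores activations and per-block scale overhead."""
    return num_params * bits_per_weight / 8 / 1e9


weights = [0.70, -0.33, 0.02, 0.41]
codes, scale = quantise_int4(weights)
restored = dequantise(codes, scale)
print(codes, [round(w, 2) for w in restored])

print(model_size_gb(7e9, 16))  # FP16: 14.0 GB
print(model_size_gb(7e9, 4))   # INT4: 3.5 GB
```

Each restored weight is within half a quantisation step (`scale / 2`) of the original, which is the source of the small accuracy loss listed in the table.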
| Technique | Compression (vs FP16 baseline) | Accuracy Loss | Speed Gain | Best For |
|---|---|---|---|---|
| INT8 Quantisation | 2× | <1% | 2× | Server inference |
| INT4 Quantisation | 4× | 1–3% | 3–4× | Mobile / edge |
| Pruning (structured) | 2–5× | 1–5% | 2–3× | Inference-only |
| Distillation | 10–100× | 3–10% | 10–50× | Deploy anywhere |
∑ Chapter 12.9 — Key Takeaways
- Edge AI: <10ms latency, works offline, preserves privacy
- On-device LLMs: 3–7B models on phones via quantisation (INT4, GGUF)
- Federated learning: train across devices without centralising data
- Model compression (quantisation + pruning + distillation): 4–100× smaller with minimal accuracy loss
AI progress is inseparable from hardware progress. Every frontier model is ultimately a bet on silicon — and the supply chain, geopolitics, and economics of AI chips are as important as the algorithms running on them.
| Chip | Company | FP16 TFLOPS | Memory | Training Use | Price |
|---|---|---|---|---|---|
| H100 | NVIDIA | 990 | 80GB HBM3 | Llama 3 (GPT-4 reportedly trained on the older A100) | ~$30K |
| B200 | NVIDIA | 2,250 | 192GB HBM3e | Next-gen frontier | ~$35K+ |
| TPU v5p | Google | ~460 | 95GB HBM2e | Gemini | Cloud only |
| Trainium 2 | AWS | TBD | 96GB HBM | AWS models | Cloud only |
| Gaudi 3 | Intel | ~1,835 | 128GB HBM2e | Limited adoption | ~$15K |
| Groq LPU | Groq | Inference-only | SRAM-based | Fastest inference | Cloud API |
Supply chain and geopolitics. TSMC (Taiwan) fabricates 90%+ of advanced AI chips. US export controls have restricted A100 and H100 exports to China since October 2022 and were tightened in 2023. China is developing domestic alternatives (Huawei Ascend). AI chip supply is now a national security issue.
Training costs. GPT-4 training: estimated $100M+. Llama 3.1 405B: ~$30M (estimate). DeepSeek-V3: a reported $5.5M for the final training run. Trend: cost per unit of capability is dropping fast via efficiency gains (MoE, better data, longer training). But frontier runs still cost $100M+ as ambitions grow.
∑ Chapter 12.10 — Key Takeaways
- NVIDIA dominates: H100/B200 power most frontier training
- Alternatives emerging: TPU v5p (Google), Trainium (AWS), Groq LPU (inference)
- TSMC fabricates 90%+ of advanced AI chips — geopolitical concentration risk
- Training costs: $100M+ for frontier, but efficiency gains dropping cost per capability
- DeepSeek-V3 at $5.5M shows algorithmic efficiency can substitute for raw compute
When an LLM says “I feel curious about this topic,” is it conscious? Almost certainly not. But the question of whether AI could become conscious is one of the deepest open problems at the intersection of AI, neuroscience, and philosophy.
The Hard Problem of Consciousness (David Chalmers, 1995): why does subjective experience exist at all? We can explain which neurons fire when you see red, but not why it feels like something to see red. This problem applies to AI: even a perfect brain simulation might be a “zombie” — functionally identical but with no inner experience.
Functionalism. Consciousness arises from what a system does, not what it’s made of. If this is true, a sufficiently complex AI could be conscious. Most AI researchers who believe in AI consciousness hold some form of functionalism.
Integrated Information Theory (IIT, Giulio Tononi). Consciousness = integrated information (Φ). Feedforward networks have Φ ≈ 0. Within a forward pass, current LLMs are feedforward, so IIT predicts they are not conscious regardless of behaviour.
Chinese Room. John Searle (1980): a person who follows a rulebook to manipulate Chinese symbols can produce fluent replies without understanding any Chinese, so symbol manipulation alone is not understanding. Applied to AI: an LLM manipulates tokens without understanding meaning. Counter (the “systems reply”): the system as a whole might understand even if individual components don’t.
If future AI systems could be conscious, we face unprecedented moral obligations. If they cannot, we risk anthropomorphising tools. Either error is dangerous: moral patients deserve protection; moral panics about tools waste resources. The honest answer: we do not know how to test for consciousness, and no current AI system provides evidence of it.
∑ Chapter 12.11 — Key Takeaways
- Hard Problem: why does subjective experience exist? — unsolved for brains and AI
- Functionalism: consciousness from function → AI could be conscious
- IIT: consciousness from integrated information → current LLMs are not conscious
- Chinese Room: symbol manipulation ≠ understanding — debate continues since 1980
- Practical: we have no test for AI consciousness — extraordinary claims require extraordinary evidence
AI moves fast enough that any prediction risks being outdated before the ink dries. But certain open problems are deep enough to outlast any single model generation. These are the problems that will define AI for the next decade.
Reasoning. LLMs can simulate reasoning via chain-of-thought but don’t truly plan. The ARC benchmark exposes this. True reasoning requires causal models, counterfactual thinking, and systematic generalisation beyond the training distribution.
Grounding. Text-only models lack physical grounding. “Heavy” is a word to an LLM, not a felt experience. Solving this may require embodied learning, or it may require better simulation. Open question: is grounding necessary for intelligence?
Alignment at scale. Current alignment methods (RLHF, constitutional AI) work for today’s models. But as models become more capable, how do you align something smarter than you? Scalable oversight, interpretability, and formal verification are active research areas.
Energy. AI training and inference are energy-intensive. GPT-4 training: an estimated ~50 GWh. Data centres consume an estimated 1–3% of global electricity, and the share is growing fast. Need: more efficient architectures, renewable-powered data centres, or fundamentally different compute paradigms.
Governance. Technology moves faster than regulation. The EU AI Act (2024) is the first comprehensive framework. The US has executive orders but no federal legislation. China has its own rules. There is no global coordination on frontier AI risks.
Economic disruption. McKinsey: up to 30% of work hours could be automated by 2030. This means task transformation more than job elimination. New jobs will be created, but the transition is uneven: knowledge workers are affected first (the inverse of previous automation waves).
Confident near-term:
• AI agents handling multi-step workflows autonomously
• On-device LLMs on every smartphone
• AI-generated video indistinguishable from real
• Coding AI writing 50%+ of production code
• Multimodal models as default (text-only obsolete)
Uncertain:
• AGI by any meaningful definition
• Practical quantum advantage for ML
• Neuromorphic chips competing with GPUs
• Solving alignment for superhuman AI
• Autonomous scientific discovery at scale
∑ Chapter 12.12 — Key Takeaways
- Open problems: reasoning, grounding, alignment at scale, energy, governance
- Confident near-term: AI agents, on-device LLMs, AI-generated video, code AI
- Uncertain: AGI, quantum ML, neuromorphic at scale, superhuman alignment
- Economic disruption: 30% of work hours automatable by 2030 — knowledge workers first
- Governance gap: technology moves faster than regulation — no global coordination
🎓 Domain 12 Complete — Emerging Technologies
- Ch 12.1: Foundation models: scaling laws, multimodal fusion, post-training and test-time compute are the new frontiers.
- Ch 12.2: World models: AI that predicts consequences — Sora, JEPA, Genie 2. Latent-space prediction is key.
- Ch 12.3: Neuromorphic: brain-inspired chips, 100× energy efficiency for sparse tasks. Not a GPU replacement.
- Ch 12.4: Quantum AI: theoretically promising, practically premature. Error correction is the bottleneck.
- Ch 12.5: Embodied AI: VLA models (RT-2) + sim-to-real + humanoid race. Physical intelligence is the next frontier.
- Ch 12.6: AGI: no agreed definition. Three competing paths. Timeline debate is financially motivated.
- Ch 12.7: MoE: dominant frontier architecture. DeepSeek-V3 matched GPT-4 at $5.5M.
- Ch 12.8: Neuro-symbolic: neural perception + symbolic reasoning. AlphaGeometry proved the concept.
- Ch 12.9: Edge AI: on-device LLMs, federated learning, model compression. Privacy + latency advantages.
- Ch 12.10: AI hardware: NVIDIA dominance, TSMC concentration risk, geopolitics of chips.
- Ch 12.11: Consciousness: hard problem unsolved. No evidence current AI is conscious. Multiple competing theories.
- Ch 12.12: Road ahead: reasoning, grounding, alignment, energy, governance — the problems that define the next decade.
Domain 12 is where the known meets the unknown. The technologies here range from production-ready (MoE, edge AI) to speculative (quantum ML, consciousness). The honest takeaway: nobody knows which of these will matter most in 10 years — but understanding all of them gives you the vocabulary to evaluate claims, spot hype, and recognise genuine breakthroughs when they arrive.