đŸ”„ De‑constructing Cognition and Why LLMs Can’t Replicate It



This content originally appeared on DEV Community and was authored by Mak SĂČ

“Cognition is what the brain does; prediction is only one small part of it.”

A Winding Path to Cognition

I didn’t march into AI from the usual computer‑science parade. My first academic life was veterinary medicine, where the day‑to‑day meant anatomy labs by sunrise and barn calls by dusk. That detour plunged me into ethology, evolution, and environment‑driven natural selection, disciplines that left me obsessed with how organisms learn to survive.

But another obsession was tugging at me: code. I left the biology lab behind and returned to my teenage love of programming. By the early 2010s I was building predictive‑risk engines for fintech scoring and, later, customer‑lifetime models for marketing. Powerful AIs, but opaque ones…
So I started dismantling the black boxes, line by line, determined to understand why they worked.

That tinkering led me to the dream of AGI: systems that can acquire a skill once and apply it anywhere. From 2020 onward, I watched the language‑model tide surge: attention, embeddings, transformers. Back in 2018 these models could barely string a sensible paragraph together; today they write passable code. Somewhere along the climb, people began to conflate next‑token prowess with intelligence.

Yet we humans have direct evidence for only one kind of intelligence: the animal kind grounded in signals, memory, self‑awareness, and ruthless energy efficiency. We’re also glimpsing other possibilities: the mycelial networks weaving “forest cognition” under our feet. Against those benchmarks, current AIs look less like artificial intelligence and more like “statistical ventriloquism.”

That dissonance is why I wrote this essay.

In the wake of GPT‑n, it’s fashionable to claim that prediction = intelligence. Token in, token out, job done. Yet when you zoom in on what cognition actually entails, from moment‑to‑moment awareness to the uneasy feeling of doubt, purely statistical models start to look like an impressive facsimile rather than the real thing.

This long‑form essay unpacks cognition into its core ingredients, maps each one onto known neural substrates, and then asks: Can today’s transformer stacks plausibly implement them? Spoiler: not without new physics (or at least new architectures).

 What Do We Mean by “Cognition”? An Operating Definition

Etymologically, cognition derives from the Latin cognƍscere “to get to know.” Modern cognitive science slices that knowing into at least six interactive systems:

| # | Module | Canonical Function | Observable Phenomena |
| --- | --- | --- | --- |
| ① | Perception | Transform raw sensory input into structured representations. | Edge detection, auditory segregation, object permanence. |
| ② | Memory | Encode, consolidate, retrieve past states. | Flashbulb memories, hippocampal replay, recency/primacy curves. |
| ⑱ | Attention | Allocate limited processing to salient signals. | Visual spotlight, cocktail‑party effect. |
| ④ | Intelligence / Reasoning | Manipulate abstract symbols to plan and infer. | Analogical transfer, chess tactics, Bayesian updates. |
| â‘€ | Self‑Awareness | Represent the system’s own state within its world‑model. | Mirror test, proprioceptive drift, narrative identity. |
| â‘„ | Metacognition (Doubt) | Evaluate and regulate other cognitive processes. | Confidence judgments, error monitoring (ERN), epistemic emotions like surprise. |

These modules aren’t neat Lego bricks; they braid together in the messy biological substrate we call a brain. But the taxonomy is handy when we contrast organic cognition with digital mimicry.

 The Biological Playbook: How Brains Pull It Off

  1. Distributed, Recurrent Circuits: Every cortical column loops information back on itself on millisecond timescales, allowing stateful computation far richer than feed‑forward prediction.
  2. Embodiment & Sensorimotor Loops: Neurons don’t just model the world; they act within it, closing the perception‑action feedback loop that grounds symbols.
  3. Energy‑Driven Plasticity: Synapses continuously re‑wire under local error signals (spike‑timing‑dependent plasticity, STDP), generating a living memory architecture; a minimal update rule is sketched after this list.
  4. Homeostatic Regulation: Glial cells, hormones, and autonomic bodily states tune cognition moment to moment.
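
To make item 3 concrete, here is a minimal sketch of the classic pair‑based STDP update rule with an exponential timing window. The amplitudes and time constants are illustrative placeholders, not measured values.

```python
# Minimal sketch of pair-based spike-timing-dependent plasticity (STDP).
# The exponential window is the textbook form; parameter values are
# illustrative assumptions, not measurements.
import math

A_PLUS, A_MINUS = 0.01, 0.012      # assumed potentiation / depression amplitudes
TAU_PLUS, TAU_MINUS = 20.0, 20.0   # assumed time constants (milliseconds)

def stdp_delta_w(t_pre: float, t_post: float) -> float:
    """Weight change for one pre/post spike pair (times in milliseconds).

    Pre-before-post (dt > 0) strengthens the synapse; post-before-pre weakens it.
    """
    dt = t_post - t_pre
    if dt > 0:
        return A_PLUS * math.exp(-dt / TAU_PLUS)     # long-term potentiation
    if dt < 0:
        return -A_MINUS * math.exp(dt / TAU_MINUS)   # long-term depression
    return 0.0

print(stdp_delta_w(10.0, 15.0))   # pre at 10 ms, post at 15 ms -> small positive change
print(stdp_delta_w(15.0, 10.0))   # reversed order -> small negative change
```

The point of the sketch: the update is local, continuous, and driven by spike timing, exactly the kind of always‑on re‑wiring that frozen transformer weights lack.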

The upshot: cognition is process, not just function. It unfolds in real time, under noisy constraints, with direct consequences for survival.

 Transformers on Trial: What They Capture and What They Miss

| Cognitive Facet | Scorecard | Why Transformers Struggle |
| --- | --- | --- |
| Perception (text) | ✅ Surface pattern recognition excels. | Context is limited to the training distribution; no new sensors. |
| Long‑Term Memory | ⚠ External tools help (RAG, vector DBs) but lack true consolidation. | No native synaptic plasticity; weights are frozen post‑training. |
| Attention | ✅ Scaled dot‑product ≈ soft attention (sketched below). | Yet it is static: it can’t choose what to attend to based on novelty or goals. |
| Reasoning | ⚠ Emergent chain‑of‑thought appears, sometimes. | Without structured world‑models, brittle on edge cases. |
| Self‑Awareness | ❌ No internal pointer to “this sentence was generated by me.” | Outputs are ungrounded, hence no subjective perspective. |
| Doubt / Metacognition | ❌ No confidence calibration beyond token probabilities. | Lacks hierarchical error‑monitoring circuitry (ACC, insula). |
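
For readers who haven’t met it, the “scaled dot‑product” in the Attention row is a single formula: queries are compared to keys, the scores are softmaxed, and the values are mixed accordingly. A minimal NumPy sketch (toy shapes, random inputs, no claim about any particular model) shows how mechanical it is; the weights are a fixed function of the current input, with no goal‑ or novelty‑driven selection.

```python
# Minimal sketch of scaled dot-product attention; shapes and inputs are toy values.
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Q, K: (seq_len, d_k); V: (seq_len, d_v). Returns (seq_len, d_v)."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                   # query/key similarity
    scores -= scores.max(axis=-1, keepdims=True)      # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)    # softmax over keys
    return weights @ V                                # weighted sum of values

rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(4, 8)) for _ in range(3))  # 4 tokens, 8 dims each
print(scaled_dot_product_attention(Q, K, V).shape)     # (4, 8)
```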

Key Insight: Transformers optimize next‑token likelihood, not model‑of‑the‑world fidelity. Any seeming self‑reflection is a statistical echo of human text, not an intrinsic evaluative loop.
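
That insight can be stated as code: the training objective is simply the negative log‑likelihood of whichever token actually came next. The tiny vocabulary and probabilities below are made up for illustration.

```python
# Minimal sketch of the next-token objective. The distribution is invented
# for illustration; nothing here comes from a real model.
import math

def next_token_loss(predicted_probs: dict, actual_next_token: str) -> float:
    """Negative log-likelihood of the token that actually followed the prefix."""
    return -math.log(predicted_probs[actual_next_token])

# Hypothetical distribution after the prefix "The cat sat on the"
predicted = {"mat": 0.70, "sofa": 0.20, "moon": 0.10}
print(next_token_loss(predicted, "mat"))    # low loss: statistically likely continuation
print(next_token_loss(predicted, "moon"))   # high loss: statistically unlikely continuation

# Note: minimizing this loss rewards matching the text distribution;
# nothing in the objective references truth, grounding, or self-monitoring.
```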

 Why Statistics Alone Fall Short: A Physical Argument

  1. Thermodynamic Constraints: Biological brains dissipate ~20 W yet support continuous online learning; continuously fine‑tuning a model with GPT‑4‑scale parameters would melt a data center (see the back‑of‑envelope sketch after this list).
  2. Non‑Ergodic Dynamics: Cognition lives in attractor manifolds; training‑time batch averages can’t reproduce the chaotic itinerancy of real neurons.
  3. Causal Embedding: Meaning arises from acting in, and being acted upon by, the world. Pure language models float in a causal vacuum.
  4. Second‑Order Feedback: Metacognition demands a loop that monitors the loop that is monitoring the world, a stack missing in single‑pass architectures.
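
To put rough numbers on the thermodynamic point, here is a back‑of‑envelope comparison. The ~20 W brain figure is the commonly cited resting estimate; the ~10 MW cluster draw is an assumed order of magnitude for a large training run, not a measurement.

```python
# Back-of-envelope energy comparison. All figures are illustrative assumptions.
BRAIN_POWER_W = 20.0               # commonly cited resting power of a human brain
CLUSTER_POWER_W = 10_000_000.0     # assumed ~10 MW draw for a large training cluster

SECONDS_PER_DAY = 24 * 3600
J_PER_KWH = 3.6e6

brain_kwh_per_day = BRAIN_POWER_W * SECONDS_PER_DAY / J_PER_KWH
cluster_kwh_per_day = CLUSTER_POWER_W * SECONDS_PER_DAY / J_PER_KWH

print(f"Brain:   {brain_kwh_per_day:.2f} kWh/day")       # ~0.48 kWh/day
print(f"Cluster: {cluster_kwh_per_day:,.0f} kWh/day")    # ~240,000 kWh/day
print(f"Ratio:   ~{cluster_kwh_per_day / brain_kwh_per_day:,.0f}x")
```

Even if the cluster estimate is off by an order of magnitude, the gap between a brain that learns continuously on a sandwich’s worth of energy and a data center that cannot is the point.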

In short, intelligence may require recursive, embodied, energy‑efficient computation that a rectified linear unit and its thousands of stacked cousins cannot muster.

 Counter‑Arguments & Rebuttals

“But GPT shows sparks of reasoning!”

Emergence is not grounding. A parlor trick can imitate reasoning without truth‑conditional guarantees.

“Scale is all you need.”

Scaling laws describe loss curves, not phenomenology. They lack explanatory power for self‑awareness or agency.

“We can bolt on tools.”

Tool‑use patches capabilities (search, calculators) but doesn’t birth a self‑model.

 Where Do We Go from Here? Possible Research Directions

  ‱ Recurrent World‑Models: Integrate learned world‑models with planning loops, whether active inference or MCTS‑based agents Ă  la DeepMind’s MuZero.
  ‱ Neuromorphic Hardware: Spiking neural nets on neuromorphic chips (Loihi 2, BrainScaleS) promise real‑time plasticity.
  ‱ Embodied Agents: Robots or simulated avatars that ground tokens in sensorimotor contingencies.
  ‱ Hierarchical Meta‑Learners: Architectures that train not just on what to predict but on when to doubt (a toy objective is sketched after this list).
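
As a toy illustration of that last bullet, a model can be asked to output a confidence alongside its prediction and be scored with a Brier‑style penalty, so that being confidently wrong costs far more than being doubtfully wrong. This is a hypothetical sketch of the idea, not a reference to any published meta‑learning architecture.

```python
# Toy "doubt-aware" objective: score the (prediction, confidence) pair with a
# Brier-style penalty. Purely illustrative.
def doubt_aware_loss(prediction: int, confidence: float, label: int) -> float:
    """Confident-and-wrong is punished hardest; honest doubt is cheap."""
    correct = float(prediction == label)
    return (confidence - correct) ** 2

# Three hypothetical answers to the same question (true label = 1)
print(doubt_aware_loss(prediction=1, confidence=0.95, label=1))  # right, confident -> ~0.0025
print(doubt_aware_loss(prediction=0, confidence=0.95, label=1))  # wrong, confident -> ~0.9025
print(doubt_aware_loss(prediction=0, confidence=0.40, label=1))  # wrong, doubtful  -> 0.16
```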

