This content originally appeared on DEV Community and was authored by Mak SĂČ
"Cognition is what the brain does; prediction is only one small part of it."
## A Winding Path to Cognition
I didn't march into AI from the usual computer-science parade. My first academic life was veterinary medicine, where the day-to-day meant anatomy labs by sunrise and barn calls by dusk. That detour plunged me into ethology, evolution, and environment-driven natural selection, disciplines that obsessed me with "how" organisms learn to survive.
But another obsession was tugging at me: code. I left the biology lab behind, returning to my teenage love of programming. By the early 2010s I was building predictive-risk engines for fintech scoring and later customer-lifetime models for marketing. Powerful AIs, but opaque ones.
So I started dismantling the black boxes, line by line, determined to understand why they worked.
That tinkering led me to the dream of AGI: systems that can acquire a skill once and apply it anywhere. From 2020 onward, I watched the language-model tide surge: attention, embeddings, transformers. Back in 2018 these models could barely string a sensible paragraph together; today they write passable code. Somewhere along the climb, people began to conflate next-token prowess with intelligence.
Yet we humans have direct evidence for only one kind of intelligence: the animal kind grounded in signals, memory, self-awareness, and ruthless energy efficiency. We're also glimpsing other possibilities: the mycelial networks weaving "forest cognition" under our feet. Against those benchmarks, current AIs look less like artificial intelligence and more like "statistical ventriloquism."
That dissonance is why I wrote this essay.
In the wake of GPT-n, it's fashionable to claim that prediction = intelligence. Token in, token out, job done. Yet when you zoom in on what cognition actually entails, from moment-to-moment awareness to the uneasy feeling of doubt, purely statistical models start to look like an impressive facsimile rather than the real thing.
This long-form essay unpacks cognition into its core ingredients, maps each one onto known neural substrates, and then asks: can today's transformer stacks plausibly implement them? Spoiler: not without new physics (or at least new architectures).
## What Do We Mean by "Cognition"? An Operating Definition
Etymologically, cognition derives from the Latin cognōscere, "to get to know." Modern cognitive science slices that knowing into at least six interactive systems:
| # | Module | Canonical Function | Observable Phenomena |
|---|---|---|---|
| 1 | Perception | Transform raw sensory input into structured representations. | Edge detection, auditory segregation, object permanence. |
| 2 | Memory | Encode, consolidate, and retrieve past states. | Flashbulb memories, hippocampal replay, recency/primacy curves. |
| 3 | Attention | Allocate limited processing to salient signals. | Visual spotlight, cocktail-party effect. |
| 4 | Intelligence / Reasoning | Manipulate abstract symbols to plan and infer. | Analogical transfer, chess tactics, Bayesian updates. |
| 5 | Self-Awareness | Represent the system's own state within its world-model. | Mirror test, proprioceptive drift, narrative identity. |
| 6 | Metacognition (Doubt) | Evaluate and regulate other cognitive processes. | Confidence judgments, error monitoring (ERN), epistemic emotions like surprise. |
These modules aren't neat Lego bricks; they braid together in the messy biological substrate we call a brain. But the taxonomy is handy when we contrast organic cognition with digital mimicry.
## The Biological Playbook: How Brains Pull It Off
- **Distributed, Recurrent Circuits:** Every cortical column loops information back on itself in microseconds, allowing stateful computation far richer than feed-forward prediction.
- **Embodiment & Sensorimotor Loops:** Neurons don't just model the world; they act within it, closing the perception-action feedback loop that grounds symbols.
- **Energy-Driven Plasticity:** Synapses continuously re-wire under local error signals (Spike-Timing Dependent Plasticity), generating a living memory architecture (a toy sketch follows this list).
- **Homeostatic Regulation:** Glial cells, hormones, and autonomic bodily states tune cognition moment-to-moment.
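To make the plasticity point concrete, here is a minimal, hypothetical sketch of a pairwise STDP update in Python. The amplitudes and time constants are illustrative placeholders, not measured biological values; the point is only the shape of the rule: strengthen a synapse when the presynaptic spike precedes the postsynaptic one, weaken it otherwise.

```python
import numpy as np

# Illustrative STDP parameters; placeholders, not measured biological values
A_PLUS, A_MINUS = 0.01, 0.012      # potentiation / depression amplitudes
TAU_PLUS, TAU_MINUS = 20.0, 20.0   # exponential decay constants (ms)

def stdp_delta_w(t_pre: float, t_post: float) -> float:
    """Weight change for a single pre/post spike pair (spike times in ms)."""
    dt = t_post - t_pre
    if dt > 0:   # pre fires before post -> potentiate
        return A_PLUS * np.exp(-dt / TAU_PLUS)
    return -A_MINUS * np.exp(dt / TAU_MINUS)  # post before pre -> depress

# Toy example: one synapse updated by a handful of spike pairs
w = 0.5
for t_pre, t_post in [(10.0, 12.0), (30.0, 28.0), (50.0, 55.0)]:
    w = float(np.clip(w + stdp_delta_w(t_pre, t_post), 0.0, 1.0))
    print(f"pre={t_pre} ms, post={t_post} ms -> w={w:.4f}")
```

Notice how local the rule is: each update depends only on the relative timing of the two neurons a synapse connects, which is part of how biological memory keeps rewiring online within a ~20 W budget.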
The upshot: cognition is process, not just function. It unfolds in real time, under noisy constraints, with direct consequences for survival.
## Transformers on Trial: What They Capture and What They Miss
| Cognitive Facet | Why Transformers Struggle |
|---|---|
| Perception (text) | Context is limited to the training distribution; no new sensors. |
| Long-Term Memory | No native synaptic plasticity; weights are frozen post-training. |
| Attention | Self-attention exists, yet it is static: the model can't choose what to attend to based on novelty or goals. |
| Reasoning | Without structured world-models, brittle on edge cases. |
| Self-Awareness | Outputs are ungrounded, hence no subjective perspective. |
| Doubt / Metacognition | Lacks hierarchical error-monitoring circuitry (ACC, insula). |
**Key Insight:** Transformers optimize next-token likelihood, not model-of-the-world fidelity. Any seeming self-reflection is a statistical echo of human text, not an intrinsic evaluative loop.
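To see how narrow that objective is, here is a minimal sketch of the causal language-modeling loss, written in PyTorch with a toy stand-in model (the architecture is deliberately simplified and hypothetical; a real transformer only adds attention layers between the embedding and the output projection):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

VOCAB, DIM = 1000, 64  # toy sizes, purely illustrative

class ToyCausalLM(nn.Module):
    """Stand-in for a decoder-only transformer: embed tokens, project to logits."""
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, DIM)
        self.out = nn.Linear(DIM, VOCAB)

    def forward(self, tokens):                 # tokens: (batch, seq_len)
        return self.out(self.embed(tokens))   # logits: (batch, seq_len, VOCAB)

model = ToyCausalLM()
tokens = torch.randint(0, VOCAB, (8, 32))     # fake batch of token ids

# The entire training signal: predict token t+1 from tokens up to t.
logits = model(tokens[:, :-1])
targets = tokens[:, 1:]
loss = F.cross_entropy(logits.reshape(-1, VOCAB), targets.reshape(-1))
loss.backward()  # gradients push toward better next-token guesses, nothing else
```

Nothing in that loss asks whether the induced world-model is faithful, whether the model is uncertain, or whether it should act; any fidelity to the world enters only indirectly, through the statistics of the training text.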
## Why Statistics Alone Fall Short: A Physical Argument
- **Thermodynamic Constraints:** Biological brains dissipate ~20 W yet support continuous online learning. Fine-tuning a GPT-4-scale parameter set continuously would melt a data center.
- **Non-Ergodic Dynamics:** Cognition lives in attractor manifolds; training-time batch averages can't reproduce the chaotic itinerancy of real neurons.
- **Causal Embedding:** Meaning arises from acting in, and being acted upon by, the world. Pure language models float in a causal vacuum.
- **Second-Order Feedback:** Metacognition demands a loop that monitors the loop that is monitoring the world, a stack missing in single-pass architectures (a toy sketch follows this list).
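As a toy illustration of that missing second-order loop, here is a hypothetical sketch (not a description of any existing system) of a monitor that watches a first-order predictor's confidence over time and raises a doubt signal when it sags: a loop observing the loop that observes the world.

```python
from collections import deque
from statistics import mean

class MetaMonitor:
    """Second-order loop: watches the first-order predictor's confidence stream."""
    def __init__(self, window: int = 20, doubt_threshold: float = 0.6):
        self.history = deque(maxlen=window)     # recent confidence values
        self.doubt_threshold = doubt_threshold

    def observe(self, confidence: float) -> bool:
        """Record one prediction's confidence; return True when doubt is warranted."""
        self.history.append(confidence)
        return mean(self.history) < self.doubt_threshold

def first_order_predict(x: float) -> tuple:
    """Hypothetical first-order loop: a label plus a confidence in [0, 1]."""
    return ("rain" if x > 0.5 else "sun"), abs(x - 0.5) * 2

monitor = MetaMonitor()
for x in [0.9, 0.8, 0.55, 0.52, 0.51, 0.49]:
    label, conf = first_order_predict(x)
    if monitor.observe(conf):
        print(f"doubt: confidence trending low ({conf:.2f}); escalate or gather data")
    else:
        print(f"predict {label} with confidence {conf:.2f}")
```

Biological metacognition is of course far richer (the ACC and insula integrate interoceptive and error signals rather than a rolling mean), but the structural point survives: the monitor needs its own state and its own feedback path, which single-pass decoding does not provide.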
In short, intelligence may require recursive, embodied, energy-efficient computation that a rectified linear unit and its thousands of stacked cousins cannot muster.
## Counter-Arguments & Rebuttals
**"But GPT shows sparks of reasoning!"**
Emergence is not grounding. A parlor trick can imitate reasoning without truth-conditional guarantees.

**"Scale is all you need."**
Scaling laws describe loss curves, not phenomenology. They lack explanatory power for self-awareness or agency.

**"We can bolt on tools."**
Tool use patches capabilities (search, calculators) but doesn't birth a self-model.
## Where Do We Go from Here? Possible Research Directions
- **Recurrent World-Models:** Integrate active-inference loops à la DeepMind's MuZero or MCTS.
- **Neuromorphic Hardware:** Spiking neural nets on analog chips (Loihi 2, BrainScaleS) promise real-time plasticity.
- **Embodied Agents:** Robots or simulated avatars that ground tokens in sensorimotor contingencies.
- **Hierarchical Meta-Learners:** Architectures that train not just on what to predict but on when to doubt (a minimal sketch follows this list).
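On that last point, here is a hedged sketch of what "training on when to doubt" could look like: alongside the usual task loss, a small auxiliary head is optimized to predict the main head's own error, so the system learns a calibrated doubt signal instead of echoing one from human text. All names, sizes, and the toy task are hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PredictorWithDoubtHead(nn.Module):
    """Shared trunk; one head solves the task, another estimates its own error."""
    def __init__(self, in_dim: int = 16, hidden: int = 32):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU())
        self.task_head = nn.Linear(hidden, 1)    # first-order prediction
        self.doubt_head = nn.Linear(hidden, 1)   # second-order: expected error

    def forward(self, x):
        h = self.trunk(x)
        return self.task_head(h).squeeze(-1), self.doubt_head(h).squeeze(-1)

model = PredictorWithDoubtHead()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

x = torch.randn(64, 16)    # hypothetical toy inputs
y = x.sum(dim=1)           # hypothetical toy regression target

for step in range(200):
    pred, doubt = model(x)
    task_loss = F.mse_loss(pred, y)
    # Train the doubt head to predict the task head's own absolute error,
    # detached so the doubt loss observes the error without shrinking it.
    doubt_loss = F.mse_loss(doubt, (pred - y).abs().detach())
    (task_loss + doubt_loss).backward()
    opt.step()
    opt.zero_grad()
```

At inference time the doubt output can gate behavior: defer, ask for help, or gather more data when predicted error runs high. Whether anything like metacognition emerges from such signals is an open question; the sketch only shows the shape of the training loop.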