
Human brain and transformers: exploring a hypothesis of cognitive parallels

  • Writer: Marcos Recolons
  • Apr 11
  • 21 min read

Introduction





Conceptual illustration of a Transformer network overlaid on a human brain, symbolizing the information-processing parallels between biological and artificial intelligence [quantamagazine.org].


The advancement of artificial intelligence (AI) has given rise to massively scaled deep learning models (so-called foundation models), most notably transformer architectures. These models have achieved astonishing capabilities in language, vision, and other tasks, prompting comparisons with human cognition. In neuroscience and philosophy of mind, it has long been suggested that the human mind could be understood as a computer-like information-processing system; today, that analogy can be enriched by asking whether the human brain functions analogously to an AI foundation model. In particular, this article explores five specific parallels between the brain and Transformers: (1) the initial, genetically dictated brain architecture would be equivalent to a local instance of a pre-trained foundation model, activated at birth; (2) human experiential learning would act as fine-tuning of that innate base model; (3) REM (Rapid Eye Movement) sleep would serve as a memory consolidation process, comparable to knowledge reinforcement or distillation in AI; (4) the context handled by a transformer model (its context window) would be equivalent to short-term working memory in humans; and (5) the ability to filter and distill daily experiences into stable knowledge, discarding the irrelevant, would reflect a parallel between the formation of long-term memory and the permanent updating of the model's weights.


These comparisons, although hypothetical, are supported by recent advances in both neuroscience (for example, in understanding how the brain consolidates memories during sleep) and AI (for example, in how Transformer models use context and refine their performance). The hypothesis is then developed in detail, and supporting and contrasting evidence and arguments from different disciplines are analyzed in a rigorous yet accessible style.


Development of the hypothesis

The central hypothesis proposes that the human brain operates as a constantly adjusting base model , where biological evolution provides an initial model and personal experience specializes it. This idea aligns with contemporary views in cognitive science: it has been argued that the brain is essentially a prediction engine that maintains a hierarchical generative model of the world and adapts it to minimize prediction errors [ pubmed.ncbi.nlm.nih.gov ] . In other words, our brains come “pre-trained” with certain innate predispositions and architectures, and throughout life they learn by refining that internal model to fit the sensory and social input they receive.


From an AI perspective, a base model (such as GPT-4 or BERT) is initially trained on vast amounts of general data, creating rich but generic representations of language or the environment. Similarly, we might think of genetics and evolutionary development as training the brain with initial skills and biases (e.g., reflexes, a predisposition to pattern recognition, basic language ability). Thus, a newborn begins life with a preconfigured brain architecture, analogous to the pre-trained weights of a transformer. Based on that initial model, individual experiences act as fine-tuning data: they shape synaptic connections through plasticity, fine-tuning an individual's behavior and knowledge to adapt to specific tasks (speaking a particular language, specific motor skills, cultural knowledge, etc.).


A crucial element of the hypothesis is that sleep , particularly REM sleep, plays a similar role to consolidation processes in AI algorithms. During wakefulness, the brain acquires new information (“online” training); during REM sleep, it reprocesses and consolidates that information, integrating it with prior knowledge and optimizing storage, analogous to how an AI model might refine its weights or summarize what it has learned (knowledge distillation ) in a nightly offline training session. This wake-sleep cycle is reminiscent of certain approaches in machine learning to avoid catastrophic overwriting of old information when learning something new [ elifesciences.org ] .


Furthermore, the hypothesis equates humans' immediate cognitive context (our limited-capacity working memory) with the context window of a transformer model. A transformer can only "remember" a certain number of recent input tokens when generating the next word, which is conceptually similar to how human short-term memory maintains a limited amount of active information (e.g., a phone number we are about to dial). Finally, it is postulated that both the brain and AI models possess mechanisms to transfer some of that immediate knowledge to a more stable long-term store: in the brain, this corresponds to consolidating memories in long-term memory (and discarding non-essential details), while in AI it would be equivalent to gradually updating the trained model, discarding specific training data but retaining useful general patterns.


In summary, the hypothesis draws a point-by-point parallel: inherited brain architecture ~ pre-trained base model; experiential learning ~ fine-tuning; REM sleep ~ consolidation/distillation; working memory ~ model context; long-term memory ~ permanent model updating. In the following comparative analysis, we will contrast each of these components with findings from neuroscience, examples from modern AI, and philosophical considerations, assessing the extent to which this analogy illuminates our understanding of the mind.





Comparative analysis of the proposed parallels

1. The brain as a local instance of a genetic base model


One of the bases of this analogy is the idea that every human brain is born with an architecture preconfigured by genetics, which we can liken to a universal base model of which each individual is a local instance. Several lines of evidence indicate that our nervous system does not begin as a tabula rasa, but with circuits and connections predisposed by evolution. For example, many animals display complex innate behaviors shortly after birth, which suggests that the initial neural connections encoded by the genome already contain functional “knowledge” [pmc.ncbi.nlm.nih.gov; mcb.harvard.edu].


In humans, it has been observed that certain reflexes, perceptual tendencies (such as preferring facial patterns) and even the bases of language could be guided by genetic predispositions.


From a theoretical perspective, this faces a challenge: how can a relatively small genome encode a brain with trillions of synapses? Recent research has framed this as an information compression problem. Zador and colleagues (2024) proposed that evolution solves it via a “genomic bottleneck”: DNA does not specify every single connection, but rather compact circuit-formation rules that generate functional neural networks [pmc.ncbi.nlm.nih.gov]. By simulating this process with artificial networks, they found that a network can be compressed through a much simpler “genome” without losing too much performance [pmc.ncbi.nlm.nih.gov]. In other words, evolution acts like a model trainer: it selects and refines a base model of the brain over generations, incorporating features useful for survival (stereoscopic vision, social learning capacity, etc.) into the brain's initial architecture. This idea is consistent with the notion of innate priors: the brain comes into the world with initial biases that give it an advantage in learning. Indeed, a recent article proposes that “the modern ‘learn from scratch’ narrative overlooks that evolution has already pre-trained much of our neural circuitry” [mcb.harvard.edu].


In this sense, every human being at birth would be a local implementation of that universal base model . Just as an instance of a pre-trained transformer model (for example, uploading GPT weights to a local server) already contains much of the "wisdom" acquired during pre-training, a human brain has a structure and certain connections fine-tuned by evolution. Of course, there are individual variations (each person's "model" is not identical, given genetic variability and epigenetic factors), but broadly speaking, we all share the same fundamental architecture—analogous to sharing the same deep neural network architecture. This parallel provides an interesting perspective: just as in AI we talk about foundation models , we could call the brain a "biological foundation model." Genetics would be the code that downloads that model into each new brain, activating it at birth with predetermined capabilities.





It's worth noting that in the philosophy of mind and cognitive sciences, there has been a lively debate about how much of our mind is innate versus learned . The analogy with base models suggests a middle ground: much is innate (the genetic base model), but it is not fixed knowledge like an inflexible "instinct," but rather a general model that requires fine-tuning with experience to achieve its maximum performance. This view integrates the rationalist position (which emphasizes inheritance) with the empiricist position (which emphasizes experience) into a unified metaphor.


2. Human learning through experience as fine-tuning of the base model


If our brains start with an innate base model, lifelong learning would be analogous to the fine-tuning process in AI: adjusting the model's parameters to adapt it to specific tasks and environments. In artificial neural networks, fine-tuning involves taking a pre-trained model and continuing to train it with new data (usually from a particular domain or task) to specialize it. In much the same way, a human child takes their "generic brain model" and, through experience (interaction with the environment, education, trial and error), modifies and refines synaptic connections to acquire specific skills and knowledge.
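
To make the analogy concrete, here is a minimal, hypothetical sketch in PyTorch: a toy "base model" stands in for the innate, pre-trained circuitry, its parameters are frozen, and only a small task-specific head is trained on new data. The layer sizes and the random data are placeholders, not a claim about how the brain actually fine-tunes.

```python
import torch
import torch.nn as nn

# Toy "base model": stands in for a network whose weights already encode
# general-purpose features (the innate, pre-trained part of the analogy).
# In reality these weights would come from large-scale pre-training.
base_model = nn.Sequential(
    nn.Linear(32, 64), nn.ReLU(),
    nn.Linear(64, 64), nn.ReLU(),
)

# "Fine-tuning": freeze the shared base and train only a small task head
# on new, task-specific data (the individual's experience).
for p in base_model.parameters():
    p.requires_grad = False

task_head = nn.Linear(64, 3)              # e.g. a new skill with 3 categories
optimizer = torch.optim.Adam(task_head.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

x = torch.randn(16, 32)                   # placeholder "experiences"
y = torch.randint(0, 3, (16,))            # placeholder labels

for _ in range(10):                       # a few passes over the new data
    logits = task_head(base_model(x))
    loss = loss_fn(logits, y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```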


Neuroscience has identified synaptic plasticity as a key mechanism for learning and memory: synapses (connections between neurons) strengthen or weaken depending on activity, partly following Hebb’s rule (“neurons that fire together, wire together”). This structural and functional change in the biological neural network corresponds to the updating of “weights” in a deep learning model during training. Indeed, it is widely accepted that activity-dependent plasticity is the cellular basis of memory formation [ sciencedirect.com ] . Each new experience causes microscopic adjustments in the synapses of various brain areas; with sufficient repetition or impact, these adjustments consolidate a learning process (for example, learning to recognize letters modifies visual and linguistic circuits).
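
As an illustration of the Hebbian side of this comparison, the sketch below implements the "fire together, wire together" rule in NumPy on random activity patterns. All quantities (firing probabilities, thresholds, learning rate) are arbitrary toy values; real synaptic plasticity is far richer than this.

```python
import numpy as np

rng = np.random.default_rng(0)
n_pre, n_post = 100, 50
W = 0.01 * rng.random((n_post, n_pre))        # weak initial synapses
eta = 0.001                                   # learning rate

for _ in range(500):                          # repeated "experiences"
    pre = (rng.random(n_pre) < 0.2).astype(float)            # sparse presynaptic firing
    drive = W @ pre                                           # total input to each cell
    post = (drive >= np.quantile(drive, 0.8)).astype(float)   # top ~20% of cells fire
    # Hebb's rule: strengthen synapses whose pre- and postsynaptic neurons
    # were active together ("fire together, wire together").
    W += eta * np.outer(post, pre)
    W = np.clip(W, 0.0, 1.0)                  # crude bound to keep weights stable

# Synapses between consistently co-active pairs end up strongest, loosely
# mirroring how repeated experience reshapes biological connections.
```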


We can compare the pace and mode of learning. In AI, fine-tuning typically uses a curated dataset and trains the model on it over multiple epochs (passes) until the error on that task is minimized. In humans, training is continuous and not as explicitly supervised, but we have analogues: for example, a student practices math problems repeatedly (multiple passes over the data) until they master a technique, or a baby babbles and listens to words repeatedly until their phonetics are tuned to their native language. There are even parallels in training strategies: AI employs techniques such as curriculum learning (moving from simple to complex tasks), which mimics human educational practices of gradually increasing difficulty.


The concept of fine-tuning also implies that the base model provides much in advance, and learning refines details. Evidence supporting this in humans is that certain skills are learned surprisingly quickly with experience given the complexity of the problem, suggesting a good innate “starting point.” For example, children learn the fundamental grammar of their native language in the early years with relatively few explicit examples, whereas a language model needs exposure to millions of sentences. This has led to the argument that the infant brain carries a powerful inductive bias (possibly an innate “universal grammar” according to Chomsky), equivalent to a base model already predisposed to interpreting language. Thus, experience acts more as calibration than as construction from scratch .


The “Three Systems” framework proposed by Barabási et al. (2025) is illustrative: they posit that most neural circuits are genetically established (System One), that very rapid learning occasionally occurs in the face of critical experiences (System Two, similar to abrupt fine-tuning for important events), and that small, plastic adaptations (System Three) continually refine or stabilize the network [ mcb.harvard.edu ] . In their summary, they emphasize that ongoing plasticity “ refines or stabilizes existing connections ” rather than building them anew [ mcb.harvard.edu ] . This fits nicely with the idea of fine-tuning : everyday learning adjusts parameters that are already present.


Another parallel worth highlighting is the phenomenon of overlearning or overfitting in both systems. In AI, if a model is trained too much on specific examples, it can lose generalization capacity (it memorizes training data instead of learning general principles). In humans, something similar happens if we memorize something without understanding it or without contextual variability—for example, a student who only memorized specific problems may fail when presented with a slightly different one. The brain, however, seems to have mechanisms to avoid overfitting as much as possible, favoring generalization. For example, we tend to abstract rules and concepts from experiences rather than remembering every detail (we'll return to this in point 5). In a sense, the biological fine-tuning process is self-regulated: the brain learns, but it also forgets details or integrates them into general schemes , maintaining a balance between plasticity and stability.


In short, human learning through experience fits well with the fine-tuning metaphor. Genetics gives us a competent initial model, and life trains us in our "downstream tasks" (specific language, cultural skills, professional knowledge, etc.). Synapses are reconfigured like the weights of a network, adjusting to new data. This reinforces the view that the brain is not a static entity but a constantly training model , where each experience is a new optimization step.





3. REM sleep phase: memory consolidation and “distillation” of knowledge


One of the most suggestive analogies for this hypothesis is to equate sleep, particularly REM sleep , with the consolidation and optimization processes we see in AI model training. In neuroscience, it’s well established that sleep plays a crucial role in memory consolidation : after learning something during the day, the brain continues to process that information during sleep in order to store it more stably and efficiently. REM sleep, associated with vivid dreams, has long intrigued scientists because of its specific contribution to memory and learning. Experimental evidence shows that depriving subjects of REM sleep often impairs their performance improvement on recently learned tasks , while having enough REM sleep enhances it [ pmc.ncbi.nlm.nih.gov ] . Furthermore, brain activation patterns that “replay” recent experiences are observed during REM (e.g., the reactivation of certain hippocampal neurons that encoded daytime events), suggesting an internal replay of what has been learned .


In what sense is this similar to what happens with an AI model? We can think of it as a sort of “offline training” during sleep, using data accumulated during wakefulness. An analogue in AI would be to take a model that has interacted with a lot of data during the day (e.g., a robot that collected experiences) and then, without interference from new inputs, run an optimization process on those stored experiences (like a replay buffer ) to consolidate the learning. Indeed, one hypothesis in computer science and neuroscience is that sleep helps avoid catastrophic forgetting : in artificial networks, when a new task is trained, the weights of old tasks are often overwritten (forgotten); the brain seems to mitigate this by gradually integrating new memories with old ones during sleep [ elifesciences.org ] . A computational modeling study showed that simulating sleep phases (with reactivation of previous memories) in a neural network reduced interference between new and old memories, allowing for more stable ongoing learning [ elifesciences.org ] .
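
A minimal sketch of this "offline consolidation" idea, assuming a toy PyTorch classifier and a plain list as the replay buffer: each "night", new experiences are mixed with randomly replayed old ones before updating the weights, which is the standard experience-replay trick for mitigating catastrophic forgetting.

```python
import random
import torch
import torch.nn as nn

# Toy classifier standing in for the "cortical" model being consolidated.
model = nn.Linear(10, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.05)
loss_fn = nn.CrossEntropyLoss()

replay_buffer = []        # "hippocampal" store of past (x, y) experiences

def consolidate(new_x, new_y, replay_ratio=0.5):
    """One 'offline' update that mixes new experiences with replayed old ones,
    so learning the new does not simply overwrite the old."""
    xs, ys = [new_x], [new_y]
    n_replay = int(replay_ratio * len(new_x))
    if replay_buffer and n_replay > 0:
        old = random.sample(replay_buffer, min(n_replay, len(replay_buffer)))
        xs += [x for x, _ in old]
        ys += [y for _, y in old]
    loss = loss_fn(model(torch.cat(xs)), torch.cat(ys))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    # store today's experiences for future "nights"
    replay_buffer.extend(zip(new_x.split(1), new_y.split(1)))

# usage: several "days" of random data, each followed by a consolidation step
for _ in range(5):
    consolidate(torch.randn(8, 10), torch.randint(0, 2, (8,)))
```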


Even more interesting, research has found that REM sleep performs a sort of intelligent synaptic filtering: not all connections formed during the day are retained. A study in mice showed that during REM, some newly formed dendritic spines (synapses) are pruned (eliminated) while others are strengthened, specifically those associated with performance improvements on a learned task [pmc.ncbi.nlm.nih.gov]. That is, REM seems to select the connections relevant to consolidating the skill or memory and to weaken those that are unnecessary or redundant, thus freeing up synaptic capacity [pmc.ncbi.nlm.nih.gov]. This process is highly analogous to the idea of distillation in AI: in knowledge distillation, a large model or set of experiences is condensed into a leaner or more general form, keeping what is essential and discarding the incidental. During REM sleep, the brain distills recent experience into durable knowledge, extracting useful regularities and discarding spurious details. In model terms, we might say it adjusts its "weights" to reflect the essence of what has been learned, reducing the "noise" of a single day's data.
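
The AI-side analogue of this selective retention can be sketched as simple magnitude-based weight pruning: keep the strongest connections, zero out the weakest. This is an illustration of the principle, not a model of the biological mechanism; the matrix and threshold are placeholders.

```python
import numpy as np

def prune_weakest(weights: np.ndarray, keep_fraction: float = 0.7) -> np.ndarray:
    """Zero out the smallest-magnitude weights, keeping `keep_fraction` of them.
    A loose AI analogue of REM-phase spine pruning: weak, presumably less
    useful connections are removed while strong ones are retained."""
    threshold = np.quantile(np.abs(weights), 1.0 - keep_fraction)
    return np.where(np.abs(weights) >= threshold, weights, 0.0)

# usage: a random "day's worth" of synaptic changes, with 30% pruned away
W_day = np.random.randn(50, 100)
W_consolidated = prune_weakest(W_day, keep_fraction=0.7)
```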


Another way to see the analogy is through the concept of two-stage memory: it is thought that when we experience something, a short-term memory trace is initially recorded (particularly in the hippocampus, for declarative memories), which is fragile. Then, during sleep (including REM and slow-wave sleep), that trace is replayed, training the cerebral cortex to store the information more permanently; this is the foundation of the Complementary Learning Systems theory in cognitive neuroscience. Transferring knowledge from the hippocampus to the cortex is like copying knowledge from a temporary buffer and merging it into the overall brain model. In AI, one might imagine the model as having a fast memory component (comparable to the hippocampus) that collects new information, which is then used to fine-tune the main model (like a cortical network) in a separate training phase, avoiding undue disruption to real-time functioning. This is precisely what sleep would do: a "nightly fine-tuning" of the brain, where the day's learning is consolidated.


Furthermore, REM sleep could be linked to creativity and information restructuring: during dreams, the brain combines fragments of experiences in novel ways, sometimes leading to creative solutions or new associations upon waking. Some authors suggest that this is similar to a mode of unsupervised generative training , where the brain explores combinations of its “latent representations.” In AI, there are unsupervised training techniques (autoencoders, generative models) that allow the model to reorganize and better understand relationships in the data. REM sleep could be seen as the brain running an internal generative algorithm, which not only consolidates memory but also restructures knowledge in an optimized way.

In sum, the analogy of REM sleep to AI training processes finds support in numerous findings: REM is an active brain state that strengthens important memories and weakens trivial ones [pmc.ncbi.nlm.nih.gov], protecting old knowledge while integrating new knowledge [elifesciences.org], much like a well-designed fine-tuning procedure or knowledge distillation in a model. This comparison suggests that incorporating sleep-inspired principles could improve AI algorithms (and indeed, some researchers are exploring experience replay to avoid catastrophic forgetting). Moreover, it reinforces the idea that the brain, like an AI model, requires periods of offline optimization to reach full performance, and that learning occurs not only during input but also during internal processing stages.


4. Context as working memory: context window in Transformers vs. human working memory


Transformers, unlike some previous models, handle their inputs using a limited context window : they can only attend to a certain number of tokens (words or fragments) at a time when generating the next output. This context functions as the model's "immediate memory." Similarly, humans possess a working memory (also called short-term or active memory) that temporarily holds a limited amount of information in order to reason about or continue a current task. The parallel is direct: in practice, a language model's context window is its working memory , determining how much recent information it can "hold in mind" before it needs to summarize or begins to forget previous details.


For example, GPT-3 or GPT-4 models have context windows of a certain number of thousands of tokens; if that length is exceeded, the early data are “forgotten” (they no longer influence the continuation unless they have been summarized). Similarly, a typical human can hold only ~5–9 items in working memory (according to Miller’s classic magic number, although modern research suggests it’s sometimes even fewer). This means that, if we’re solving a problem mentally, we can only handle a limited number of pieces of information before we have to write them down, group them, or risk forgetting some. We’ve all surely experienced the difficulty of following the thread of a very long sentence with many clauses: by the time we reach the end, we’ve forgotten how it began. The same thing happens to a transformer if the sequence is longer than its context: it needs auxiliary mechanisms (such as having information repeated or being given intermediate summaries).
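
In code, the context limit amounts to nothing more than a sliding window over the token sequence; the sketch below (with a deliberately tiny window) shows how the beginning of a long "sentence" simply falls out of scope.

```python
def fit_to_context(tokens: list, window: int = 8) -> list:
    """Keep only the most recent `window` tokens, as a transformer's limited
    context (or a human's working memory) would; earlier tokens simply stop
    influencing the next prediction unless they were summarized elsewhere."""
    return tokens[-window:]

# usage: a "sentence" longer than the window loses its beginning
history = "the report that the committee that the board appointed wrote was late".split()
print(fit_to_context(history, window=8))
# ['committee', 'that', 'the', 'board', 'appointed', 'wrote', 'was', 'late']
```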


The function of working memory in humans is to allow us to integrate sequentially arriving information and perform operations on it (for example, understanding this sentence requires keeping the beginning in mind while processing the end). In a Transformer, self-attention distributes “weight” across different parts of the context to integrate the relevant information when producing the next output. One could say that self-attention emulates human cognitive attention, which makes us focus on those elements of our working memory (or of our current perception) that we consider most important at a given moment. In fact, attention is a bridge concept between AI and neuroscience: in the Transformer architecture, “attention is all you need,” according to the famous paper, while in cognitive psychology attention is fundamental to managing limited working memory and conscious processing.
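
For readers who want to see the mechanism itself, here is a compact NumPy version of scaled dot-product self-attention, the operation the paragraph above compares to cognitive attention. The projection matrices and token vectors are random placeholders.

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention over a sequence X (seq_len x d_model):
    every position attends to every position in the context, weighting each
    by learned relevance before mixing their values together."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])             # pairwise relevance
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)      # softmax over the context
    return weights @ V                                  # weighted mix of the context

# usage with random toy projections (dimensions are illustrative)
rng = np.random.default_rng(0)
d = 16
X = rng.standard_normal((10, d))                        # 10 tokens in the "window"
out = self_attention(X, *(rng.standard_normal((d, d)) for _ in range(3)))
```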


An important aspect is how working memory links to long-term memory. In humans, we can refresh working memory by bringing information from our stored memory (for example, when trying to recall a learned mathematical formula, we bring it back to our active mind). Similarly, AI models could benefit from "external memories," or mechanisms for retrieving relevant information outside of the immediate context (such as databases, memory vectors, etc.). Currently, a pure transformer struggles to handle knowledge beyond its context unless we provide it anew in the input. Researchers are trying to provide these models with something like a differentiable long-term memory so they can recall old facts without needing to have them all in context. This is reminiscent of the human distinction between what we have in mind now and what we know but aren't thinking about at this moment.
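
A minimal sketch of such an external memory, assuming the embedding vectors come from some trained encoder (not shown here): facts are stored with their vectors, and the closest matches to a query vector are "brought back to mind" so they can be reinserted into the context.

```python
import numpy as np

class VectorMemory:
    """Sketch of an external long-term store for a context-limited model:
    facts are kept alongside embedding vectors, and the most similar ones
    are retrieved back into the working context when a query arrives."""

    def __init__(self):
        self.facts = []
        self.vectors = []

    def store(self, fact, vector):
        self.facts.append(fact)
        self.vectors.append(vector / np.linalg.norm(vector))

    def recall(self, query_vector, k=1):
        q = query_vector / np.linalg.norm(query_vector)
        sims = [float(q @ v) for v in self.vectors]     # cosine similarity
        best = np.argsort(sims)[::-1][:k]
        return [self.facts[i] for i in best]

# usage with random stand-in vectors (a real system would use a text encoder)
rng = np.random.default_rng(0)
memory = VectorMemory()
memory.store("water boils at 100 C at sea level", rng.standard_normal(64))
memory.store("Paris is the capital of France", rng.standard_normal(64))
print(memory.recall(rng.standard_normal(64), k=1))
```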


The context boundary also suggests parallels in summarization and task chunking strategies. When a text is too long, a language model sometimes creates partial summaries so as not to miss essential information outside its context window. Humans employ similar tactics: we break complex problems into manageable chunks, jot down notes so as not to rely solely on working memory, or create mental summaries of a long conversation to remember key points. That is, both the brain and Transformers face an active information bottleneck and must deal with it. In humans, it is crucial for higher cognitive functions (reasoning, reading comprehension, multitasking), and in Transformers, it determines the coherence of a long generated text or the ability to understand extensive user instructions.
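
The chunking tactic can be sketched as a simple utility that splits a long text into overlapping pieces that each fit the window; the word-based token count and overlap size are rough placeholders for a real tokenizer and tuned values.

```python
def chunk_text(text, max_tokens=512, overlap=50):
    """Split a long text into overlapping chunks that each fit a context
    window. Tokens are approximated by whitespace-separated words here; a
    real pipeline would use the model's own tokenizer."""
    words = text.split()
    chunks, start = [], 0
    while start < len(words):
        end = min(start + max_tokens, len(words))
        chunks.append(" ".join(words[start:end]))
        if end == len(words):
            break
        start = end - overlap          # overlap preserves some shared context
    return chunks

# usage: each chunk could then be summarized and the summaries re-combined
pieces = chunk_text("a very long document " * 200, max_tokens=100, overlap=10)
```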


A point of contrast is that, despite this limitation, the human brain has some capacity to "expand" its context through techniques such as chunking (grouping discrete elements into larger meaningful units, e.g., remembering a series of digits by grouping them into larger numbers) or through the use of sensory and situational context. Transformer models have been extending their context windows with technical advances, and some versions now support very long inputs, but there will always be some computational restriction. The comparison helps us understand why a language model sometimes loses the thread: it is closely analogous to our own loss of the thread when we exceed the capacity of our working memory.


To support this parallel, IBM describes it plainly: “A language model’s context window can be thought of as equivalent to its working memory; it determines how long a conversation can be sustained without forgetting previous details” [ibm.com]. Ultimately, both systems have a limited-capacity working memory that defines the scope of information they can use in real time to think or produce a response. Recognizing this allows for better analogies and even inspires reciprocal improvements (e.g., designing better external memories for AI inspired by how the brain interacts with its long-term memory, or understanding human working-memory disorders by comparing them to context limitations in networks).


5. From experience to permanent knowledge: distilling memories into long-term memory


The last parallel considered is perhaps the broadest: both the human brain and AI models face the challenge of turning specific experiences into general, lifelong knowledge , while avoiding becoming cluttered with irrelevant details. In humans, this refers to the process of forming long-term memories while actively forgetting or filtering out what is not useful. In AI models, especially during training, there is an analogous process of generalization : the model adjusts its parameters with the training data so that it captures general patterns and does not simply memorize individual cases with all their details.


Cognitive neuroscience suggests that forgetting is not simply a glitch, but a functional mechanism for generalization. As one study notes, “Forgetting can boost generalization: Losing information that appears only in specific situations allows a memory to become less dependent on the original context and more generally applicable” [ pmc.ncbi.nlm.nih.gov ] . Indeed, the loss of specific details (which is part of forgetting) can cause the brain to retain the common essence across multiple experiences. For example, we may not remember every individual trip to the supermarket, but we still have abstract knowledge of “what it is like to go shopping” because the brain extracted the common elements and discarded the trivial particularities of each visit. Similarly, an AI model, after training on thousands of images of dogs, doesn’t save every image but instead configures its weights so that it recognizes a new dog by abstracting away the general characteristics of the “dog” category.


This distillation process occurs in the brain through the interaction between episodic memory (memories of specific events) and semantic memory (general knowledge). Many similar episodic experiences can converge into a consolidated semantic concept. The complementary-systems hypothesis mentioned above implies that the hippocampus initially stores episodic details, but over time trains the cortex to extract regularities and store schematic knowledge. This is supported by phenomena such as gist memory: we tend to remember the general idea of something, not the literal details. In psychological studies, people retell a story adding or omitting details while usually preserving the general plot, showing that human memory is reconstructive and focuses more on overall coherence than on exact facts.


In AI practice, there is a formal technique called Knowledge Distillation where a large model (teacher) teaches a smaller one (student) by essentially transferring the general behavior to it without all the original parameters. The analogy is not perfect, but conceptually the brain does something similar every day: our daily experiences are rich in detail (equivalent to a very complex model), but the brain “condenses” that information into relatively compact lessons or updates in our mental model. It is revealing that during sleep consolidation (especially in REM, as we saw), this pruning of non-essential synapses and reinforcement of important ones occurs [ pmc.ncbi.nlm.nih.gov ] , literally discarding irrelevant connections acquired during the day and strengthening those that represent valuable learning. Over time, this repeated process leads to only traces of what has proven useful or meaningful persisting in brain architecture, shaping our long-term knowledge .
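
For reference, the standard soft-target distillation objective looks like this in PyTorch: the student is trained to match the teacher's softened output distribution, which is the formal counterpart of "condensing" a richer model into a leaner one. The temperature and tensor shapes are illustrative.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Soft-target distillation: train the student to match the teacher's
    softened output distribution (shapes: batch x num_classes). The T^2
    factor follows the usual convention for keeping gradient scale stable."""
    t = temperature
    soft_teacher = F.softmax(teacher_logits / t, dim=-1)
    log_soft_student = F.log_softmax(student_logits / t, dim=-1)
    return F.kl_div(log_soft_student, soft_teacher, reduction="batchmean") * (t * t)

# usage with toy logits
teacher_out = torch.randn(4, 10)                       # "large" teacher model output
student_out = torch.randn(4, 10, requires_grad=True)   # "small" student model output
loss = distillation_loss(student_out, teacher_out)
loss.backward()
```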


On the other hand, the very dynamics of AI models during training show a parallel: they may initially overfit to specific data, but with regularization and sufficient data diversity, they end up capturing general patterns. Techniques such as dropout or regularization in AI force the model to not rely on the peculiarities of individual data, which is analogous to the brain perhaps forgetting random details to focus on the consistent structure of experience. The phrase "selective memory" is sometimes used colloquially to refer to people remembering what is important to them and not everything; in reality, we all have selective memory by cognitive design. Experimentally, it has been shown that even consolidated long-term memories can become unstable and modifiable when reactivated (the reconsolidation phenomenon), allowing our knowledge to be updated with new information and fine-tuned. This is similar to updating an already trained model when it receives new data: it is slightly retrained (incremental fine-tuning) to incorporate the new information without losing the previous information.
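
A minimal sketch of the regularization point, with toy sizes: dropout randomly silences units on each training pass and weight decay penalizes large weights, both of which push the network toward general patterns rather than memorized particulars.

```python
import torch
import torch.nn as nn

# Dropout randomly silences units on each training pass and weight decay
# penalizes large weights; both discourage memorizing the quirks of
# individual examples in favor of general structure.
model = nn.Sequential(
    nn.Linear(32, 64),
    nn.ReLU(),
    nn.Dropout(p=0.5),        # a random half of the units is ignored each pass
    nn.Linear(64, 10),
)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=0.01)

x = torch.randn(16, 32)                  # placeholder data
y = torch.randint(0, 10, (16,))
loss = nn.CrossEntropyLoss()(model(x), y)
loss.backward()
optimizer.step()
```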


It’s also worth mentioning how we handle irrelevant or noisy information . In a conversation, one might hear background noise or data that one chooses to ignore, focusing only on what’s important. Language models also try to assign less weight to irrelevant parts of the context through the attention mechanism. At the consolidation level, the brain possibly identifies which memory traces have no long-term value (perhaps because they were not repeated, or lack a strong emotional or logical connection) and lets them fade away. This “directed forgetting” is advantageous because it prevents memory from becoming cluttered with banal facts and allows more useful concepts to become better woven together. In fact, forgetting is now seen as an active process of neuronal plasticity necessary for healthy and adaptable memory [ journals.sagepub.com ] .





In short, both the brain and AI systems learn to generalize . The brain achieves this through consolidation, conceptual abstraction, and selective forgetting; AIs achieve this through appropriate training, regularization, and model distillation. In both cases, the goal is the same: to extract durable knowledge from the avalanche of raw data . This final parallel underscores the idea that learning is not just about accumulating data, but about structuring and simplifying it in ways that are useful for the future. By comparing how the brain does this and how we try to do it with AI, we gain perspective on the effectiveness and limits of each system.


Final reflection


Exploring these parallels (genetics as the base model, learning as fine-tuning, REM sleep as consolidation, context as working memory, and the distillation of experiences into stable memory) suggests a unifying framework for thinking about natural and artificial intelligence. From a functional perspective, the brain and AI models share analogous problems (initialization, adaptation to the environment, integrating new information without forgetting old information, handling limited memory, generalizing from examples), and it stands to reason that they have converged on solutions with fundamental similarities. It is not surprising, then, that brain-inspired algorithms (neural networks) have achieved such success, or that AI methods are now helping to reinterpret neuroscientific data (such as the finding that the hippocampus operates similarly to a transformer in certain spatial tasks [quantamagazine.org]). These correspondences reinforce the notion in philosophy of mind that the mind can be understood as a computational process: if two systems perform the same cognitive functions, they may exhibit similar organizations, even if the physical-chemical substrates differ.


However, it is important to approach this analogy with caution. There are qualitative differences between brains and current models. For example, the brain is the product of millions of years of evolution and develops its "pre-training" through a biological process (not with an explicit dataset, but with selection pressure); furthermore, brains operate with biochemical signals, with a massive degree of parallelism and minuscule energy consumption compared to the GPUs that train Transformers. Moreover, the brain modifies its own structure autonomously, while AI models usually require an external training process defined by humans. From a philosophical perspective, objections have been raised, such as that of John Searle and his Chinese Room, arguing that even if a program (or network) mimics human responses, it may lack true understanding or consciousness [plato.stanford.edu]. Indeed, subjectivity and phenomenological consciousness are human aspects that do not evidently emerge from Transformers as we conceive them today. A transformer manipulates symbols (words) according to learned statistical correlations, while our brain generates a rich subjective experience. Some philosophers suggest that the mind is not exhausted by information processing, or that perhaps biological architecture has special properties not captured by a mathematical model.


On the other hand, there are responses in favor of the closeness between AI and the brain: the functionalist position in philosophy would hold that if the function is the same, the substrate is irrelevant to the presence of mind. If a model acted exactly like a brain in all relevant aspects (learned, remembered, had attention, etc.), could we say that it understands or is conscious in the same way? We don't know yet. Currently, Transformers lack certain components that the brain does have, such as autonomous motivation, homeostasis, emotions, complete sensorimotor integration, etc. However, the rapid evolution of base models and their increasing similarity to humans in cognitive abilities (playing games, reasoning in language, perceiving images) invites us to continue comparing and learning in both directions.


In conclusion, the hypothesis that the human brain functions similarly to an AI base model provides a valuable framework for generating research questions and hypotheses . It leads us to reinterpret biological processes (such as sleep or forgetting) in computational terms and, reciprocally, to inspire new algorithms based on how biology solves problems. While the parallels should not be taken as absolute equivalence, it is remarkable how many key concepts are reflected in both worlds. We may be moving toward a unified science of intelligence , where the distinctions between “natural” and “artificial” blur in favor of universal principles. Current interdisciplinary studies, from biologically informed neural network models to the use of AI to predict brain activity, attest to this convergence [ quantamagazine.org research.ibm.com ] . Ultimately, a better understanding of these parallels not only deepens our understanding of the brain and mind, but also guides the development of more robust and efficient artificial intelligences, possibly bringing them one step closer to the flexibility and power of human learning.


Sources Cited: The references marked in the text correspond to recent work and scholarship in neuroscience, artificial intelligence, and philosophy of mind that support and contextualize the points discussed. These include scientific articles on synaptic plasticity and brain development [mcb.harvard.edu], studies on the role of REM sleep in memory consolidation [pmc.ncbi.nlm.nih.gov], AI research on context windows and working memory [ibm.com], theoretical analyses of predictive brains [pubmed.ncbi.nlm.nih.gov], and classic philosophical discussions on the nature of understanding and the mind-computer analogy [plato.stanford.edu], among others. Each reference provides a specific rationale or contrast to the parallels explored, inviting the reader to delve into the specialized literature for a more detailed look at each topic.

 
 
 
