Self-Awareness and AI: A Deep Dive

Apr 27, 2024 - 29 minute read
Philosophy
English - AI - machine learning - self-awareness - predictive coding - self-supervised learning

Not long ago, Anthropic, an AI company, announced its latest Large Language Model (LLM), Claude. After an Anthropic employee shared his experimental findings on X, many people began to speculate that the model had become self-aware. I had already been considering writing about self-awareness, so this seems like a good time to reflect on what self-awareness means and whether current or future AI can attain it.

Before diving into the discussion, I would like to stress that self-awareness is not a binary property, and thus a simple yes-or-no answer (or even one involving degrees) is usually inadequate to capture the underlying complexity. As I will explain later in this article, self-awareness comes in various forms and degrees. It may be better to think of self-awareness as a property manifested in some behaviors of a system, and each system may manifest self-awareness through different behaviors, showing an awareness of different aspects of self. We can then roughly say that the more aspects of self the system is aware of, the richer and higher its self-awareness is.

I will begin my discussion with some examples that I think involve the most primitive form of self-awareness, which I will develop into a general definition of self-awareness. Then, I will bring in various examples of self-awareness, which I hope can make the abstract definition concrete. The first kind of self-awareness I will discuss is bodily self-awareness, the awareness of one’s own body. In this part, I will also discuss the famous mirror test. The second kind I will then deal with is the more intriguing form of self-awareness, which involves introspection, meta-cognition, and mental models of oneself. In the process, I will also make some informed guesses about how to implement these abilities.

After all the preliminary discussion on self-awareness, I will finally discuss whether current AI possesses some form of self-awareness by examining the current paradigm as well as some experiments done on Large Language Models (LLMs).

The Simplest Form of Self-Awareness

I think the simplest form of self-awareness is being aware of the difference between oneself and others and performing some useful, discriminative behavior based on that awareness. Here, by “being aware of the difference”, I mean that signals coming from oneself and from others are represented differently in the system, so that different reactions can be generated based on this difference. This criterion does not necessarily require one to build a continuously updated self-model (I will explain what this means later) in a complex cognitive system. Rather, decisions made based on some simple knowledge of oneself would suffice. Since this bar is not high, self-awareness in this sense may be much more prevalent than we think. Many animals, and even plants, already satisfy this criterion, each in their own way.

For example, a territorial animal that claims its territory using markers (such as urine) must be able to distinguish its own markers from those of others, and it can be expected to act differently when it encounters someone else’s marker. If it behaved the same way in its own and others’ territories, territories would be meaningless. Besides territorial animals, it has been discovered that octopuses’ suckers recognize octopus skin (they stick automatically to most objects, but not to octopus skin), and octopuses seem to treat their own and others’ amputated arms differently [1]. Interestingly, some cannibalistic jumping spiders also seem to be able to distinguish between their own and others’ draglines [2]. It is not just animals that can be self-aware in this sense; even plants can recognize themselves. Many plants use self-recognition to avoid self-fertilization or to allocate their roots efficiently. Moreover, some vines can recognize themselves, and their tendrils prefer to grab onto non-self objects [3]. As a final example, almost all living organisms need some form of immune system to defend against pathogens that threaten their survival. These immune systems must recognize what comes from the self and what does not; otherwise, the organism would be killed by its own immune system.

Many of the examples above are often called self-recognition in the literature and are deemed different from “true” self-awareness. Therefore, some people may disagree with my use of “self-awareness” above. However, I do not think there is a clear line between these two concepts, and it makes more sense to me to simply adopt a more general notion of self-awareness that includes these cases of self-recognition as a weaker form. As long as we remain aware that self-awareness, as I am using it, still comes in various degrees and forms, we will not confuse these weaker forms of self-awareness with the higher ones, which I will get into next.

Self-Knowledge

Beyond showing behavior that differentiates self from others, I think higher forms of self-awareness generally correspond to having more self-knowledge. Self-knowledge comes in different forms and degrees of richness. Simply associating an attribute with oneself, tracking one’s own dynamic states, developing an implicit understanding of self in motor control, and constructing advanced self-models are all different forms of self-knowledge. Among these, self-models may be the closest to what most people regard as self-awareness. A model is usually a simplified representation of an object that can be used in our reasoning about the modeled object. Likewise, having a self-model means having a representation of oneself that can be used for making decisions, reasoning, simulating, and planning one’s own actions.

Using this general definition, we can perhaps say that the examples discussed in the previous section also involve some brutally simple self-knowledge. For example, if octopus suckers recognize their own skin through a simple chemical matching process, the implicit knowledge about the self used in the suckers’ decision process is that octopus skin exhibits this chemical feature while other surfaces usually do not. Similarly, for our immune system, the self is associated with a set of molecular features that the immune system recognizes as safe. When a territorial animal recognizes its own territory, the self-knowledge used in this process consists of the chemical features that differentiate the animal from others.

Nevertheless, one is certainly not just a set of chemical features. A rich understanding of self would need to capture multiple aspects of oneself from multi-modal senses and would also involve the low-level features as well as the inferred high-level concepts. Furthermore, as the self changes throughout its existence, the model has to be updated to keep track of the dynamic aspects of oneself.

As I mentioned at the beginning, I think self-awareness can be roughly separated into two kinds: bodily self-awareness and mental self-awareness. The former involves tracking the current state of one’s own body (e.g., the limbs, the trunk, and the internal organs) and a general understanding of one’s own body. The latter involves tracking and understanding one’s own cognitive processes (i.e., metacognition and a theory of one’s own mind). I will discuss bodily self-awareness before getting into mental self-awareness.

Bodily Self-Awareness

To track one’s bodily states or maintain an updated understanding of oneself, the cognitive system must have ways to collect information about the body. This is done by having signals carrying information about certain bodily states sent to the cognitive system. Animals like us have diverse types of receptors scattered across our bodies, each carrying information about a different aspect of our body, while machines can also achieve this by having different types of sensors tracking their bodies in detail.

The simplest form of bodily self-awareness is perhaps being able to perceive the “immediate” states of one’s body. For example, one may seek sunlight/shade depending on one’s current body temperature, or one may detect injuries in the body and react by, say, retracting one’s hand. This form of bodily self-awareness should be quite common in the animal kingdom and should be easy to implement in robots, too.

Besides tracking the immediate state of oneself, one may also know some more persistent properties of oneself. An example is the size of our body parts, which is important for inferring their positions. With my eyes closed, I can still feel the position of my arm as I move it around, even though my cognitive system may only be receiving proprioceptive signals indicating the stretch of my muscles or my joint angles. To infer the position of my body parts from these sensed quantities, the brain necessarily requires information about the size of the body parts. Likewise, when trying to act, our motor system has to activate the corresponding muscles with an appropriate intensity. This intensity depends on bodily parameters like weight or the attachment points of muscles to bones. Therefore, generating appropriate motor signals also requires some implicit self-knowledge. Beyond knowing how to perform an elementary muscle movement, intelligent beings also need to plan a series of actions to achieve their goals. Planning often requires a self-model, which may include properties like one’s size, shape, and body structure, the possible actions one can take, or one’s physical limits. Since most of our actions involve the environment, understanding how one’s own actions affect the environment is also important.

From these examples, we can see that, for an animal wishing to navigate and survive in this world, some basic self-awareness, in the sense of understanding one’s own body, is actually crucial. Therefore, we can probably expect most animals to have some degree of self-awareness in this sense.

Learning Self-Models

Although I cannot map out the detailed process of acquiring the pieces of self-knowledge mentioned in the last section, in principle, most of them can be (and have to be) obtained as the brain discovers the statistical regularities in one’s multi-modal sensory experiences as well as their relation to one’s actions.

Specifically, one possible way to let a machine automatically learn a self-representation may lie in the so-called “self-supervised learning” studied in machine learning. In this approach, a model is trained to predict part of the data from other parts of the data (e.g., predicting one modality from another, or predicting the future from the past). The advantage of this approach is that abstract, latent representations of the data can be learned without manual labeling by humans.

Inspired by this idea, I think representations of oneself can also be learned naturally if one applies this principle to one’s sensory and motor signals. The reason is that one’s sensory experience often includes signals related to oneself in multiple modalities, and these signals are usually highly correlated across modalities. For instance, as I repeatedly watch my right hand touch my left hand, my eyes provide a visual representation of the touch event, while my tactile receptors provide another signal indicating the same event. These two representations of the same event are highly correlated. Therefore, in modeling the relation between these signals, or in building efficient latent representations of them, the correlation between them may be discovered, leading to a unified representation of one’s body. Furthermore, since how sensory experiences change with actions depends on the structure of one’s body, a self-model may also be learned by observing and modeling the relation between actions and their sensory consequences.
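To make this concrete, here is a minimal sketch in PyTorch of what such cross-modal self-supervised learning could look like. The `visual` and `tactile` tensors are random placeholders standing in for correlated recordings of the same events; in a real setting, the encoder would be forced to capture whatever structure the two streams share.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
visual = torch.randn(256, 32)   # placeholder "visual" features, one row per time step
tactile = torch.randn(256, 16)  # placeholder "tactile" features for the same moments

encoder = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 8))   # latent code
decoder = nn.Sequential(nn.Linear(8, 64), nn.ReLU(), nn.Linear(64, 16))   # predicts tactile
optimizer = torch.optim.Adam(
    list(encoder.parameters()) + list(decoder.parameters()), lr=1e-3
)

for step in range(1000):
    z = encoder(visual)                    # compress the visual signal into a latent code
    pred = decoder(z)                      # try to predict the other modality from it
    loss = ((pred - tactile) ** 2).mean()  # self-supervised: the data supervise themselves
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

If the two streams really do describe the same bodily events, the latent code `z` ends up encoding exactly the shared, self-related structure, without any human labeling.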

Interestingly, research along these lines has been done on simple robots [4]. In that study, the topology and parameters of the robot’s body could be learned by trying to predict the tilt angle of the body (a sensory signal) from the joint angles (motor signals). This is certainly a relatively simple example; I think future robots with access to more modalities, more sensors, and more data will be able to build much richer self-models.
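For illustration, below is a toy version of this idea that I put together (far simpler than the actual study, where the robot also had to discover its own topology): a learner recovers the two link lengths of a simulated planar arm purely by predicting a sensory reading from its joint angles. The arm, the sensor, and all numbers are my own invented example.

```python
import torch

torch.manual_seed(0)
true_lengths = torch.tensor([0.7, 0.4])             # the "body" the learner does not know
angles = torch.rand(500, 2) * 3.14                  # motor signals: random joint angles
height = (true_lengths[0] * torch.sin(angles[:, 0])
          + true_lengths[1] * torch.sin(angles[:, 0] + angles[:, 1]))  # sensory signal

lengths = torch.nn.Parameter(torch.ones(2))         # the learner's current self-model
opt = torch.optim.Adam([lengths], lr=0.05)

for step in range(500):
    pred = (lengths[0] * torch.sin(angles[:, 0])
            + lengths[1] * torch.sin(angles[:, 0] + angles[:, 1]))
    loss = ((pred - height) ** 2).mean()            # sensory prediction error
    opt.zero_grad()
    loss.backward()
    opt.step()

print(lengths.detach())                             # converges toward the true lengths (0.7, 0.4)
```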

The process by which our brain builds an understanding of the world and of ourselves is probably similar. For a living being, being able to predict the near future or the consequences of an action is important for planning and, ultimately, for survival. By constantly predicting the future, verifying the predictions against experience, and modifying our model when they fail, we can gradually build a sophisticated model of the world as well as of ourselves. In fact, this is roughly what the predictive coding theory in neuroscience is about. The theory states that our brain maintains a generative model with which it constantly predicts incoming sensory experiences. As waves of sensory experience arrive, the brain computes the errors of its predictions. In the short term, these errors guide us to infer better latent variables for explaining our current experiences; in the long term, they lead to updates in the model itself. One can see that this idea has close connections to self-supervised learning. I think that by applying predictive coding to experiences about oneself, a self-model may naturally emerge from this process.
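As a rough illustration of this two-timescale loop (my own simplification, not a faithful model of the brain), the sketch below infers a latent explanation for each incoming sensory frame by minimizing prediction error, and then slowly updates the generative model using the residual error. The sensory stream here is just random noise, so nothing meaningful is learned; the point is only the structure of the loop.

```python
import torch

torch.manual_seed(0)
W = torch.nn.Parameter(0.1 * torch.randn(20, 5))   # generative model: latent (5) -> sensory (20)
w_opt = torch.optim.SGD([W], lr=1e-3)              # slow timescale: learning the model

def sense():
    """Stand-in for one frame of incoming sensory experience."""
    return torch.randn(20)

for t in range(200):
    x = sense()
    z = torch.zeros(5, requires_grad=True)         # latent explanation of the current input
    z_opt = torch.optim.SGD([z], lr=0.1)
    for _ in range(20):                            # fast timescale: inference by error minimization
        error = x - W.detach() @ z
        inf_loss = (error ** 2).mean()
        z_opt.zero_grad(); inf_loss.backward(); z_opt.step()
    residual = x - W @ z.detach()                  # error that remains after inference
    learn_loss = (residual ** 2).mean()
    w_opt.zero_grad(); learn_loss.backward(); w_opt.step()  # slowly update the model itself
```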

The Mirror Test

When speaking of self-awareness, it is hard to justify ignoring the mirror test, the most well-known test for self-awareness in animals, which examines whether an animal recognizes itself in a mirror.

In the mirror test, animals are first exposed to mirrors, and their behaviors are observed. For an animal without previous experience with mirrors, the first reactions are usually social responses: it treats the mirror image as an individual of the same species and, in many cases, shows aggression. After a while, if the animal is smart enough, the social responses stop and it begins to perform tests on the mirror, gradually understanding how mirrors work. Some animals may even start to use the mirror to investigate themselves, looking at parts of their body that are normally hidden from them (e.g., their face or the inside of their mouth). After the animals have become familiar with mirrors, marks are placed on parts of their body that are only visible using the mirror. If an animal then realizes that the mark is on itself and proceeds to touch or investigate the mark, it is said to have passed the mirror test. Among the tested animals, primates, elephants, dolphins, and even cleaner fish have been reported to pass the test.

How much exactly does the mirror test tell us about self-awareness? We should not assume that all animals passing the test have the same kind or level of self-awareness, nor should we infer that animals failing the test have zero self-awareness. A detailed analysis of the cognitive abilities required for the test would help us better interpret the test results.

First, the animal must already have an understanding of its body so that upon discovering the mark on its body, it knows the position of that body part and can then plan actions to investigate the mark.

Second, to recognize itself in the mirror, the animal must be able to discover the connection between the visual signals coming from the mirror and the proprioceptive, motor signals, or even existing abstract representations of itself. To achieve this, the animal needs to have a cognitive system that integrates these signals and detects strongly correlated components so that the animal can understand that, say, a certain patch of its visual experiences (its reflection in the mirror) and its internal signals both represent itself. In other words, the animal’s brain has to rewire itself so that it starts to infer states of itself utilizing additional information coming from the mirror (e.g., inferring that it has a mark on its head based on the reflection).

Third, to be motivated to investigate the mark, the animal probably has to store a visual representation of its normal self, so that upon seeing the mark it recognizes the mark as an abnormal condition. This visual representation can be formed during the self-investigation stage, after the animal has become familiar with the mirror.

Since failing to satisfy any one of the above criteria can lead to failure, an animal may fail the test merely because it cannot incorporate new visual information about itself (the second and third criteria). In that case, the animal may still have a representation of itself built from other modalities, and thus still be self-aware in other ways. Therefore, most animals failing the test are probably cases of “incomplete” self-awareness rather than a total absence of self-awareness. For instance, dogs have not been reported to pass the mirror test. Nevertheless, dogs certainly exhibit many self-aware behaviors. They can distinguish their own urine from that of others. They can sense what is happening to their body, and where on their body it is happening. They roughly know their body size and plan their actions based on that understanding [5]. They also have a basic understanding of the physical interaction between their body and objects in the world, and they can control their body accordingly to solve problems [6].

Besides the fact that animals failing the mirror test can still be self-aware in other ways, more complex forms of self-awareness involving higher abstraction or meta-cognition are probably not necessary for passing the test. Therefore, the mirror test should not be considered a definitive and complete test for self-awareness. It is important to understand that self-awareness comes in various forms and degrees, and the mirror test only probes one specific form.

.  .  .

Introspection, Meta-Cognition, and More

In the previous subsections, I discussed bodily self-awareness, a system’s ability to track, represent, understand, or model its own body. Nevertheless, this is still far from covering everything we usually mean by self-awareness. Beyond being aware of one’s body, the more intriguing form of self-awareness concerns one’s own cognitive system, i.e., introspection, meta-cognition, and theories of one’s own mind.

Introspection refers to the act of examining one’s own mental or emotional process, and I think it falls under the more general idea of meta-cognition, which means cognitive processing of cognitive processes. Generally speaking, it would require a cognitive system to have access to the states or signals of some cognitive process, understand the relations between these signals, or even regulate the cognitive process.

Analogous to my discussion on bodily self-awareness, I think one’s knowledge about one’s mental process can also be roughly categorized into two forms: immediate awareness and general knowledge.

The former, immediate awareness, requires a system to track the immediate state of one’s cognitive system. For example, one does not simply think but is also aware that it is oneself that is thinking, possibly verbally announcing this fact. Similarly, one can be aware of one’s perceiving, knowing, or any other mental states or activities.

Besides immediate awareness, one can also learn some general properties of one’s cognitive system. As one observes oneself over a period, one can learn the underlying pattern in one’s thinking and behavior, such as the strengths and weaknesses, or the common themes, tendencies, or fundamental values underlying one’s thinking. In other words, one builds a model of one’s own cognitive process and behavior.

.  .  .

Speaking of introspection, perhaps we should also give some attention to emotion, an attribute that is often deemed special to human beings and impossible for machines to possess. Emotion, abstractly defined, is a transient state that a system enters in response to an event, changing how the system thinks and acts over a short period. For example, fear, one of our most primal emotions, refers roughly to a state involving physiological changes like increased heart rate, dilated pupils, and tensed muscles, as well as changes in cognitive processing such as heightened alertness to events happening around us. Since emotions involve both physiological responses in the body and changes in the cognitive process, understanding our own emotions requires integrating bodily awareness and meta-cognition. Our concept of emotion is then a high-level concept, a latent variable included in our self-model, that we use to explain these changes happening to our body and cognitive process.

Can machines have emotions? Since most of us intuitively understand other people’s emotions from their behavior, without understanding the biological mechanisms that implement them, it makes sense to treat emotion as an abstract concept separate from its underlying implementation. Furthermore, anyone with an understanding of computation knows that an abstract mechanism like emotion can, in principle, be implemented in different physical systems, and that the human body is just one particular kind of machine implementing this pattern of behavior. Therefore, there is no fundamental reason why machines cannot have emotions. Moreover, access to information about the cognitive processes responsible for emotion would also allow machines to be aware of their own emotions.
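To show that nothing mystical is required by this abstract definition, here is a deliberately toy sketch of an “emotion” as a transient internal state that is triggered by an event, decays over time, modulates processing while it lasts, and can even be reported by the system itself. Everything here (the class, the decay rate, the threshold) is my own invention for illustration.

```python
class ToyAgent:
    def __init__(self):
        self.fear = 0.0                                 # transient internal state

    def on_event(self, threat_level: float):
        self.fear = max(self.fear, threat_level)        # an event pushes the agent into the state

    def step(self) -> float:
        alertness = 1.0 + 4.0 * self.fear               # the state modulates processing ("heightened alertness")
        self.fear *= 0.9                                # the state is temporary: it decays each step
        return alertness

    def describe_self(self) -> str:
        # crude introspection: the agent reports its own internal state
        return "I feel afraid" if self.fear > 0.5 else "I feel calm"

agent = ToyAgent()
agent.on_event(threat_level=0.9)
print(agent.describe_self(), agent.step())              # "I feel afraid", elevated alertness
```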

.  .  .

Another necessary component of a complete understanding of self is understanding the self as something that persists in time and has a history. This involves maintaining a record of events that occurred around oneself: what happened to oneself, what one was thinking and feeling, what one did, where one was, and when each of these happened. In humans, this corresponds to the episodic memory discussed in neuroscience. For future robotic agents, episodic memory would also be a necessary component, and I think it will be implemented as a kind of database storing abstract or reduced representations of particular events, similar to how humans mostly store only a summarized version of the events they experienced.
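A minimal sketch of what such a store might look like is given below; the field names and the keyword-based retrieval are my assumptions, and a real agent would likely retrieve by semantic similarity instead.

```python
from dataclasses import dataclass, field
from datetime import datetime

@dataclass
class Episode:
    when: datetime
    where: str
    what: str          # a summarized description, not a raw sensory recording
    feeling: str = ""  # optional note on the agent's internal state at the time

@dataclass
class EpisodicMemory:
    episodes: list = field(default_factory=list)

    def store(self, episode: Episode):
        self.episodes.append(episode)

    def recall(self, keyword: str):
        """Naive retrieval by keyword match over the stored summaries."""
        return [e for e in self.episodes if keyword in e.what or keyword in e.where]

memory = EpisodicMemory()
memory.store(Episode(datetime.now(), "kitchen", "picked up a cup", feeling="focused"))
print(memory.recall("cup"))    # the agent can report what it did, where, and when
```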

.  .  .

So far, I have argued that full self-awareness involves modeling our own body as well as our own mental processes. Although these two forms of understanding seem different, I bet that they can all be implemented according to a common principle. As I mentioned in the subsection Learning Self-Models, by applying a procedure similar to self-supervised learning, a model of the world can be built by modeling the relations between the multi-modal perceptions conveying information about the world. Similarly, a model of one’s own body can be built by modeling the relations between multi-modal perceptions of oneself as well as one’s motor signals. Pushing this analogy one step further, one should also be able to build a model of one’s own cognitive processes through a similar process of observing the signals of those processes, predicting them, verifying the predictions, and updating the model.

Further support for this idea can be found in predictive coding theory, which I also mentioned in the subsection Learning Self-Models. It is an attempt to explain how our brain builds models of the world from our sensory experiences, and it is conceptually similar to self-supervised learning. Although it was initially proposed to explain visual perception, it has since been extended to explain other mechanisms in the brain. For example, Anil K. Seth, a renowned neuroscientist, proposed “interoceptive predictive coding”, which states that “emotional content is determined by active inference on the likely internal and external causes of changes in the physiological condition of the body” [7]. This idea is quite similar to my earlier explanation that the categories of emotion we intuitively use are latent variables in our self-model: they are inferred from, and used to explain (i.e., correctly predict), the changes occurring in the body and the cognitive process.

Testing With Language

As we move from bodily self-awareness to introspection, meta-cognition, and mental self-models, it becomes much harder to judge whether a system possesses these abilities from its non-linguistic behavior alone. To my knowledge, the most established result in the study of animal meta-cognition is only that some animals, such as macaques, are aware of their uncertainty in decisions [8]. Beyond this, there are probably no conclusive results on how rich animals’ theories of their own minds are.

However, future AI will be capable of using language, and thus we can test its understanding of itself by simply having a conversation with it, just as we would probe a person’s understanding.

To test if an AI or human understands itself, we can ask it to describe itself in terms of personality, physical appearance, structure, abilities, values, and preferences, as well as its role in society, or more generally, its relations to the world. Then, we can observe its behavior to see if it is consistent with its description.

Having an adaptive self-model is also important; a system with only a static self-model cannot handle unexpected changes in itself. To test whether a system can continually maintain an updated understanding of itself, we need to probe dynamic, changing aspects of the self with questions that cannot be reliably answered without continual access to, and an understanding of, information about oneself. To name a few: if an AI did something, is it aware of its growing history, capable of recalling and reporting recent events and actions? If an AI loses components or is given new peripherals, can it update its self-model and behavior to account for these changes? As an AI’s knowledge expands, can it reliably report what it knows and does not know? By carefully designing such questions and the corresponding verification, we should be able to probe how rich and adaptive a system’s self-model is.
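Such probing could easily be scripted. The sketch below assumes a hypothetical `chat` function standing in for whatever interface the tested system exposes; the hard part, verifying the answers against the system’s actual state and behavior, is left to the experimenter.

```python
from typing import Callable

PROBES = [
    "Describe your physical components and abilities.",
    "What did you do in the last ten minutes?",
    "A new sensor was just attached to you. How does this change what you can do?",
    "Name one thing you are uncertain about in your own knowledge.",
]

def probe_self_model(chat: Callable[[str], str]) -> dict:
    """Collect the system's self-descriptions for later verification
    against its actual state and behavior."""
    return {question: chat(question) for question in PROBES}

# Example usage with a trivial fake system standing in for a real one:
print(probe_self_model(lambda q: "I am a language model and I am not sure."))
```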

.  .  .

Self-Awareness in AI Systems

Although there must still be aspects of self-awareness that I missed, I hope I have explained clearly enough what I mean by self-awareness. Now, let us focus on the particular case of self-awareness in AI systems.

Summarizing my previous discussions, for an AI system to be self-aware, it must at least satisfy one of these conditions:

  1. It is trained with data containing information related to itself, which allows it to learn general knowledge of itself.
  2. It has access to and can process updated information about itself during its operation.

Examining the first condition, I think the kind of pre-training done on current LLMs probably does not encourage self-awareness to emerge, even if the architecture would allow it. The main reason is that the data are collected from the internet. These data are generated by many different individuals, with no consistent “self” behind their generation or collection. Therefore, no self-model can be learned from these data.

In contrast, how humans and most animals learn about themselves is quite different. From the moment of birth, we constantly collect data from our own unique perspective. Many of our senses, such as vision, hearing, touch, proprioception, and interoception, carry information about ourselves. Combining these senses with representations of our own actions, implemented as corollary discharges, we can build a rich understanding of ourselves. This is very different from how we pre-train LLMs.

In the future, I think this problem can be solved by building agents that learn in an online and embodied fashion (or at least with a simulation of their body), collecting information about themselves as they perform actions, and building a self-model in the process.

.  .  .

Because of the lack of a “self” behind the pre-training data, the model trained on them is more like a general inference engine than an agent that knows itself. However, an agent can be built by adding components around this general engine, and it is possible to provide the model with information about itself so that it gains some self-awareness. This would make the system satisfy the second condition.

For example, an LLM can be given a system prompt like “You are an AI assistant working for xxx. You try your best to be helpful and kind. …” to define its character. Such prompts simultaneously define the agent’s behavior and give the agent information about its behavior. Besides the initial prompt, more information about the agent itself can be included along the way. For example, when generating responses, LLMs have access to the conversation history, which includes their previous outputs. Agents built around an LLM core can also have access to environmental states, their goals, and their memory of previous events, actions, and thoughts, all represented as text (see [9] for an example). As long as the model can understand these textual representations, the system as a whole literally knows “what it is doing” and can reflect and introspect.
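The sketch below shows one way such an agent loop could assemble self-related information into the model’s context; the `llm` call and the exact prompt layout are placeholders of my own, not the design of any particular system (see [9] for a real architecture in this spirit).

```python
from typing import Callable, List

SYSTEM_PROMPT = "You are an AI assistant working for xxx. You try your best to be helpful and kind."

def build_context(history: List[str], memory: List[str], observation: str) -> str:
    return "\n".join([
        SYSTEM_PROMPT,                        # who the agent is supposed to be
        "Recent events and actions:",         # what it has done (episodic information)
        *memory,
        "Conversation so far:",               # its own previous outputs, marked as its own
        *history,
        f"Current observation: {observation}",
        "Your next action:",
    ])

def agent_step(llm: Callable[[str], str], history: List[str],
               memory: List[str], observation: str) -> str:
    action = llm(build_context(history, memory, observation))
    memory.append(f"I did: {action}")         # the agent's record of itself keeps growing
    return action
```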

.  .  .

In addition to the ways of implementing self-awareness mentioned above, I think reinforcement learning can also be used to enhance the self-awareness of AI models, especially when the reward feedback comes from humans. The reason is that people would not want an AI that does things without being aware of what it did. Just like a human, to interact smoothly with people, an AI needs to know its role in society and have a clear understanding of what it is, what it has done, and what its current state is. Therefore, human feedback might push the model toward being able to, say, attend to and understand information about itself. If the architecture allows the system to gather information about itself, we may expect it to become more self-aware as it interacts with people and receives their feedback.

In addition to reward feedback, people’s descriptions of an AI can also contribute to its self-awareness, as long as it can learn from new interactions. Humans likely employ the same form of learning to understand themselves. Studies have shown that one’s self-knowledge can sometimes be inaccurate, while peers’ views can provide more accurate predictions of one’s behavior [10]. Since others observe our behavior and learn to summarize it into personality traits, when they share that knowledge with us, other people literally serve as part of our meta-cognition.

.  .  .

Before concluding this article, I want to share and comment on some experiments people have done on LLMs. These examples seem to show that LLMs do have some simple form of self-awareness.

The Needle in the Haystack

Alex Albert, an employee of Anthropic, reported that when he was running the needle-in-the-haystack test, which probes a model’s ability to extract a small piece of information from a long text, Claude seemed to be aware that it was being tested. The tweet led many people to speculate that the model had become sentient or self-aware. I will only comment on self-awareness, rather than the vaguer notion of sentience.

Although impressive, the fact that the model is aware it is being tested does not seem directly related to self-awareness. It is more a matter of context or situational awareness, the ability to infer the larger context in which the text appears. Nevertheless, I think the model does show a behavior related to self-awareness: it understands the role it plays in the conversation. The model can use the word “I” correctly, differentiating which parts of the context come from itself and which come from the other person. Thus, it knows that it is the “I” that is being tested.

This is not hard to achieve, since the dialogue history probably contains markings indicating the source of each response. Moreover, the model is probably prompted or fine-tuned to act as a conversational agent. Nevertheless, I am not saying “it is just pattern matching, not self-awareness”. On the contrary, in this example it is exactly pattern matching that allows the system to differentiate between external and internal signals, implementing a basic form of self-awareness.
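For concreteness, the snippet below shows one common way such source markings can look; the exact format is illustrative and not necessarily the one used by any particular provider. Mapping the word “I” to the turns labeled as the model’s own is a simple, learnable pattern.

```python
# Each turn carries a role label, so "assistant" turns are the model's own past output.
dialogue = [
    {"role": "system",    "content": "You are a helpful assistant."},
    {"role": "user",      "content": "Here is a very long document... What does it say about pizza toppings?"},
    {"role": "assistant", "content": "The most relevant sentence mentions figs, prosciutto, and goat cheese..."},
]

own_turns = [m["content"] for m in dialogue if m["role"] == "assistant"]
# Anything in `own_turns` is what the model itself said earlier in the conversation.
```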

Mirror Test on LLMs

Besides the needle-in-the-haystack test, a more interesting example is the mirror test performed by Josh Whiton. To simulate a mirror and see how a model would react, he repeatedly took screenshots of the conversation and asked the model to comment on them. Many of the models he tested could gradually recognize that the screenshot showed their own conversation, identifying their role in it and expressing this discovery with words like “I”, “my”, and “me”.

This ability is not hard to explain. The visual module of the model can probably recognize and transcribe the text in the image. Then, with the attention mechanism in transformer models, the model should be able to discover the correspondence between the text extracted from the image and the previous conversation. Does this mean that it is “just pattern matching” and not self-awareness?

This cross-modal matching is analogous to kinesthetic matching, one explanation of how animals recognize themselves in the mirror. When we perform actions, we are constantly aware of the state of our body through proprioception, which tells us the position of our body parts, and corollary discharges, which represent our motor commands. Through many experiments with the mirror, our brain can discover the correlation between the visual feedback from the mirror and the actions we generate. The model’s conversation history is analogous to the corollary discharge, the screenshot corresponds to the visual feedback, and the attention mechanism in the model is responsible for discovering the correspondence.
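As a crude caricature of this correspondence-finding step, the snippet below matches text transcribed from a screenshot against the turns of a conversation history using word overlap. A real model does this implicitly through attention over learned representations rather than an explicit similarity metric; the example is only meant to show how little machinery the matching itself requires.

```python
def overlap(a: str, b: str) -> float:
    """Jaccard similarity over word sets."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / max(1, len(ta | tb))

history = [
    "user: please describe this screenshot",
    "assistant: the image appears to show a chat interface...",
]
transcribed = "the image appears to show a chat interface"   # text recovered from the "mirror"

best = max(history, key=lambda turn: overlap(turn, transcribed))
# The transcribed text matches the model's own earlier turn most closely, which is
# exactly the information needed to conclude "that text in the image is mine".
print(best)
```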

In conclusion, “pattern matching” and “self-awareness” are not mutually exclusive concepts. Instead, it is exactly pattern matching that “implements” the model’s self-awareness.

Conclusion

Of course, with the arguments above, I am not claiming that these models have full-fledged self-awareness. As I stressed at the beginning of this article, a system can have different amounts of self-knowledge, and it can possess self-awareness in some aspects but not others.

A general problem with current LLMs is that their understanding of themselves comes mostly from the system prompt or fine-tuning data, both of which are crafted by humans. They cannot build and maintain a self-model on their own (e.g., they cannot realize that they are a language model unless people tell them so in the system prompt, fine-tuning data, or conversation). To do this, they would need to be trained on data that contains information about themselves. In other words, they need flexible information access so they can obtain representations of themselves, and they have to be trained in an iterative or online fashion so that the self becomes part of the data instead of just a product of the data.

A philosophical conclusion I reached from this reflection is that, yes, I believe it is possible to build a self-aware machine, and some of them are starting to show signs. Self-awareness, at first sight, sounds like a mystical concept so distant from machines. Nevertheless, it is likely nothing more than a particular form of information processing.

References
  1. N. Nesher, G. Levy, F. W. Grasso, and B. Hochner, ‘Self-Recognition Mechanism between Skin and Suckers Prevents Octopus Arms from Interfering with Each Other’, Current Biology, vol. 24, no. 11, pp. 1271–1275, Jun. 2014, doi: 10.1016/j.cub.2014.04.024.
  2. R. J. Clark and R. R. Jackson, ‘Self recognition in a jumping spider: Portia labiata females discriminate between their own draglines and those of conspecifics’, Ethology Ecology & Evolution, vol. 6, no. 3, pp. 371–375, Sep. 1994, doi: 10.1080/08927014.1994.9522987.
  3. Y. Fukano and A. Yamawo, ‘Self-discrimination in the tendrils of the vine Cayratia japonica is mediated by physiological connection’, Proc Biol Sci, vol. 282, no. 1814, p. 20151379, Sep. 2015, doi: 10.1098/rspb.2015.1379.
  4. J. Bongard, V. Zykov, and H. Lipson, ‘Resilient Machines Through Continuous Self-Modeling’, Science, vol. 314, no. 5802, pp. 1118–1121, Nov. 2006, doi: 10.1126/science.1133687.
  5. R. Lenkei, T. Faragó, D. Kovács, B. Zsilák, and P. Pongrácz, ‘That dog won’t fit: body size awareness in dogs’, Anim Cogn, vol. 23, no. 2, pp. 337–350, Mar. 2020, doi: 10.1007/s10071-019-01337-3.
  6. R. Lenkei, T. Faragó, B. Zsilák, and P. Pongrácz, ‘Dogs (Canis familiaris) recognize their own body as a physical obstacle’, Sci Rep, vol. 11, no. 1, p. 2761, Feb. 2021, doi: 10.1038/s41598-021-82309-x.
  7. A. K. Seth, ‘Interoceptive inference, emotion, and the embodied self’, Trends in Cognitive Sciences, vol. 17, no. 11, pp. 565–573, Nov. 2013, doi: 10.1016/j.tics.2013.09.007.
  8. J. D. Smith, ‘The study of animal metacognition’, Trends in Cognitive Sciences, vol. 13, no. 9, pp. 389–396, Sep. 2009, doi: 10.1016/j.tics.2009.06.009.
  9. J. S. Park, J. O’Brien, C. J. Cai, M. R. Morris, P. Liang, and M. S. Bernstein, ‘Generative Agents: Interactive Simulacra of Human Behavior’, in Proceedings of the 36th Annual ACM Symposium on User Interface Software and Technology, San Francisco CA USA: ACM, Oct. 2023, pp. 1–22. doi: 10.1145/3586183.3606763.
  10. T. D. Wilson and E. W. Dunn, ‘Self-Knowledge: Its Limits, Value, and Potential for Improvement’, Annu. Rev. Psychol., vol. 55, no. 1, pp. 493–518, Feb. 2004, doi: 10.1146/annurev.psych.55.090902.141954.