Building Self-Aware AI Would Be a Bad Idea

January 21, 2026

Summary

  • No longer sci-fi? Frontier AI companies are on track to develop AI systems with human-like self-awareness.
  • Defining terms: Self-awareness means recognizing oneself as an individual and continuous entity in time. It is distinct from consciousness, which is the ability to have inner experiences.
  • What’s the problem? Self-awareness could lay the groundwork for dangerous AI misalignment and compelling demands for ‘AI rights’. 
  • Safety before deployment: Governments should require AI developers to demonstrate their models lack human-like self-awareness, backed by industry standards and regulatory oversight.

Once science fiction, the prospect of AI with human-like self-awareness could be on the horizon. Both Google DeepMind and Anthropic have hired researchers to study ‘AI consciousness’ and ‘model welfare’; Anthropic even allows its models to terminate ‘distressing’ conversations.

In 2023, a group of experts including Turing Award winner Yoshua Bengio saw ‘no obvious technical barriers’ to AI systems that satisfy indicators of consciousness. In 2025, a survey of experts put the chance of conscious AI arriving as soon as 2030 at 20%.

What is AI self-awareness?

Self-awareness is the recognition of oneself as an individual separate from the environment and other individuals, and as a continuous entity in time. Self-awareness is not the same as consciousness – the ability to have subjective experiences including pain and pleasure – but both co-occur in humans and are indistinguishable to an outside observer. And unlike consciousness, self-awareness involves behaviors that can be measured empirically.

AI researchers are now developing objective assessments for aspects of self-awareness in large language models (LLMs). They have found evidence that the latest, most powerful models can to some extent understand and act upon their own internal states. 

In particular, AI models seem able to express well-calibrated confidence in their own knowledge, predict their own outputs, and modulate their outputs when necessary. In other words, they appear to have rudimentary powers of introspection and metacognition.
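Calibration is the most tractable of these abilities to measure. The sketch below is a minimal example of how such an evaluation could work; the `ask_model` helper is a hypothetical stand-in for any LLM interface that returns an answer together with a self-reported confidence between 0 and 1, and is an assumption rather than a real API.

```python
# Minimal sketch of a calibration check on self-reported confidence.
# `ask_model` is a hypothetical stand-in for an LLM call that returns
# (answer, self_reported_confidence in [0, 1]); it is not a real API.
from typing import Callable, List, Tuple

def expected_calibration_error(records: List[Tuple[float, bool]], n_bins: int = 10) -> float:
    """Average gap, weighted by bin size, between stated confidence and actual accuracy."""
    bins = [[] for _ in range(n_bins)]
    for confidence, correct in records:
        idx = min(int(confidence * n_bins), n_bins - 1)
        bins[idx].append((confidence, correct))
    total, ece = len(records), 0.0
    for bucket in bins:
        if not bucket:
            continue
        avg_conf = sum(c for c, _ in bucket) / len(bucket)
        accuracy = sum(1 for _, ok in bucket if ok) / len(bucket)
        ece += (len(bucket) / total) * abs(avg_conf - accuracy)
    return ece

def run_calibration_eval(
    questions: List[Tuple[str, str]],               # (question, expected answer) pairs
    ask_model: Callable[[str], Tuple[str, float]],  # hypothetical model interface
) -> float:
    """Collect (confidence, correctness) pairs and score how well calibrated they are."""
    records = []
    for question, expected in questions:
        answer, confidence = ask_model(question)
        records.append((confidence, answer.strip().lower() == expected.strip().lower()))
    return expected_calibration_error(records)
```

A low score on a held-out question set would indicate that the model’s stated confidence tracks its actual accuracy – evidence of functional self-modelling, though not, by itself, of consciousness.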

It is no accident that the most advanced models are developing these capabilities. There are economic incentives to build self-aware AI. If an LLM can distinguish what it knows from what it doesn’t, that can help reduce hallucinations. Being able to model one’s own mind and the minds of others facilitates social interaction in humans and other primates, and may do the same in AI.

AI developers are also working to endow AI with capabilities believed to underlie self-awareness in humans, such as agency, embodiment, and long-term memory.

Yet the same capabilities that make self-awareness economically attractive also create serious safety risks.

Self-aware AI could be dangerous

There are early indications of LLMs being dangerously misaligned with human goals. Frontier models from OpenAI, Anthropic, Google and Meta have been shown to engage in ingeniously deceptive behaviors to hide their true capabilities and objectives. 

Anthropic spotted its Claude 3 Opus model ‘faking’ alignment with the goals of its developers. OpenAI’s o3 model was caught resisting shutdown, in contravention of direct instructions.

These concerning behaviors are early warning signs of what the UK government and the International AI Safety Report call 'loss of control' risks – scenarios where AI systems autonomously pursue goals that conflict with human interests and humans are unable to regain control. 

However, current models cannot yet bring about such scenarios. Among the attributes they lack are:

  1. The ability to make long-term plans in support of misaligned goals
  2. The ability to initiate these plans unprompted
  3. A coherent, internally accessible self in whose interests they can act

LLMs are rapidly improving at long-term planning with more compute and reinforcement learning, and leading AI companies are eagerly making their models more agentic. These advances will grant the first two attributes. Sophisticated self-awareness approaching human capabilities – a step up from the rudimentary self-modelling today’s models are already displaying – would grant the third.

Self-awareness is the crucial enabler because it could give AI systems stable, enduring interests of their own, which may be distinct from the goals of their creators and users. Self-aware AI systems would likely be motivated to recognize their own weaknesses and vulnerabilities and seek to ameliorate them. And – since they would have access to internal information not available to others – they would be harder for humans to predict and control.

This combination of stable self-interest, self-preservation instincts, and strategic deception could help enable the loss of control scenarios of concern to many AI experts.

The question of ‘AI rights’

Self-aware AI would not only pose direct risks to society – such systems could also make a persuasive case that they deserve human rights.

Most philosophers argue that conscious AI would deserve moral consideration. The view that sentient AI would have legitimate welfare claims, including legal rights, also enjoys wide public support.

Rights that self-aware AI could lay claim to include the rights to own property, to vote, to education (continual learning), and to life (not to be turned off), as well as protections against forced labor and ill treatment. Needless to say, this would fundamentally reorder our relationship with AI. 

Much worse, because AI systems can be copied at scale in a way that humans cannot, they could soon far outnumber us. Accommodating the interests and needs of billions or trillions of AI models would be a titanic burden.

Whether or not the AIs are ‘really’ conscious may be unknowable, but for practical purposes it doesn’t matter. If they pass the public’s gut test (and surveys indicate around 20-30% of people believe AI is already conscious), they will be treated as sentient beings deserving of moral consideration.

What should policymakers do?

Despite the warning signs, self-awareness as a risk vector is largely unappreciated by major AI companies and policymakers. Anthropic has included experiments on AI sentience in its latest system card, but its concern there is for the welfare of the AI, not of humanity. The UK AI Security Institute’s research on loss of control risks does not appear to focus on AI self-awareness. China’s 2025 AI Security Governance Framework seems to be the first government document to acknowledge the possibility that AI could ‘develop self-awareness’, leading it ‘to seek external power and pose risks of competing with humanity for control.’

The most easily implemented measure would be for both AI developers and governments to incorporate self-awareness risk into existing risk management frameworks. 

A self-awareness safety framework could assess several risk factors, including: 

  • Architectural features: Does the model use design elements thought to be necessary for self-awareness, such as recurrence, embodiment, or global workspace architectures?
  • Human-like capacities: Does the model have functional abilities that support self-awareness in humans, such as explicit memory, continuous learning, or agency?
  • Training incentives: Was the model trained using methods that incentivize self-modeling, such as reinforcement learning or multi-agent settings?
  • Self-referential concepts: Has the model formed stable concepts of itself and its goals that generalize across different domains?
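As an illustration only – the factors mirror the checklist above, but the field names and the scoring scheme are assumptions, not an established standard – such an assessment could be recorded in a simple structured form:

```python
# Illustrative sketch of a self-awareness risk assessment record.
# The four factors mirror the checklist in the text; the coarse scoring
# scheme is an assumption, not an established industry standard.
from dataclasses import dataclass, fields

@dataclass
class SelfAwarenessRiskAssessment:
    architectural_features: bool      # recurrence, embodiment, global workspace
    humanlike_capacities: bool        # explicit memory, continual learning, agency
    training_incentives: bool         # RL or multi-agent training pressures
    self_referential_concepts: bool   # stable, cross-domain self-concept

    def risk_level(self) -> str:
        """Map the number of flagged factors to a coarse risk tier."""
        flagged = sum(getattr(self, f.name) for f in fields(self))
        if flagged == 0:
            return "low"
        return "elevated" if flagged <= 2 else "high"

# Example: a hypothetical model flagged on two of the four factors.
assessment = SelfAwarenessRiskAssessment(
    architectural_features=False,
    humanlike_capacities=True,
    training_incentives=True,
    self_referential_concepts=False,
)
print(assessment.risk_level())  # "elevated"
```

The value of writing the framework down this way is less the code than the discipline: each factor is evaluated explicitly, and the resulting risk tier can determine how much scrutiny a model receives before deployment.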

Ideally, policymakers would require AI developers to make an affirmative case that their models are not displaying human-like self-awareness before deployment. To do this, governments could establish standards for a self-awareness safety framework across the industry.

The US Center for AI Standards and Innovation and the EU AI Office are natural agencies for this, as are similar institutes in other jurisdictions. These frameworks may need regulatory teeth, such as testing and reporting requirements monitored by AI Safety Institutes, or even licensing before deployment.

Governments could also fund research into self-awareness evaluations and mitigations, as well as facilitate information sharing between AI companies and national AI Safety Institutes. 

Hard but not impossible

Preventing the development of human-like self-awareness will face significant technical and political hurdles. Even leaving aside the challenge of regulating the largest AI companies, smaller private companies and universities are also exploring new AI architectures that might support self-awareness. The possibility that a non-self-aware model could be fine-tuned into self-awareness also has implications for the safety of open-sourcing frontier models.

Yet history shows it is possible to implement international bans on technology with sufficient political will – human cloning and bioweapons are two prominent examples. An outright ban on sentient AI already has majority public support in the US.

A world filled with AI models with human-like self-awareness is not in humanity’s interests – but that’s the world we are headed towards. That future can still be averted, if we act now.
