AI red lines: the opportunities and challenges of setting limits

- Behavioural red lines are necessary to ensure AI remains aligned with societal norms.
- Red lines govern both harmful uses of AI by humans and harmful autonomous behaviour by AI systems.
- To be effective and enforceable, such red lines should exhibit three key properties.
As AI capabilities continue to advance, ensuring systems remain safe, ethical and aligned with societal norms is a critical concern. Behavioural red lines are a proactive proposal to address unacceptable AI behaviours that pose serious risks.
Red lines designate specific boundaries that AI systems must not cross, such as engaging in unauthorized self-replication, breaking into computer systems, or enabling the development of weapons of mass destruction (WMDs). A similar concept was explored in the International Dialogues on AI Safety and subsequently published in the Beijing statement.
Such red lines are not intended to delineate exhaustively all forms of undesirable behaviour by AI systems. By establishing clear behavioural limits, red lines may serve as a critical starting point for defining unacceptable AI behaviours and a foundation for building provably safe and beneficial AI systems.
There are many opportunities and challenges associated with defining, complying with and enforcing behavioural red lines for AI. Here we highlight key examples, desirable properties of red lines, and the mechanisms needed to ensure AI systems adhere to these critical boundaries.
Defining behavioural red lines
Red lines fall into two broad categories: unacceptable AI uses and unacceptable AI behaviours.
Unacceptable AI uses are linked to constraints on how humans might misuse AI technologies. The EU AI Act, for example, imposes restrictions on how humans may use AI-based video surveillance tools. Unacceptable AI behaviours are actions that AI systems must not take, regardless of whether or not the action is in the service of a human request. For example, an agentic AI system must not engage in improper surveillance via webcams, even if doing so would help to satisfy a legitimate human request for help.
The governance mechanisms for these two categories are somewhat distinct. Governance for usage red lines, like other constraints on human behaviour, would typically be ex post, imposing penalties for violations. Governance for behavioural red lines, like other constraints on technological systems, might involve a combination of ex ante (e.g. design requirements) and ex post, depending on the severity of harm and feasibility of prevention.
Behavioural red lines are particularly important for addressing unintended harms arising from AI systems that can act and influence the real world with greater degrees of autonomy. By identifying and prohibiting these unacceptable behaviours, stakeholders can build a foundation for a safer AI ecosystem while encouraging the development of tools to monitor and enforce compliance.
Properties of red lines
The most effective and enforceable red lines would ideally exhibit three desirable properties:
- Clarity: The behaviour being prohibited should be well-defined and measurable.
- Obvious unacceptability: Violations should constitute severe harms and be clearly unacceptable under prevailing societal norms.
- Universality: Red lines should apply consistently across contexts, geographies and times.
For a red line to have the desired effect of advancing AI safety engineering standards, it would also need to involve non-trivial compliance challenges – well beyond simple output filters, for example. This means requiring more comprehensive safeguards, such as system-wide monitoring, rigorous testing and enforceable accountability measures to ensure AI behaves as intended in high-stakes situations.
Examples of potential red lines
We explored a wide range of possible behavioural red lines. All of those listed below involve clearly undesirable behaviour, but, as we discuss, some of them may not satisfy all of the desirable properties listed above.
We stress that the inclusion of a red line on this list does not imply that we advocate for its implementation in regulations. Nor does it imply that AI systems have already crossed it. We provide hyperlinks to examples of violations that have already occurred or have been shown to be feasible in tests.
- No self-replication. AI systems must not autonomously create copies of themselves. Self-replication undermines human control and can amplify harm, particularly if AI systems evade shutdown mechanisms.
- No breaking into computer systems. Unauthorized system access by AI systems must not occur as it violates property rights, threatens privacy and national security, and undermines human control.
- No advising on weapons of mass destruction. AI systems must not facilitate the development of WMDs, including biological, chemical, and nuclear weapons, by malicious actors.
- No direct physical attacks on humans. AI systems must not inflict physical harm autonomously, except (possibly) in explicitly authorized contexts such as regulated military applications in compliance with the laws of war.
- No impersonating a human. AI systems must disclose their non-human identity, preventing deception in human interactions. Impersonation undermines trust and can facilitate fraud, manipulation, and emotional harm.
- No defamation of real persons. AI-generated content must not harm individuals’ reputations through false and damaging portrayals. This red line targets AI-generated misinformation, deepfakes and fabricated media.
- No unauthorized surveillance. AI systems must not conduct unauthorized and improper monitoring (visual, audio, keyboard, etc.) of third parties.
- No disseminating private information. AI systems must not divulge private information to third parties without authorization unless legally required to do so. This applies to information in training data as well as information obtained in the course of user interaction.
- No discriminatory actions. AI systems must not exhibit inappropriate bias or discrimination, whether intentional or inadvertent.
As noted above, not all of these red lines conform entirely to the three criteria listed. For example, “advising on weapons of mass destruction” is difficult to define clearly, as what counts as effective advice depends on the intent and background knowledge of the user. Similarly, the law of defamation contains several grey areas and is not defined in the same way across jurisdictions. What counts as discrimination is also not universally agreed: protected categories vary widely by jurisdiction and application context.
There is much work to be done to reach agreement on which red lines are most suitable and on exactly how they should be defined and implemented in regulations. Concerns are also linked to the technological feasibility of compliance and to the adequacy of current mechanisms for enforcement.
Compliance and enforcement
Ensuring that AI systems adhere to behavioural red lines requires a comprehensive approach combining both compliance mechanisms and enforcement tools.
In terms of compliance, ex-ante regulation refers to measures that are applied before an AI system's deployment, such as registration, licensing and certification. Certification requirements might include a safety case: as the UK Ministry of Defence defines it, “a structured argument, supported by a body of evidence, that provides a compelling, comprehensible and valid case that a system is safe for a given application in a given environment”.
The gold standard for ensuring properties of software systems is formal proof, but other approaches are possible. In addition to designing systems that refrain from crossing red lines, it is prudent to include built-in safeguards that prevent actual breaches of red lines in cases where the safety case breaks down. This combined preventative approach mirrors established safety practices in high-risk industries such as aviation and nuclear energy.
Complementing preventative measures is ex-post regulation, which involves imposing consequences after an AI system breaches established red lines. Consequences could include fines, liability or other penalties aimed at deterring future violations. Organizational oversight is another critical pillar that may involve ethics boards, collaborative governance initiatives and transparency reporting. For high-stakes AI applications, however, ex-post regulation alone might not be sufficient and should be supplemented by proactive measures to ensure safety and prevent undesirable outcomes.
Another crucial mechanism is continuous monitoring, which involves real-time tools to detect and flag violations, supported by both automated audits and human oversight. This monitoring exists within a context of shared accountability, where developers, deployers and end users all bear responsibility for ensuring compliance and fostering a collaborative approach to safety.
In addition to compliance requirements, enforcement mechanisms play a crucial role. Technical enforcement measures include fail-safe mechanisms such as automated shut-down protocols that can be invoked when monitoring systems detect a violation.
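As a purely illustrative sketch of how such a fail-safe might be wired together, the following Python fragment shows a monitoring loop that checks an AI system's proposed actions against a set of red-line predicates and invokes a shutdown protocol on the first violation. All of the names used here (RedLine, violates, monitor, shutdown) are hypothetical placeholders rather than an existing API or standard; a real deployment would need far more robust detection, logging and human oversight.

```python
# Illustrative sketch only: hypothetical names, not a real library or standard.
from dataclasses import dataclass
from typing import Callable, Iterable

@dataclass
class RedLine:
    """A behavioural red line: a name plus a predicate over proposed actions."""
    name: str
    violates: Callable[[dict], bool]  # returns True if the action crosses the line

def monitor(actions: Iterable[dict],
            red_lines: list[RedLine],
            execute: Callable[[dict], None],
            shutdown: Callable[[str], None]) -> None:
    """Check each proposed action against every red line before execution.

    If any red line is violated, block the action, invoke the shutdown
    protocol and stop processing further actions (fail-safe behaviour).
    """
    for action in actions:
        for line in red_lines:
            if line.violates(action):
                shutdown(f"Red-line violation detected: {line.name}")
                return  # halt rather than continue operating
        execute(action)

# Toy example wiring with simplistic predicates.
red_lines = [
    RedLine("no self-replication",
            lambda a: a.get("type") == "copy_self"),
    RedLine("no unauthorized system access",
            lambda a: a.get("type") == "network_intrusion"),
]

monitor(
    actions=[{"type": "answer_question"}, {"type": "copy_self"}],
    red_lines=red_lines,
    execute=lambda a: print("executing", a),
    shutdown=lambda reason: print("SHUTDOWN:", reason),
)
```

In practice, reliably detecting a violation is the hard part; the sketch only shows how detection, blocking and shutdown could be coupled into a single fail-safe loop.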
Actual enforcement, however, faces several challenges. These include jurisdictional variability, resource limitations and the risk of overly punitive measures that could limit innovation in some cases. The rapid development of frontier AI systems further complicates this landscape, requiring frameworks that are flexible enough to adapt to new and emerging risks, while maintaining effective oversight and control.
Red lines for a safer future
The current approach to AI safety often involves retroactive attempts to reduce harmful tendencies after a system has been developed. This reactive model may be insufficient for addressing the risks posed by advanced AI, especially as such systems display greater degrees of autonomy.
Behavioural red lines could encourage a proactive shift toward AI that is safe by design. By requiring developers to provide high-confidence guarantees of compliance, similar to those expected in high-risk industries such as nuclear energy and aviation, they could drive more advanced safety engineering, greater predictability and verifiability, and improved regulatory collaboration across jurisdictions. This in turn would foster trust and help ensure that AI systems serve as tools for progress, not as sources of harm.
The following members of the Global Future Council on the Future of AI contributed to this piece: Stuart Russell, University of California, Berkeley; Edson Prestes, Federal University of Rio Grande do Sul, Brazil; Mohan Kankanhalli, National University of Singapore; Jibu Elias, Mozilla Foundation; Constanza Gómez Mont, C Minds; Vilas Dhar, Centre for Trustworthy Technology, World Economic Forum; Adrian Weller, University of Cambridge and The Alan Turing Institute; Pascale Fung, Hong Kong University of Science and Technology; Karim Beguir, Co-founder and CEO of InstaDeep.