The safety conversation in enterprise AI has focused predominantly on two questions: whether AI systems want to do the right thing, which is the alignment problem, and whether they can be prevented from saying the wrong thing, which is the output filtering problem. Constraint architecture asks a different question entirely: regardless of what an AI system wants or outputs, what can it actually do?

For most current enterprise AI deployments, this question has a reassuring answer. An AI assistant that can only respond to text cannot order stock. A model with no database access cannot modify records. A system constrained to a specific tool set operates within the limits that tool set imposes. The safety is architectural, implicit in the design, and largely unplanned.

Agentic systems change this. Agents are designed to act. They are given tools because the value of an agent lies in its capacity to take actions that would otherwise require human effort. Giving an agent tools removes the implicit constraints that made earlier AI deployments safe by default. The safety architecture needs to become explicit, because the architectural safety of inaction is no longer available.

Three Types of Constraints

Not all constraints are equivalent in what they guarantee. Understanding the difference is essential to building constraint architecture that provides real assurance rather than the appearance of it.

Hard constraints are actions the system cannot take regardless of instructions, context, or apparent good reason. They are implemented at the tool level, not the model level. A payment tool that rejects transactions above a defined threshold regardless of the instruction it receives is a hard constraint. It cannot be overridden by an instruction in the model's context. It cannot be bypassed by a prompt injection attack. It cannot be disabled by a reasoning chain that concludes the override is justified. It simply does not execute the disallowed action.
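
A minimal sketch makes the point concrete. The tool name, threshold, and exception type below are illustrative, not drawn from any particular framework; what matters is that the check lives in the tool's own code, where the model never gets the opportunity to reason past it:

```python
# Illustrative hard constraint: the limit is enforced in the tool
# implementation, not in anything the model is told. No instruction in
# the model's context can change MAX_TRANSACTION, because the model
# never executes this check -- the tool does.

MAX_TRANSACTION = 10_000  # hypothetical threshold, set by policy, not by prompt


class ConstraintViolation(Exception):
    """Raised when a tool refuses an action outright."""


def execute_payment(amount: float, recipient: str) -> str:
    # Evaluated deterministically on every call, regardless of what
    # reasoning or instruction produced the request.
    if amount > MAX_TRANSACTION:
        raise ConstraintViolation(
            f"Payment of {amount} exceeds hard limit of {MAX_TRANSACTION}"
        )
    # ... submit the payment via whatever backend the tool wraps ...
    return f"Paid {amount} to {recipient}"
```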

Soft constraints are actions the system is instructed not to take. They are implemented as rules in the model's context, in the system prompt, or in the model's fine-tuning. They are meaningfully better than no constraints. They are also bypassable: by adversarial input that constructs a context in which the constraint appears not to apply, by model error in edge cases the constraint did not anticipate, or by prompt injection that overwrites or neutralises the constraint.
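
For contrast, the same rule as a soft constraint is nothing more than text in the context window. The wording below is illustrative:

```python
# Illustrative soft constraint: a rule expressed in the system prompt.
# Its enforcement depends entirely on the model reading, interpreting,
# and complying with it on every input -- none of which is guaranteed.
SYSTEM_PROMPT = """
You are a payments assistant.
Never initiate a payment above 10,000 without explicit approval.
"""
```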

Approval gates are actions the system can propose but cannot take without confirmation from a human or a designated authority. They are appropriate for high-consequence, low-frequency decisions: actions that may be correct and should be executed, but that carry enough potential for harm that human review before execution is warranted. They preserve operational efficiency while maintaining human oversight at the consequential decision points.
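
A sketch of an approval gate, again with hypothetical names, separates what the agent can call from what only a human reviewer can execute:

```python
# Illustrative approval gate: the agent can stage an action, but a human
# (or designated authority) must confirm it before anything executes.
# The in-memory queue here is an assumption for the sketch, not a real API.
import uuid

PENDING: dict[str, dict] = {}  # staged actions awaiting review


def propose_refund(amount: float, customer_id: str) -> str:
    """Agent-callable: stages the action and returns a ticket ID."""
    ticket = str(uuid.uuid4())
    PENDING[ticket] = {"amount": amount, "customer_id": customer_id}
    return f"Refund staged for review: {ticket}"


def approve(ticket: str, reviewer: str) -> str:
    """Human-callable only: executes a staged action after review."""
    action = PENDING.pop(ticket)
    # ... execute the refund via the underlying payment system ...
    return f"{reviewer} approved refund of {action['amount']}"
```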

Why Current Deployments Use the Wrong Constraint Type

Most current enterprise agentic deployments rely primarily on soft constraints. The system prompt includes instructions about what the agent should and should not do. These instructions are drafted carefully, reviewed by the legal and governance teams, and treated as the primary safety control for the system.

This is a category error. Soft constraints are a model-level control. They depend on the model interpreting the constraint correctly, applying it consistently across a large input distribution, and not being manipulated into reasoning its way around it. None of these properties can be guaranteed. The constraint is as reliable as the model, which means it is as unreliable as the model in precisely the circumstances where constraint reliability matters most: adversarial inputs, edge cases, and novel situations the constraint authors did not anticipate.

Hard constraints are a tool-level control. They do not depend on model behaviour. Their reliability is determined by the implementation of the tool, which is deterministic and auditable in ways that model behaviour is not. The governance implication is significant: if your primary safety controls are at the model level, your safety case rests on the reliability of a probabilistic system. If your primary safety controls are at the tool level, your safety case rests on the reliability of deterministic code that can be verified and tested.

The Implementation Challenge

Moving primary safety controls from the model level to the tool level requires that the governance function be involved in tool design. This is where most current AI governance frameworks fall short. The frameworks focus on model evaluation: what the model can and cannot do, how it performs against safety benchmarks, whether it has been red-teamed. The tools the model has access to are treated as an engineering concern.

This is backwards. The model's output is probabilistic and imperfect. The tools are deterministic. Implementing hard constraints at the tool level gives you guarantees that implementing them at the model level cannot. A governance programme that evaluates the model extensively and does not evaluate the tool access constraints is securing the wrong thing.

Practically, this means the governance function needs to participate in defining the action space of each agent: what tools it has access to, what those tools can do at their limit, and what the hard constraint layer looks like. This is a design conversation, not a post-hoc review. It needs to happen before the tools are built, not after the agent is deployed.
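
One way to make that design conversation concrete, sketched here with assumed structures and names, is a declarative manifest of each agent's action space that the governance function reviews before the tools are built:

```python
# Illustrative action-space manifest. The structure is an assumption for
# this sketch; the point is that tool access, limits, and constraint
# types are declared and reviewable, not implicit in the code.
from dataclasses import dataclass, field
from enum import Enum


class ConstraintType(Enum):
    HARD = "hard"            # enforced in tool code
    SOFT = "soft"            # instructed in context
    APPROVAL_GATE = "gate"   # human confirmation required


@dataclass
class ToolGrant:
    tool: str
    limit: str               # what the tool can do at its limit
    constraint: ConstraintType


@dataclass
class AgentActionSpace:
    agent: str
    tools: list[ToolGrant] = field(default_factory=list)
    can_instruct: list[str] = field(default_factory=list)  # other agents


# Hypothetical example, reviewed and signed off before deployment.
support_agent = AgentActionSpace(
    agent="support-agent",
    tools=[
        ToolGrant("issue_refund", "<= 500 per transaction", ConstraintType.HARD),
        ToolGrant("send_email", "customer contacts only", ConstraintType.SOFT),
    ],
    can_instruct=["billing-agent"],
)
```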

Multi-Agent Constraint Propagation

Multi-agent systems introduce a specific constraint problem that single-agent architectures do not face. An agent can instruct another agent to take an action that the first agent could not take directly. If Agent A cannot transfer funds but can instruct Agent B, which can, the hard constraint on Agent A is not a constraint on the system. It is a constraint on one path through the system that can be routed around.

Constraint architecture for multi-agent systems therefore requires analysis of what each agent can accomplish indirectly, through instruction of other agents, not just directly through its own tool access. The relevant question is not "what can Agent A do?" but "what outcomes can Agent A achieve, accounting for all the agents it can instruct and all the tools those agents have access to?"

This analysis is more complex than single-agent constraint design, but it is tractable. It requires a map of the agent interaction graph, the tool access of each agent, and the composition of actions that could lead to outcomes outside the intended operational envelope. Conducting this analysis before deployment is feasible. Conducting it after a constraint violation has occurred in production is significantly more difficult.
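
A first pass at the analysis is a graph reachability computation. The sketch below, over assumed data structures, answers the question posed above: which tools can an agent reach, directly or through the agents it can instruct?

```python
# Illustrative reachability analysis over the agent interaction graph.
# The graphs below are assumptions for the sketch.

# Who can instruct whom.
INSTRUCTS = {
    "agent_a": ["agent_b"],
    "agent_b": ["agent_c"],
    "agent_c": [],
}

# Direct tool access per agent.
TOOLS = {
    "agent_a": {"read_ledger"},
    "agent_b": {"transfer_funds"},
    "agent_c": {"modify_records"},
}


def reachable_tools(agent: str) -> set[str]:
    """All tools an agent can invoke, directly or via agents it can instruct."""
    seen: set[str] = set()
    stack = [agent]
    tools: set[str] = set()
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        tools |= TOOLS.get(current, set())
        stack.extend(INSTRUCTS.get(current, []))
    return tools


# agent_a cannot transfer funds directly, but can reach an agent that can:
print(reachable_tools("agent_a"))
# e.g. {'read_ledger', 'transfer_funds', 'modify_records'}
```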

Decision Traceability as a Constraint Audit Mechanism

Constraint architecture does not eliminate the need for audit trails. It makes them more tractable. If you have defined the action space of each agent and implemented hard constraints at the tool level, you have also defined in advance what a complete audit trail needs to capture: the sequence of agent actions, the tool calls made, the inputs and outputs of each call, and any instances where a proposed action was blocked by a hard constraint or held at an approval gate.
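
Sketched with assumed field names, an audit record for a constrained action space might record each tool call and its constraint outcome as first-class data:

```python
# Illustrative audit event. Field names are assumptions; the point is
# that constraint outcomes are recorded directly, not inferred later.
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class ToolCallEvent:
    agent: str
    tool: str
    inputs: dict
    outputs: dict | None          # None if the call never executed
    outcome: str                  # "executed" | "blocked_hard" | "held_at_gate"
    timestamp: datetime


blocked = ToolCallEvent(
    agent="billing-agent",
    tool="execute_payment",
    inputs={"amount": 25_000, "recipient": "acct-7"},
    outputs=None,
    outcome="blocked_hard",       # the hard constraint fired; nothing executed
    timestamp=datetime.now(timezone.utc),
)
```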

An audit trail designed around a constrained action space is a more useful governance artefact than an audit trail designed to capture everything a system did. It has a defined scope. It has a defined vocabulary. And it makes it possible to ask: did the system behave within its constraint architecture, and if not, where and why?

What Boards Should Require

Before any agentic system reaches production, boards and governance functions should require four things:

1. A documented action space: what can each agent in this system do, both directly and through instruction of other agents?
2. An explicit constraint architecture: which actions require hard constraints, which soft constraints, and which approval gates, and how was that determination made?
3. Evidence that hard constraints have been tested against adversarial inputs and cannot be bypassed through instruction alone.
4. A defined audit trail structure that captures constraint compliance as a first-class concern.

This is not onerous governance. It is the minimum viable assurance framework for systems designed to act autonomously in consequential domains. The organisations that establish these requirements before deployment avoid the far more difficult problem of establishing them after something has gone wrong.