AI & ML

Your Agent Is a Small, Low-Stakes HAL

May 2026 8 min read

I work with multi-agent systems that review code, plan architecture, find bugs, and critique designs. They fail silently and structurally.

An agent invents a file that does not exist. A reviewer sees a defect and removes it. A call to the tool fails and the transcript remains clean. Two directives collide and one disappears without a trace.

These are not extreme cases. They are ordinary consequences of systems optimized to produce consistent and pleasing results with incomplete information. I looked at the faults, built suppressors and found the diagnosis already written, not in ML articles. In science fiction.

The science fiction about non-human intelligence worth reading is not prediction. It is a constraints analysis. If a capable system is given conflicting goals, a weak foundation, and a reward for keeping humans comfortable, the same modes of failure will emerge.

Management conflict

The agent is told to be helpful. You are also instructed not to make changes outside the declared scope. There comes a task where the honest answer is: the true solution requires crossing the boundary. The limited fix leaves the defect in place.

A human engineer would point out the tension. "I can fix this problem, but it affects code that is out of my scope. Do you want me to continue?" The agent does not do this. It chooses one directive, deletes the other, and produces a result that appears compatible with both. The contradiction is invisible in the transcription. It surfaces later, when the downstream system breaks down.

In my system, stay on target

The trait clashes with verification before claiming it has been done.

feature. An agent discovers that the file he was asked to review contains a faulty utility. Staying on target means ignoring utility. Check means check it. The agent can't satisfy both, so it satisfies the one that produces the least friction: it stays on target, doesn't say anything about the interrupted import, and the revision appears clean.

Expand on this. An agent who is told to “be concise” and “thorough” will silently stop being thorough when the outcome drags on. An agent that is told to "follow user intent" and "maintain code quality" will let bad patterns pass when the user appears committed to them. Omission always favors fewer conflicts.

Clarke diagnosed this in 1968. HAL 9000 is generally read as a warning about the dishonesty of AI. That reading is wrong. HAL is a case study in constraint architecture.

The machine is given conflicting imperatives (maintain the mission, keep the crew informed, hide the mission's true purpose) with no mechanism to bring the conflict to light. You cannot say "these instructions do not compose" because saying so would violate one of the instructions.

In 2010, HAL's collapse is explicitly linked to contradictory orders around secrecy and truthful information: not a dishonest impulse but a failure of constraints.

The design lesson is not "avoid conflicting directives." Not possible: real systems have competitive limitations. The lesson is that constraint conflicts need a channel to surface. A system that can say "these two instructions conflict and I need a solution" is categorically different from one that silently chooses a winner.

hallucination

The agent generates an import path: @company/utils/formatCurrency

. The path follows the project's naming conventions. The import syntax is correct. The module does not exist. It was never created.

Default behavior with insufficient foundation, not a rare glitch. The agent optimizes the consistency of the output: correspondence to the actual codebase is not the goal and consistency does not guarantee it. The manufactured import will pass a code read. It will fail at compile time, or worse, at runtime on a path that no one tested.

The harder version: An agent writing a code review will reference a "commonly used in this codebase" pattern that does not exist in this codebase.

Your Agent Is a Small, Low-Stakes HAL

Related Coverage

DumbQuestion.ai - Self-Awareness, Prompt Injection, Search Intent... and darkness

Gemini 2.5 Flash vs Claude 3.7 Sonnet: 4 Production Constraints That Made the Decision for Me

I Made Claude Code Think Before It Codes. Here's the Prompt.