
Agentic coding voids our outer loop assumptions
The outer loop
With the advent of Copilots and Agentic coding, much of the industry’s focus has been on capturing and optimizing the pure development effort performed by developers. This has been highly successful and is innovating at a stunning rate. What has been left out is how the outer loop needs to change.
Why the outer loop matters
The outer loop has been invented (for lack of a better word) to achieve the following:
- “Protect production systems against the introduction of human failure”
It contains all the process and automation we throw at verifying code/configuration/change created by humans, and it applies quality gates and approvals to manage the flow of change. This provides essential friction that we as an industry have deemed necessary to deploy to production responsibly.
Main assumptions
This notion of responsibility is built on three main assumptions:
- Code/configuration/change is created and committed by a human actor
- It will be reviewed by a human that understands the context and intent of change
- We work in small increments to limit scope and be very precise and descriptive in defining the change
(there may be quite a few more, but this is what I came up with)
In my opinion, all three assumptions are voided by the introduction of Agentic coding. Agentic code generation and review do not share the contextual understanding, and arguably the intent and judgment, that humans uniquely demonstrate. Beyond that, anyone working with a copilot or agent to develop a piece of software or perform a task has seen the tremendous mountain of change that is created from just a few instructions. This crushes the “work in the small” mantra that we have been pursuing for decades.
Let’s be clear though: in no way, shape, or form does this mean the SDLC breaks down because of “generative sloppiness”. It breaks down because controls are missing.
Need for change, new controls
The genie cannot be put back in the bottle, so we have to deal with the effects Agentic coding introduces into our outer loop while still upholding the value of:
- “Protect production systems against the introduction of failure”
Requirements and design
We need to expand our notion of requirements and design by adding agent constraints, basically specifying: “What may be done by the agent to achieve the goal?”. This can be captured in an Agent Charter, a file the agent can read as part of its instructions. By adding the charter, we constrain the possible outcomes and make it easier to verify correctness of intent. We then get the following approach:
- Requirements describe functional behavior
- Architecture and design describe mostly non-functional aspects (system qualities)
- The Agent charter describes:
  - Agent intent boundaries, to limit what it is allowed to change
  - Tool constraints and patterns for change (like always using a feature flag)
  - Blast-radius limits, specifying a set of resources (scopes, libraries, environments) it is allowed to change
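As a minimal sketch, assuming a hypothetical format (all field and value names below are my own invention, not a standard), an Agent Charter could be expressed as structured data the agent reads alongside its instructions:

```python
from dataclasses import dataclass

@dataclass
class AgentCharter:
    """Hypothetical structure for an Agent Charter (illustrative only)."""
    intent: str                 # what the agent is trying to achieve
    intent_boundaries: list     # what it is allowed to change
    change_patterns: list       # mandatory patterns for how it changes things
    blast_radius: list          # resources/scopes it may touch
    environments: list          # where it may act

charter = AgentCharter(
    intent="Add rate limiting to the public API",
    intent_boundaries=["API request handling"],
    change_patterns=["new behavior goes behind a feature flag"],
    blast_radius=["src/api/", "tests/api/"],
    environments=["dev", "staging"],
)
```

Because the charter is machine-readable, later steps in the outer loop can use it, for example to check a diff against `blast_radius` before review.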
Implementation
The implementation is mostly inner-loop and outside the scope of what I’m trying to capture here. But we do want to capture something specific about the implementation step to be able to change the way we (can) verify created outcomes. Today we assume human code ownership, and thus that (Git) history is always created by humans. Beyond that, we assume that whoever verifies a PR has enough context to infer intent from the PR description and individual commit messages.
With agents it is less important to ask “Who wrote this?” than to ask “Why did this change?”. The broader idea is to capture the “provenance of change”. That will allow us to reason in hindsight about how an agent came to make a specific change or contribution. This has to come in a format that is both human- and machine-readable.
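As a sketch of what such a provenance record could look like (the field names and the commit-trailer convention are assumptions on my part, not an existing standard):

```python
import datetime
import json

def provenance_record(agent, instruction_ref, reason, files, tools_used):
    """Build a human- and machine-readable 'why did this change?' record.
    All field names are illustrative, not a standard."""
    return {
        "agent": agent,                      # which agent (and version) acted
        "instruction_ref": instruction_ref,  # link back to the task/prompt
        "reason": reason,                    # the agent's stated rationale
        "files": files,                      # what it touched
        "tools_used": tools_used,            # how it got there
        "timestamp": datetime.datetime.now(datetime.timezone.utc).isoformat(),
    }

record = provenance_record(
    agent="coding-agent@1.4",
    instruction_ref="TICKET-123",
    reason="Introduced retry logic because the upstream call is flaky",
    files=["src/client.py"],
    tools_used=["editor", "test-runner"],
)

# One place to carry this could be a commit trailer or PR artifact:
trailer = "Change-Provenance: " + json.dumps(record)
```

A reviewer can read the `reason` directly, while tooling can parse the same record to reconstruct how the change came about.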
Code review
For code review we have relied on a shared understanding of context and a plan for change, (automated) tests, and the ability to inspect working software. With the introduction of agents, there is more to validate, especially because tests do not capture, for example:
- Incorrect assumptions
- Misaligned intent
- Over-eager refactoring
This is where human review needs to shift to semantic correctness. This can be done by comparing the intended change, as captured in the Agent charter, against the actual diff. The emphasis moves to the correctness of the decisions agents made, not so much the individual lines of code that have changed.
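A minimal sketch of such a review gate, assuming a charter that declares allowed scopes (the paths and scope names below are made up for illustration):

```python
# Hypothetical review gate: compare the actual diff against the charter's
# declared scopes instead of reading every changed line first.
CHARTER_SCOPES = ["src/payments/", "tests/payments/"]  # assumed charter content

def review_findings(changed_files):
    """Flag files the agent touched outside its declared intent boundaries."""
    out_of_scope = [
        f for f in changed_files
        if not any(f.startswith(scope) for scope in CHARTER_SCOPES)
    ]
    return {
        "in_scope": [f for f in changed_files if f not in out_of_scope],
        # out-of-scope edits often signal over-eager refactoring
        "needs_human_review": out_of_scope,
    }

findings = review_findings([
    "src/payments/checkout.py",
    "src/auth/session.py",  # not part of the stated intent
])
```

The human reviewer then spends their attention on the flagged files and on whether the agent's decisions match the charter, rather than on every diff hunk.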
Testing
As we’ve seen, agents are very good at writing the tests themselves. But with the enormous pace at which new work is created and the volume of change rising rapidly, we can no longer just rely on our “traditional testing setup”, in which we combine Unit + Integration + Regression tests, etc., to validate correctness. (I’m using “correctness” loosely here; I recently watched an interview with Edsger W. Dijkstra that nicely reminded us that testing can show the presence of errors, but never their absence :-/ )
We must extend these capabilities with the following:
- A baseline test that captures aspects of the system that may not change. These can be class structures, libraries used, or even specific technology references, patterns, etc.
- Contract tests between systems (these are getting more popular already) to catch drift between distinct pieces of the system
- (Potentially) canary tests that compare previous log/debug/screen output with current output to detect undesired change
The goal of these is to verify alignment and containment of the intended change and to detect unintended behavior early, not to find bugs.
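As a sketch of the first idea, a baseline test could pin an aspect of the system that may not change, here the set of third-party dependencies, against a committed baseline (file contents and package names below are illustrative):

```python
# Sketch of a baseline test: fail if the agent introduced a dependency
# that is not in the committed baseline. Names are illustrative.
BASELINE_DEPENDENCIES = {"requests", "sqlalchemy", "pydantic"}

def read_current_dependencies(requirements_text: str) -> set:
    """Parse package names from a requirements-style listing."""
    names = set()
    for line in requirements_text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            # keep only the package name, drop pins like 'requests==2.31'
            names.add(line.split("==")[0].split(">=")[0].strip())
    return names

def test_no_new_dependencies():
    current = read_current_dependencies(
        "requests==2.31\nsqlalchemy>=2.0\npydantic==2.5\n"
    )
    unexpected = current - BASELINE_DEPENDENCIES
    assert not unexpected, f"Unapproved dependencies introduced: {unexpected}"
```

The same pattern applies to class structures or technology references: record the approved baseline once, and let the test fail loudly the moment an agent drifts outside it.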
Deployment and Release
Where the rubber meets the road is in deploying changes to production. Here we rely on evidence that is generated and captured along the way to judge whether a change is ready to be deployed/released to production. However, when a large chunk of that evidence is now generated by agents, are we really sure the safety concerns are addressed properly? We do not want to rely on false confidence (which agents often instill as part of their output).
This is where we can change a couple of things:
- Introduce environment-scoped autonomy; an agent has more “freedom to act/operate” in low-risk environments, and maybe none at all in production
- The human approach to checking needs to change from evidence-based to risk-based
- As part of the Agent charter, we may introduce mandatory feature flags or evidence of roll-back tests to guard safe deployment practices
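Environment-scoped autonomy can be sketched as a simple policy table that every agent action is checked against (the environment and action names are assumptions on my part):

```python
# Hypothetical autonomy policy: the agent's allowed actions shrink as
# environment risk grows. Action names are illustrative.
AUTONOMY_POLICY = {
    "dev":        {"edit_code", "run_tests", "deploy", "restart_service"},
    "staging":    {"run_tests", "deploy"},
    "production": set(),  # no autonomous actions; humans approve everything
}

def is_action_allowed(environment: str, action: str) -> bool:
    """Gate an agent's requested action on the environment's policy.
    Unknown environments default to no autonomy at all."""
    return action in AUTONOMY_POLICY.get(environment, set())
```

Defaulting unknown environments to an empty set is the deliberate design choice here: an agent should have to be granted autonomy explicitly, never inherit it by accident.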
Operations
As mentioned above, agents need to be limited based on the environment they’re interacting with. But it’s not hard to imagine (and see the value of!) agents watching over production systems. Ultimately, things in production will break, so this is also an area where the analysis capability of an agent shines: it can help gather evidence of what has gone south quickly, calmly, and carefully.
To be able to do this safely, first and foremost there must be transparency of actions. Second, I’d say we must have the ability to contain or limit the actions of an agent in near real time. Once these are in place, we can consider agents part of the full post-incident review process.
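A minimal sketch of both ideas together, assuming a shared kill-switch flag an operator can flip at any moment (the class and method names are hypothetical):

```python
import threading

class AgentContainment:
    """Sketch: every agent action is logged for transparency and gated on
    a kill-switch flag an operator can set in near real time."""

    def __init__(self):
        self._contained = threading.Event()
        self.audit_log = []  # transparency: every attempted action is recorded

    def contain(self):
        """Operator pulls the cord: no further actions will execute."""
        self._contained.set()

    def perform(self, action_name, action):
        """Record the attempt, then execute only if not contained."""
        self.audit_log.append(action_name)
        if self._contained.is_set():
            return "blocked: agent is contained"
        return action()

gate = AgentContainment()
```

Note that blocked attempts still land in the audit log; that record of what the agent *tried* to do is exactly the evidence a post-incident review needs.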
Not everything needs to change
Especially the things that have been serving us well and provide natural friction that is beneficial. Things like separation of duties, four-eyes principles (and no, agents do not have eyes!), just-in-time access controls and approvals, and finally auditability are key traits of the outer loop that must remain if we are to move forward responsibly (imho).
Closing thoughts
I want to emphasize that this is in no way definitive guidance, but some initial thinking on my side about how the outer loop can reasonably change to accommodate Agentic coding. From my perspective, humans should stay in control of intent, scope, and risk in the SDLC.
Let me know what you think.