Everyone who has used AI for coding has had the same experience: you describe what you want, the AI writes something, it mostly works, you ask for a fix, it breaks something else, you ask for another fix, and twenty minutes later you are three layers deep in a conversation that has lost all context of what you were originally trying to do.
That was how I started. It is not how I work now.
Over the course of building Atmos Football — a React and Firebase app that tracks Elo ratings, fitness data, and team generation for a weekly 5-a-side group — I landed on a workflow that fixed most of those problems. The core idea is simple: separate thinking from doing. Plan in one place, build in another, and never let the two mix.
Two environments, one developer
The workflow uses two AI environments with completely different jobs.
The first is a browser-based chat interface. This is where planning happens. It holds a bundled copy of the entire source code, all the project documentation, the feature backlog, design decisions from past sessions, historical bugs and how they were fixed, and a set of conventions for how the project is structured. When I need a new feature or want to fix a bug, I describe what I want and the AI produces a plan document — a structured breakdown of what needs to change, which files are affected, what the test criteria are, and what could go wrong.
The second is an AI coding tool integrated into VS Code. This is where implementation happens. It reads and writes actual source files, runs the build, executes tests, and produces commits. It works from the plan document produced by the first environment, executing tasks one at a time.
The developer — me — sits between them. I upload code snapshots to the planning environment. I review and correct plans before they go anywhere near the codebase. I transfer approved plans to the implementation environment. I test the results on a staging server before promoting anything to production.
The AI never decides what to build. It proposes, I decide. The AI never deploys to production. It builds and tests, I promote. The separation of environments reinforces this structurally: the planning environment physically cannot modify code, and the implementation environment does not make architectural decisions.
Why the plan document matters more than the code
The plan document format is standardised. Every plan has a summary, a root cause or design section, numbered tasks with checkboxes, a "files changed" table, and a smoke test checklist. This sounds like bureaucracy. It is the single most important thing in the workflow.
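The article doesn't reproduce the actual template, but from the sections listed above, a plan document might look roughly like this (section names, file paths, and task text are all illustrative placeholders, not the real format):

```markdown
## Summary
One paragraph: what is changing and why.

## Root cause / Design
The bug's root cause, or the design rationale for the feature.

## Tasks
- [ ] 1. In the engine module, replace the constant block with the new one.
- [ ] 2. Update the corresponding unit test to match.

## Files changed
| File               | Change                        |
|--------------------|-------------------------------|
| (engine module)    | constant replaced             |
| (engine test file) | expectation updated           |

## Smoke test checklist
- [ ] Build passes for all deployment targets
- [ ] Ratings update correctly after a recorded match
```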
When plans are precise — "in file X, replace this exact block with that exact block, because Y" — the implementation phase is almost mechanical. The AI executes the tasks, runs the build, runs the tests, commits. When plans are vague — "refactor the auth flow to be cleaner" — the implementation quality drops immediately. The AI makes well-intentioned but architecturally wrong changes, and you spend more time fixing the fix than you would have spent writing the code yourself.
The discipline is: no code is written without a reviewed plan. If the plan does not exist, I write one (or ask the AI to draft one) before any implementation begins. If the plan is wrong, I correct it with specific, numbered points — "point 3 should read X instead of Y, and point 7 is missing the edge case where Z." The AI responds well to the same feedback style you would use in a code review. Telling it "this is not quite right" produces poor results. Telling it exactly what is wrong produces excellent results.
What the planning environment actually knows
The planning environment is not working from a vague conversation. It has structured context: a full source bundle (every file in the project concatenated into one searchable text file), a conventions document describing how features are tracked and how plans should be formatted, a troubleshooting file documenting every auth failure, Firestore cost trap, and CSP issue encountered across dozens of versions, and persistent memory of past sessions.
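The article doesn't specify how the source bundle is formatted, but the core of a bundler is a simple concatenation step. A minimal sketch, assuming a per-file header convention of my own invention (`=== path ===` is not the real format):

```typescript
// Sketch of a source-bundle generator. The "=== path ===" header
// convention is an assumption, not the article's actual format.
interface SourceFile {
  path: string;
  content: string;
}

// Concatenate every project file into one searchable text blob,
// each file preceded by a header naming its path.
function buildBundle(files: SourceFile[]): string {
  return files
    .map((f) => `=== ${f.path} ===\n${f.content}`)
    .join("\n\n");
}
```

The point of the header lines is that the planning AI can search the blob and cite exact file paths back in its plans.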
This context is what makes the plans useful. Without it, the AI produces generic advice. With it, the AI produces plans that reference actual file paths, actual function names, actual patterns already established in the codebase. The quality of the output is directly proportional to the quality of the input context. This is always true with AI, but it is especially visible in a workflow where the output is a formal plan document rather than a conversational suggestion.
Automated verification changes everything
The implementation environment runs the build after every change and runs the test suite after any logic change. This matters more than it sounds.
Before automated tests, the feedback loop was: change code, build, deploy to staging, manually verify, describe the problem to the AI, fix, rebuild. After tests, the loop is: change code, run tests in one second, see the exact failure, fix. The AI gets a specific error — "expected 1.0, got 1.87" — instead of a vague description of something looking wrong on screen.
The test suite runs 65 tests in about one second. It covers the Elo engine, stat calculations, form ratings, attack/defence ratings, and utility functions. For sessions that touch any of this logic, the tests cut the feedback loop from minutes to seconds. I estimate they save 30–50% of the time and tokens on engine-related sessions.
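The engine itself isn't shown here, but the kind of test that gives a sub-second loop is a pure-function check with no build or deploy step. A sketch using the standard Elo expected-score formula (illustrative only, not Atmos Football's actual engine):

```typescript
// Standard Elo expected score: probability that player A beats player B.
// (Illustrative; not the actual Atmos Football engine.)
function expectedScore(ratingA: number, ratingB: number): number {
  return 1 / (1 + Math.pow(10, (ratingB - ratingA) / 400));
}

// Fast, deterministic checks: no build, no deploy, exact failures.
console.assert(expectedScore(1500, 1500) === 0.5, "equal ratings -> 0.5");

const eA = expectedScore(1600, 1400);
const eB = expectedScore(1400, 1600);
// The two sides' expected scores must always sum to 1.0 —
// a failure here produces an exact number, not "looks wrong on screen".
console.assert(Math.abs(eA + eB - 1.0) < 1e-12, "expected scores sum to 1.0");
```

An error like "expected 1.0, got 1.87" falls straight out of an invariant check like the last one, which is exactly the kind of specific failure message the AI can act on.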
The bridge is manual, and that is the point
The two environments are connected by the developer, not by automation. I upload source bundles manually. I transfer plans manually. I decide when to ship. This is deliberate.
The manual bridge forces a review checkpoint at every transition. When I upload a new bundle, I am implicitly confirming that the codebase is in a known-good state. When I transfer a plan, I am confirming that I have reviewed and approved it. When I promote to production, I am confirming that staging looked right.
Every automated step in the workflow — building, testing, committing — happens under the AI's control. Every decision step — what to build, whether the plan is right, when to ship — happens under mine. The separation is enforced by the structure of the workflow, not by trusting anyone to remember the rules.
What it looks like in practice
A typical feature cycle:
I upload the latest source bundle to the planning environment and describe what I want. The AI reviews the code, identifies the relevant files, and produces a plan document. I read the plan, make corrections — usually 2–5 specific numbered points. The AI revises. I approve.
I transfer the plan to the implementation environment. The AI reads the plan and works through the tasks: edit file, run build, run tests, commit. If a test fails, it reads the error, identifies the cause, fixes it, and re-runs. When all tasks are complete, it runs the full build for all deployment targets.
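One reason the checkbox format pays off is that progress through a plan is machine-readable. A hypothetical helper (assuming the common Markdown `- [ ]` / `- [x]` checkbox convention, which the article's plan format appears to use) that extracts the remaining tasks might look like:

```typescript
// Extract unchecked tasks from a plan document's Markdown checklist.
// Assumes the "- [ ]" (pending) / "- [x]" (done) checkbox convention.
function pendingTasks(plan: string): string[] {
  return plan
    .split("\n")
    .filter((line) => line.trimStart().startsWith("- [ ]"))
    .map((line) => line.trimStart().slice("- [ ]".length).trim());
}
```

With something like this, the implementation side can always answer "what is left to do" directly from the plan document rather than from conversational memory.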
I test on a staging server. If it looks right, I promote to production. I generate a new source bundle and upload it, restarting the cycle.
During a recent sprint, this workflow shipped 15 versions in two to three days with no regressions. That is not because the AI is infallible — it is because the plan-review-implement-test cycle catches mistakes before they reach production.
The workflow is not fast for trivial changes. For a one-line fix, the overhead of writing a plan, transferring it, and running the full cycle is overkill. The payoff is on features that touch multiple files, require architectural thought, or need to work correctly across web and Android builds. For those, the plan-first approach is not just faster — it produces better results than I would get working alone.