One-shotting a static site generator

This is a writeup of how I used a coding agent to build sitegen3, the static site generator for my personal website.

What one-shotting means

My goal was simple: specify the project well enough upfront that I could trigger the agent once and not step in until it was done.

This is what I refer to as one-shotting. The term needs a definition here because it gets used in different ways. I didn't just hand the model a single prompt and watch it produce a finished project. I wrote the specs, worked out a design, and broke it into an implementation plan of discrete tasks (iterating on all of it with the model's help), then ran the agent in a loop with a separate session for each task.

So one-shotting, as I'm using it, doesn't mean a single agent or model execution. It means a single initial trigger, with no human input after that. This matches the colloquial sense of the term I've seen in discussions of agentic coding.

Why do this

When I use a coding agent on a non-trivial change, I get the best results when I align on an approach first and only then implement. Claude Code's plan mode is exactly this pattern. I can discuss an approach and iterate until I have a plan I'm comfortable with.

Plan mode works well for small to medium features. For sitegen3 I felt it wasn't going to be enough. The project was big enough that the plan would have grown past what I could comfortably iterate on in Claude Code's plan mode, and the implementation would have consumed enough of the context window to degrade the agent's focus and the quality of its output. The method below seems like a good option for such cases: greenfield projects, or large features on existing codebases. For everyday work, something like plan mode is often enough.

The natural next steps were:

Specify the project in more detail upfront, across multiple files if it helps, iterating on those documents to refine the requirements.
Divide the project into subtasks the agent can execute one by one, each fitting in a single session.

That's the workflow I set out to build.

The method

I used Claude Code with Claude Opus 4.7. The approach itself is agent- and model-agnostic.

I didn't want to use anything overly complicated or tied to a specific tool. My instinct was that I didn't need a multi-agent team or a framework with a sprawl of documents to get this done, so I kept it simple.

At the core of the method is a ralph loop: spawn a fresh agent session on each iteration of a bash loop, and hand each one a specific task. That gave me the shape. What remained was how to specify the project and how to break it into those tasks.

I started with a SPEC.md file capturing the requirements: what the project does, its inputs, outputs, interface, templates, and stack.

From a refined version of that document I developed ARCHITECTURE.md describing the design: the file structure, modules, their responsibilities, contracts, and how they interact.

From those two I derived TASKS.md, breaking the implementation into an ordered list of tasks. Each task was sized to fit a single context window, usually one module plus its tests.

For each of these documents I drafted an initial version with Claude and iterated on it, asking it to be critical and adversarial, to suggest improvements and flag inconsistencies. I explained that these documents would be the basis for an agent building the project, and walked it through the SPEC.md -> ARCHITECTURE.md -> TASKS.md workflow. Claude already knows what spec-driven development, one-shotting, and ralph loops are, so using those terms helps it understand the goal.

The snippets below are condensed for readability; the full files are in the project's repo.

A representative task entry looked like this:

```markdown
## Task 3 — Slug normalization

**Goal.** Implement the filename-to-URL-slug pipeline.

**Files to create.** `src/sitegen3/slug.py`, `tests/test_slug.py`.

**Public interface.** `def slugify(name: str) -> str`

**Key rules.** Lowercase, replace spaces with hyphens, strip non-`[a-z0-9-]`, collapse consecutive hyphens, strip leading/trailing hyphens. Example: `"My First Post"` → `"my-first-post"`.

**Tests.** Parametrized cases covering mixed case, multiple spaces, punctuation, non-ASCII, collapsed and edge hyphens, empty input.

**Verification.** The four-command gate (`ruff format`, `ruff check`, `pyright`, `pytest`).

**Done when.** `slugify` handles all parametrized cases, pyright strict passes, tests green.
```
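The verification gate in the task entry amounts to a fixed sequence of commands that must all succeed before a task counts as done. Below is a minimal sketch of such a gate. It assumes the four tools are installed and run with default arguments from the project root. `run_gate` and `GATE_STEPS` are hypothetical names, not taken from the sitegen3 repo.

```bash
# Hypothetical gate runner: executes each step in order, stops at the first failure.
run_gate() {
  local step
  for step in "$@"; do
    echo "gate: $step"
    if ! bash -c "$step"; then
      echo "gate failed at: $step" >&2
      return 1
    fi
  done
  echo "gate: green"
}

# The four commands from the task entry; the arguments are assumed defaults.
GATE_STEPS=("ruff format ." "ruff check ." "pyright" "pytest")
```

Calling `run_gate "${GATE_STEPS[@]}"` stops at the first failing step and returns non-zero. "All four green" is the condition the prompt ties the DONE marker to.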

Finally, I wrote a small bash script, ralph_loop.sh. Each iteration spawned a fresh agent session and fed it the prompt in ralph_prompt.md, which told it to pick up the next unfinished task and work on it until done.

The loop itself is simple:

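```bash
# Illustrative values; the real script's paths and limit may differ.
TASKS_FILE="docs/TASKS.md"
PROMPT_FILE="ralph_prompt.md"
MAX_ITERS=20

TOTAL=$(grep -c '^## Task ' "$TASKS_FILE")

for ((i = 1; i <= MAX_ITERS; i++)); do
    # Stop as soon as every task carries a DONE marker.
    DONE=$(grep -c '^\*\*Status:\*\* DONE' "$TASKS_FILE")
    [[ "$DONE" -ge "$TOTAL" ]] && exit 0

    # Fresh, non-interactive session per iteration; the prompt is read from stdin.
    claude --print --dangerously-skip-permissions --add-dir . < "$PROMPT_FILE"
done
```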

Here's the prompt each iteration receives:

```markdown
# sitegen3 — single-task iteration

You are completing one task from `docs/TASKS.md`. A bash loop will reinvoke you with this same prompt until every task is marked done.

## Step 1 — Read SPEC.md, ARCHITECTURE.md, and TASKS.md in full

## Step 2 — Find the first task without a `**Status:** DONE` marker

## Step 3 — Implement exactly what the task specifies

## Step 4 — Run the verification gate (ruff, pyright, pytest)

## Step 5 — Append `**Status:** DONE` to the task only if the gate is green

## Step 6 — Commit the changes with message `task N: <title>`, then stop

## Hard rules

- Read-only docs: never modify `SPEC.md` or `ARCHITECTURE.md`.
- TASKS.md is append-only: only the `**Status:** DONE` line for this task.
- One task per run. No skipping ahead. No partial credit.
```
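Step 2 of the prompt relies on a simple convention. A task is finished when a `**Status:** DONE` line appears after its `## Task` heading and before the next one. Here is a sketch of that lookup in awk, assuming the heading and marker formats shown above. `next_task` is a hypothetical helper, not part of the original script.

```bash
# Hypothetical helper: print the heading of the first task that has no DONE marker.
next_task() {
  awk '
    /^## Task / {
      # A new heading while the previous task is still pending: that one is next.
      if (pending != "") { print pending; found = 1; exit }
      pending = $0
      next
    }
    # A DONE marker closes the current task.
    /^\*\*Status:\*\* DONE/ { pending = "" }
    # The last task in the file may be the pending one.
    END { if (!found && pending != "") print pending }
  ' "$1"
}
```

In the setup described here, the agent performs this lookup itself by reading TASKS.md. A helper like this could let the loop log or double-check which task is up next.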

Why use a bash loop?

I chose to write my own bash loop instead of using the Claude Code ralph loop extension, for two reasons. First, I wanted a fresh context for each task, but the extension reuses the same session. Second, it kept things simple: a small bash script does the job without tying me to a specific tool. I also added a few orchestration utilities to the script, such as logging how many tasks were done and how many remained at each iteration.
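Progress logging of that kind can be derived from the same markers the loop already counts. Below is a sketch that assumes the `## Task` and `**Status:** DONE` conventions from TASKS.md. `log_progress` is a hypothetical name, and the real script's log format may differ.

```bash
# Hypothetical progress logger: counts task headings and DONE markers in TASKS.md.
log_progress() {
  local tasks_file=$1 iter=$2 total done_count
  total=$(grep -c '^## Task ' "$tasks_file")
  done_count=$(grep -c '^\*\*Status:\*\* DONE' "$tasks_file")
  printf '[%s] iteration %d: %d/%d done, %d remaining\n' \
    "$(date '+%H:%M:%S')" "$iter" "$done_count" "$total" "$((total - done_count))"
}
```

Called at the top of each iteration, for example `log_progress docs/TASKS.md "$i"`, it prints one timestamped line with the done and remaining counts.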

Result

After 14 iterations, one per task, the ralph loop finished and the project was built. Crafting the specification documents took a few days of iteration. The loop itself ran in a few hours, interrupted only by the usage limit of my Claude Pro plan. You can see the result at sitegen3 1.0.0, together with the specification documents that were used to generate it.

Everything worked as specified, and the design held up too. I was pleased with the quality of the generated codebase. The agent also wrote 111 passing tests along the way. The site you're reading this on was generated using sitegen3.

It ran cleanly because almost all the work happened before the loop started. I spent days refining the three documents, repeatedly asking the model to flag anything ambiguous or under-specified for an agent, and cutting or rewriting whatever it surfaced. By the time the loop ran, the agent wasn't making design decisions; it was implementing a plan.