You can get a website generated in four minutes for £40 on Fiverr. The seller will run a prompt through one of the no-code AI site builders, swap your logo in, and ship you a Squarespace export. It looks fine. For thirty seconds.

That isn’t what an AI-assisted production build is. The distinction matters because every prospect now opens a sales call with a version of “couldn’t I just get ChatGPT to do this?” The honest answer is no — but ChatGPT is on the team. The point of this piece is to map where AI accelerates production work, and where delegating to it is how you ship a site that breaks quietly in six months.

What follows: the named tools, the boundaries, and the reason each boundary exists.

The four tools, mapped to the four phases

Claude (Anthropic) — content first-pass + large refactor sweeps. Claude’s long-context window and editorial register make it the right tool for the two jobs where “produce twenty similar things that all share a structure” is the task. A first pass at the body copy for fourteen treatment service pages. A site-wide migration from one CSS token system to another across 240 files. Both are the kind of work where a human is faster as a reviewer than as a typist, and Claude is faster as a typist than anyone.

Cursor — in-editor pair programming. Cursor sits inside the editor and reads the codebase. The job it does well is “in this component, the prop signature changed — propagate that through the consumers and update the tests.” It earns its £15/month subscription on the day-to-day mechanics of working in a known codebase. It doesn’t make architectural decisions. It executes them.

GitHub Copilot — line completion on boilerplate. Copilot is the autocomplete that’s read every Astro layout file on GitHub. It’s right almost every time on <head> boilerplate, on standard accessibility attributes, on import statements. It’s wrong as often as it’s right on anything that requires reasoning about this specific codebase. The skill is knowing which line you’re on.

Aider — terminal-driven multi-file edits. Aider is the right tool when the change is “this token name appears in nineteen files and three of them need a slightly different replacement.” It commits the changes to git as it goes, so you can git diff the AI’s work like any other contributor’s. It’s the most “production engineering” of the four because it leaves an audit trail.

Four tools, four jobs, very little overlap.

Phase 1 — scaffolding

The first phase of every build is scaffolding: routes file, layout component, content schema, base styles, package.json. None of this is creative work. All of it follows a pattern Claude has read ten thousand times.

The honest workflow: I open a new repo, run npm create astro@latest, then ask Claude to scaffold src/lib/routes.ts, the base layout, the content collection schema, and tailwind.config.ts. Five minutes of typing becomes thirty seconds of reading and approving. The work is reviewing the scaffold, not producing it.
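For a sense of what "boilerplate that follows from the pick" looks like, here is a minimal sketch of a typed routes file. It assumes a hypothetical clinic site. Astro itself routes by file name under src/pages, so a file like this only keeps internal links consistent. The route names and the `href` helper are illustrative, not part of Astro.

```typescript
// src/lib/routes.ts (hypothetical): one typed source of truth for internal URLs.
// Astro still routes from src/pages; this only stops links being hand-typed strings.
export const routes = {
  home: "/",
  treatments: "/treatments/",
  treatment: (slug: string) => `/treatments/${slug}/`,
  contact: "/contact/",
} as const;

// The subset of route names whose value is a plain string (not a builder function).
type StaticRoute = {
  [K in keyof typeof routes]: (typeof routes)[K] extends string ? K : never;
}[keyof typeof routes];

// Resolve a static route name; the compiler rejects names that do not exist.
export function href(name: StaticRoute): string {
  return routes[name];
}
```

A typo such as `href("contcat")` then fails at build time instead of shipping a broken link.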

Where it falls apart: when the scaffold needs to make a choice — Tailwind vs vanilla CSS, MDX vs Markdown, content collections vs filesystem — Claude will guess based on its training data, which is months out of date. The engineer picks the architecture; the AI fills in the boilerplate that follows from the pick.

Phase 2 — style sweeps and refactors

The phase where AI saves the most hours. A real example: in mid-2026 I migrated UK Web Marketing’s design tokens from a TypeScript object to CSS custom properties, then updated every component to use the new variable names. 240 files. About 1,800 token references.
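The first half of that migration, turning the token object into custom properties, is mechanical enough to script. This is a minimal sketch, not the actual migration code. It assumes a nested object shaped like tokens.colors.brand and a hypothetical naming rule that maps plural group names to singular prefixes, so that colors.brand becomes --color-brand.

```typescript
// Hypothetical token object, in the shape the TypeScript tokens had before the migration.
const tokens = {
  colors: { brand: "#0b5fff", text: "#1a1a1a" },
  space: { sm: "0.5rem", md: "1rem" },
};

// Assumed naming rule: plural group names become singular CSS prefixes.
const groupPrefix: Record<string, string> = { colors: "color", space: "space" };

// Flatten { group: { name: value } } into `--prefix-name: value;` declarations on :root.
function tokensToCss(t: Record<string, Record<string, string>>): string {
  const lines: string[] = [];
  for (const [group, entries] of Object.entries(t)) {
    const prefix = groupPrefix[group] ?? group;
    for (const [name, value] of Object.entries(entries)) {
      lines.push(`  --${prefix}-${name}: ${value};`);
    }
  }
  return `:root {\n${lines.join("\n")}\n}`;
}

console.log(tokensToCss(tokens));
```

Generating the stylesheet from the old object keeps the values identical, so the only risky part left is renaming the references.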

Doing it by hand: two days of grep -r, manual edits, repeated test runs. Doing it with Aider: one carefully written prompt, ninety minutes of supervised execution, a single PR with a tidy git history. The diff was readable. Every change was committed atomically. The tests passed.

The discipline is bounded delegation. The prompt tells Aider exactly what to change, what to leave alone, and what to flag. It doesn’t say “modernise the codebase.” It says “rename tokens.colors.brand to var(--color-brand) across src/components, leave src/pages for me.” The boundary is where the human stays in charge.
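The same boundary can be enforced in code, not only in the prompt. This is a minimal sketch that assumes the files are held in memory as a path-to-contents map (a real run would read from disk or let Aider edit them). It rewrites tokens.colors.brand only under src/components/. Any match elsewhere, such as src/pages/, is reported for a human and left unchanged. The function and type names are hypothetical.

```typescript
// Hypothetical in-memory view of the repo: path -> file contents.
type Files = Record<string, string>;

interface RenameResult {
  files: Files;      // contents after the sweep
  changed: string[]; // paths rewritten automatically
  flagged: string[]; // paths outside the boundary that still use the old token
}

// Escape regex metacharacters so "tokens.colors.brand" matches literally.
const escapeRegExp = (s: string) => s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");

function scopedRename(files: Files, from: string, to: string, scope: string): RenameResult {
  // Only match the whole path: "tokens.colors.brandDark" must not be touched.
  const pattern = new RegExp(escapeRegExp(from) + "(?![\\w$])", "g");
  const out: Files = {};
  const changed: string[] = [];
  const flagged: string[] = [];

  for (const [path, text] of Object.entries(files)) {
    if (text.match(pattern) === null) {
      out[path] = text;
    } else if (path.startsWith(scope)) {
      out[path] = text.replace(pattern, () => to); // inside the boundary: rewrite
      changed.push(path);
    } else {
      out[path] = text; // outside the boundary: leave it for the human
      flagged.push(path);
    }
  }
  return { files: out, changed, flagged };
}
```

The `flagged` list plays the role of "what to flag" in the prompt: the sweep never silently edits files it was told to leave alone.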

Phase 3 — content first-pass

Content is the phase prospects most often think AI should be doing alone — and the phase where it’s most obviously a tool, not a writer.

Claude’s first pass at a 1,200-word treatment-service body is usable. It hits the structure, it includes the right headings, it covers the topics a clinic patient would ask about. What it doesn’t have is the practice’s voice. It doesn’t know that this specific Leeds clinic refers to “consultations” rather than “appointments,” that the senior practitioner doesn’t allow the word “results” without an evidence link, or that the tone is warmer than the NHS but cooler than a spa.

So the workflow is: Claude drafts, the engineer (or content lead, or the client themselves) rewrites the first paragraph in voice, then runs a “match this voice across the rest of the document” pass. Final pass is a human read for medical/legal accuracy. Three rounds, total time roughly 25% of writing from scratch, output quality higher than either AI-only or human-only.

The boundary: brand voice and factual accuracy stay with humans. Every page that goes live has been read end to end by someone who knows the business. Not because Claude can’t write, but because the cost of a single hallucinated fact on a clinic site is bigger than the cost of an editor.
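Voice itself can't be linted, but the house rules that sit around it can be written down so the final human read doesn't depend on memory. This is a minimal sketch using the two hypothetical rules from the Leeds clinic example. The first flags "appointment". The second flags "results" in any paragraph with no link, assuming an "evidence link" means a Markdown link in the same paragraph. It catches mechanical slips only.

```typescript
interface StyleIssue {
  paragraph: number; // 1-based index of the offending paragraph
  message: string;
}

// Hypothetical house rules for one clinic: [pattern to avoid, what to say instead].
const bannedTerms: Array<[RegExp, string]> = [
  [/\bappointments?\b/i, 'say "consultation", not "appointment"'],
];

// Assumption: an evidence link is a Markdown link [text](url) in the same paragraph.
const markdownLink = /\[[^\]]+\]\([^)\s]+\)/;

function checkHouseStyle(markdown: string): StyleIssue[] {
  const issues: StyleIssue[] = [];
  markdown.split(/\n\s*\n/).forEach((para, i) => {
    for (const [pattern, message] of bannedTerms) {
      if (pattern.test(para)) issues.push({ paragraph: i + 1, message });
    }
    // "results" is only allowed when the paragraph cites evidence.
    if (/\bresults\b/i.test(para) && !markdownLink.test(para)) {
      issues.push({ paragraph: i + 1, message: '"results" needs an evidence link' });
    }
  });
  return issues;
}
```

Running this before the medical and legal read leaves the editor's attention for the things a regex can't judge.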

Phase 4 — QA checklists

The phase where AI is most reliably useful and least dangerous. Pre-launch sweeps are pattern-matching tasks: does every page have a unique <title>? Does every form have a CSRF token? Does every link that opens in a new tab (target="_blank") carry rel="noopener"? Does every text-on-background pair meet the WCAG 2.2 AAA contrast ratio?
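Contrast is a good example of a question with a known correct answer, because WCAG defines it as a formula. This is a minimal sketch of the WCAG 2.x contrast-ratio calculation for two sRGB colours. AAA (success criterion 1.4.6) requires at least 7:1 for normal text and 4.5:1 for large text. The hex parser assumes six-digit #rrggbb input.

```typescript
// Linearise one 8-bit sRGB channel, per the WCAG relative-luminance definition.
function channel(c8: number): number {
  const c = c8 / 255;
  return c <= 0.04045 ? c / 12.92 : ((c + 0.055) / 1.055) ** 2.4;
}

// Relative luminance of a "#rrggbb" colour (six-digit hex assumed).
function luminance(hex: string): number {
  const n = parseInt(hex.replace("#", ""), 16);
  const r = (n >> 16) & 0xff;
  const g = (n >> 8) & 0xff;
  const b = n & 0xff;
  return 0.2126 * channel(r) + 0.7152 * channel(g) + 0.0722 * channel(b);
}

// Contrast ratio (L1 + 0.05) / (L2 + 0.05), with L1 the lighter colour: ranges 1 to 21.
function contrastRatio(a: string, b: string): number {
  const [hi, lo] = [luminance(a), luminance(b)].sort((x, y) => y - x);
  return (hi + 0.05) / (lo + 0.05);
}

// AAA thresholds: 7:1 for normal text, 4.5:1 for large text.
function passesAAA(fg: string, bg: string, largeText = false): boolean {
  return contrastRatio(fg, bg) >= (largeText ? 4.5 : 7);
}
```

Mid-grey #767676 on white, for example, clears the 4.5:1 large-text threshold but not the 7:1 needed for body copy at AAA.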

Claude can be given the codebase and asked to enumerate the failures. It’s right almost every time, because the questions have known correct answers. The engineer’s job is reading the list, fixing the genuine problems, and ignoring the ones that are intentional. (“Yes, this <a target="_blank"> doesn’t have rel="noopener"; it’s an internal preview link that only exists in dev.”)
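Part of that list can also be produced deterministically, which gives the AI's output something to be checked against. This is a minimal sketch under two assumptions: the built pages are available as a hypothetical map of URL to HTML, and regular expressions stand in for a real HTML parser (adequate for a checklist, not for arbitrary markup). It reports missing or duplicate titles and new-tab links without rel="noopener". An allow-list handles the intentional exceptions.

```typescript
type Pages = Record<string, string>; // hypothetical: URL -> rendered HTML

function qaSweep(pages: Pages, allowedBlankHrefs: string[] = []): string[] {
  const problems: string[] = [];
  const seenTitles = new Map<string, string>(); // title -> first URL that used it

  for (const [url, html] of Object.entries(pages)) {
    // Every page needs a non-empty, unique <title>.
    const title = html.match(/<title>([^<]*)<\/title>/i)?.[1]?.trim();
    if (!title) {
      problems.push(`${url}: missing <title>`);
    } else if (seenTitles.has(title)) {
      problems.push(`${url}: duplicate <title> "${title}" (also on ${seenTitles.get(title)})`);
    } else {
      seenTitles.set(title, url);
    }

    // Every <a> that opens a new tab should carry rel="noopener", unless allow-listed.
    for (const tag of html.match(/<a\b[^>]*>/gi) ?? []) {
      if (!/target\s*=\s*["']_blank["']/i.test(tag)) continue;
      const rel = tag.match(/rel\s*=\s*["']([^"']*)["']/i)?.[1] ?? "";
      const href = tag.match(/href\s*=\s*["']([^"']*)["']/i)?.[1] ?? "";
      if (!rel.split(/\s+/).includes("noopener") && !allowedBlankHrefs.includes(href)) {
        problems.push(`${url}: target="_blank" without rel="noopener" (${href})`);
      }
    }
  }
  return problems;
}
```

The allow-list is where "this one is intentional" gets recorded, so the same exception isn't re-argued on every launch.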

That’s the rhythm: AI proposes, engineer disposes. Faster than running the checklist by hand. Safer than running the checklist via AI alone.

The four jobs AI doesn’t get

These are the jobs where AI assistance fails quietly — meaning the output looks right at the time, and the failure surfaces months later as a security incident, a regulatory finding, or a fundamental rebuild.

Schema design. The shape of the data model is the shape of the business for the next five years. Get it wrong and every feature built on top inherits the wrongness. AI can suggest a schema for a “clinic website”; what it can’t do is know that this clinic charges per consultation, not per service, that practitioners work across multiple sites and need many-to-many relationships, or that the GDPR data-minimisation review will reject a nationality field. The schema is an engineering decision rooted in the business, not the codebase.
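To make those three business facts concrete, here is a minimal sketch of the data shape they imply. Every name is hypothetical:

- The price hangs off the consultation, not the service.
- A join table carries the many-to-many link between practitioners and sites.
- The patient record deliberately has no nationality field.

```typescript
interface Site { id: string; name: string }
interface Practitioner { id: string; name: string }

// Join table: a practitioner works at many sites; a site has many practitioners.
interface PractitionerSite { practitionerId: string; siteId: string }

// Services carry no price, because this clinic does not charge per service.
interface Service { id: string; name: string }

// The clinic charges per consultation, so the price lives here.
interface Consultation {
  id: string;
  patientId: string;
  practitionerId: string;
  siteId: string;
  serviceIds: string[];
  pricePence: number; // integer pence, to avoid floating-point money
}

// Hypothetical minimal patient record: no nationality, per data minimisation.
interface Patient { id: string; name: string; email: string }

// Resolve who works at a site by going through the join table.
function practitionersAt(
  siteId: string,
  links: PractitionerSite[],
  people: Practitioner[],
): Practitioner[] {
  const ids = new Set(links.filter((l) => l.siteId === siteId).map((l) => l.practitionerId));
  return people.filter((p) => ids.has(p.id));
}
```

None of these choices can be read off the codebase. Each one comes from a conversation with the business.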

Authentication and payment webhooks. Security primitives are where “almost right” is wrong. A Stripe webhook handler that forgets to verify the signature header looks fine in code review — it works on every legitimate request. It also works on every forged request. AI is exceptionally prone to writing webhook handlers that miss the signature check, because the unverified version is the version that appears in tutorials. Auth flows, session handling, CSRF tokens, payment webhooks — all stay with engineers who’ve shipped them before.
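The check that goes missing is short, which is part of why it is easy to drop. In practice, the official stripe Node library does it for you through `stripe.webhooks.constructEvent(rawBody, signatureHeader, endpointSecret)`. The sketch below spells out what that verification involves, following Stripe's documented scheme: the header looks like `t=<timestamp>,v1=<signature>`, and the signature is an HMAC-SHA256 of `<timestamp>.<raw body>` keyed with the endpoint secret. The function name and the five-minute tolerance are assumptions. The function needs the raw request body, because a parsed and re-serialised body will not verify.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Hypothetical helper: checks a Stripe-Signature header against the RAW request body.
// Returns false instead of throwing, so the handler can reply 400 and stop.
function verifyStripeSignature(
  rawBody: string,
  header: string,
  secret: string,
  toleranceSeconds = 300, // assumption: reject events signed more than five minutes ago
  nowSeconds = Math.floor(Date.now() / 1000),
): boolean {
  // Header shape: "t=1700000000,v1=<hex>[,v1=<hex>...]"
  let t: string | undefined;
  const signatures: string[] = [];
  for (const part of header.split(",")) {
    const [key, value] = part.split("=", 2);
    if (key === "t") t = value;
    else if (key === "v1" && value) signatures.push(value);
  }
  const timestamp = Number(t);
  if (t === undefined || !Number.isFinite(timestamp) || signatures.length === 0) return false;

  // A stale timestamp suggests a replayed request.
  if (Math.abs(nowSeconds - timestamp) > toleranceSeconds) return false;

  // The signed payload is "<timestamp>.<raw body>", HMAC-SHA256 with the endpoint secret.
  const expected = createHmac("sha256", secret).update(`${t}.${rawBody}`).digest();

  // Constant-time comparison against each v1 signature present in the header.
  return signatures.some((sig) => {
    const given = Buffer.from(sig, "hex");
    return given.length === expected.length && timingSafeEqual(given, expected);
  });
}
```

The unverified tutorial version is this handler with everything above the final `return` deleted. It still passes every legitimate test request.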

Data flow / control flow architecture. When a booking is created, what writes where, in what order, and what happens if step three fails halfway through? This is the question every multi-step transaction has to answer, and the answer determines whether the business loses money quietly when things go wrong. AI cannot reason about your specific business’s failure modes. It can suggest patterns. The architect picks the one that matches the actual blast-radius cost.
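One pattern the architect might pick for a multi-step booking is compensation: run the steps in order and, if one fails, undo the completed ones in reverse. This is a minimal sketch with hypothetical step names (hold the slot, take a deposit, send the confirmation). It is one option among several. A single database transaction or an outbox table are others, and which one fits depends on which failures the business can absorb.

```typescript
interface Step {
  name: string;
  run: () => Promise<void>;
  undo: () => Promise<void>; // compensating action if a later step fails
}

// Run steps in order; on failure, undo completed steps in reverse order, then rethrow.
async function runWithCompensation(steps: Step[]): Promise<void> {
  const done: Step[] = [];
  for (const step of steps) {
    try {
      await step.run();
      done.push(step);
    } catch (err) {
      for (const completed of [...done].reverse()) {
        try {
          await completed.undo();
        } catch (undoErr) {
          // A failed undo is exactly the case where money goes missing quietly:
          // surface it for a human rather than swallowing it.
          console.error(`compensation for ${completed.name} failed`, undoErr);
        }
      }
      throw err;
    }
  }
}
```

If the confirmation email fails here, the deposit is refunded and the slot released. Whether that is the right outcome, or whether the booking should stand and the email be retried, is the business decision the code can't make.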

Brand voice. The thing that makes a site sound like this clinic, this firm, this practice, not “a clinic, a firm, a practice.” Brand voice is the thing AI is structurally worst at, because a language model is trained to predict the likeliest next token, and the likeliest phrasing is the literal opposite of voice. The engineer or the editor owns this. Always.

The Fiverr question, answered

When the prospect asks “couldn’t I just get ChatGPT to build this,” the honest answer is the matrix above. AI is on the team — for scaffolding, for refactor sweeps, for content drafts, for QA — and the engineer is also on the team, for schema, auth, payments, voice.

The Fiverr deliverable skips the second half. That’s why an AI-generated site looks fine for thirty seconds and falls apart the first time you need a non-template page, a custom integration, a redirect that holds rank, or a content shape the underlying template doesn’t support.

A managed website service is the matrix run on purpose, every week, by people who know which half of the work AI does and which half it doesn’t.

If you want the AI-assisted version, see the managed website service, or get an audit of the AI-generated site you’ve already got.

More on the system the AI fits inside: Information architecture before design and Pages vs systems.

AI-assisted vs AI-generated: where Claude, Cursor and Copilot fit

Ready for the website + infrastructure your business should already have?
