The case study is the most labour-intensive format in agency marketing: 8–20 hours of team time per published piece, with sources scattered across email, calls, notes, and the client's own documents. AI speeds up three of the seven stages by 5–10x and leaves four untouched — the ones that require human judgement. This is the full pipeline we run inside our Case Study Generator, with the concrete n8n nodes and the exact handoff point to the editor.
The seven stages of the production pipeline
A good agency case study is built from seven sequential steps. AI accelerates three of them radically, two noticeably, and two not at all.
| # | Stage | What it does | Manual time | Time with AI |
|---|---|---|---|---|
| 1 | Source gathering | Parsing emails, calls, notes, client documents | 2–4 h | 15–30 min |
| 2 | Fact extraction | Numbers, names, dates, key quotes, result metrics | 1–2 h | 10–15 min |
| 3 | Structure agreement | What goes into the case, what gets cut | 1 h | 1 h (no AI) |
| 4 | Storytelling draft | Framing the narrative — problem, approach, results | 2–4 h | 20–30 min |
| 5 | Fact-checking | Reconciling numbers and names against the sources | 0.5–1 h | 5–10 min |
| 6 | Author's edit | Style, rhythm, point of view, quotes | 2–3 h | 2–3 h (no AI) |
| 7 | Client sign-off | Quotes, sensitive numbers, product references | 0.5–2 h | 0.5–2 h (no AI) |
Total: 9–17 hours manually → 4–7 hours with AI. The saving isn't "10x," the way the marketing decks promise — it's "2–3x," but concentrated on the heavy-lifting gathering and drafting stages. Stages 3, 6, and 7 are where human judgement lives, and AI doesn't help there.
Stage 1. Source gathering
The most underrated stage. In a typical agency, a single case study is assembled from 8–15 sources: email threads with the client, transcripts of calls, PM notes in Notion, client materials (reports, charts, screenshots), and press mentions.
What n8n does:
- Parses the email thread for the client's topic (Gmail/Outlook API)
- Transcribes calls (Whisper API, run locally)
- Pulls Notion/Confluence/Google Docs pages by client tag
- Collects everything into a single markdown document with dividers and timestamps
The result is one 30–50-page file of structured context, ready to feed to an LLM. Previously a PM had to physically open 8–15 different interfaces and copy the relevant bits by hand. Now it's one Slack command or one webhook.
Where it breaks. Transcript quality depends on call audio quality. The email parser only catches what was in text form — attachments and embedded tables need separate handling. Notion/Confluence APIs have request limits. All of these are solvable through configuration, but they need initial setup during client onboarding.
Stage 2. Fact extraction
Once the context is gathered, you need to pull the structured filling out of it: which numbers were mentioned, which names, which dates, which quotes. This is a task LLMs are strong at.
What the LLM does:
- Receives the 30–50-page markdown from stage 1
- Returns structured JSON: names and titles tied to their source, numbers and metrics (with context — over what period, in what units), dates of key events, quotes attributed to the speaker
The prompt for this stage runs around 800–1,200 tokens in our system and gets rewritten per client — everyone has their own view of what counts as a "key number" (for one it's revenue growth, for another engagement metrics, for a third media impressions).
The LLM here works not as a generator but as a structuring layer. Extraction accuracy on a typical source runs around 90–95%; the remaining 5–10% gets corrected by the editor in stage 5.
Stage 3. Structure agreement
This is the first stage where AI doesn't help. The decision of "what angle do we take" is a function of the client's positioning in their market, of our relationship with the client, and of which moments can be made public and which can't.
What the human does:
- Looks at the JSON of facts from stage 2
- Picks 3–5 key moments out of 20–30 candidates
- Decides what to feature, what to mention in passing, what to cut entirely
- Agrees the main beats with the client before any draft gets written
This is the work of a PM or senior copywriter, 30–60 minutes. Without it, the AI-generated case study comes out "technically correct" but fails to answer the question "why are we publishing this."
Many people try to delegate this stage to AI ("let the LLM decide what matters"). The result is standardised case studies that are indistinguishable from a competitor's. Structure has to be a deliberate choice, not a statistical average.
Stage 4. Storytelling draft
Agreed structure (stage 3) + JSON of facts (stage 2) → the LLM generates the first draft. This is the second stage where AI delivers the bulk of the time saving.
What the LLM does:
- Receives the "problem → approach → actions → result" skeleton and the list of facts
- Writes a coherent 1,500–3,000-word narrative
- Holds the agency's house style (via the system prompt)
- Places quotes in the right spots
- Runs through humanizing passes: strips templated phrasing, drops in concrete numbers, breaks up rule-of-three patterns
The result is a draft that reads like living text but still needs an author's hand in stage 6. Quality is higher than "a technical retelling of the facts" but lower than "ready to publish."
Stage 5. Fact-checking
Every LLM hallucinates. That's not a bug, it's a property of probabilistic generation. On production material with verifiable numbers this is critical — a client won't forgive "grew 23%" when it was "grew 19%."
What n8n + LLM do:
- Receive the final draft from stage 4
- Extract every number, name, and date from it
- Reconcile them against the JSON of facts from stage 2
- Return a list of discrepancies with the location in the text
This is automated fact-checking against a single source (our JSON). It catches the model's hallucinations but not errors in the original source. If someone misspoke a figure on the recorded call, this pipeline won't catch it.
The defence against that is a second layer of fact-checking in stage 7, when the client reviews the final text.
Stage 6. Author's edit
The third stage without AI. The editor takes the clean, verified draft and:
- Sets the authorial tone — room for a touch of irony, gravity, an emotional beat
- Adds an opening hook and a closing that lands on one idea
- Reorders arguments to follow a narrative logic the LLM doesn't feel
- Threads in internal links to other agency material
- Finalises the CTA
This is senior-level work, 1–2 hours on a typical case. You can't save it with AI — because the more material you run through humanizer-LLM-humanizer, the further it drifts from a living voice.
Stage 7. Client sign-off
The final stage without AI. The client checks:
- Employee quotes — who said what, and consent
- Numbers and metrics — what can be made public
- References to internal processes — what shouldn't be disclosed
- Mentions of partners and subcontractors — whether those need their own approval
This stage is a function of the client relationship, not the technology. Good relationship — the client signs off in 1–2 days. Bad one — 2–3 weeks. AI doesn't shorten either scenario.
What we removed and added after a year of running it
The current version of the pipeline is the third. What changed:
We removed the "generate the structure" LLM stage. Earlier there was an automatic structure suggester after stage 2. An experiment across 30 case studies showed the structure came out too similar between clients — uniqueness got lost. We handed stage 3 back to a human and quality went up.
We added a humanizing pass. Originally the draft from stage 4 went straight to the editor, who spent 50% of their time removing AI templates. We added a deterministic humanizer step — the editor now spends time on substance, not on cleanup.
We added a fact-extraction step. Originally the draft was written directly from the raw context of stage 1. Number accuracy was unstable. Once we broke stage 2 out into a separate step with structured JSON output, the fact check (stage 5) became trivial.
We removed the multi-LLM check. An experiment — run the draft through two different models and compare. More noise than signal; we cut it.
When you don't need an AI pipeline
Not every agency needs this level of automation. A clear list of cases where the AI pipeline is overkill:
- Fewer than one case study a month. The setup cost won't pay back. Do it by hand.
- Case studies are short 500–700-word previews. They already get written in 2–3 hours. AI saves a little, adds infrastructure.
- A team of 1–2 people. An AI pipeline needs someone to maintain it. Teams of two or fewer drown in the infrastructure overhead.
- Every client is unique in style and process. If 5 clients mean 5 different prompt configs and integrations, a shared pipeline becomes more expensive than 5 manual processes.
The sweet spot for an AI pipeline is an agency of 10–50 people, 4–10 case studies a month, with a shared editorial standard. That's the level where saving 5–7 hours per case multiplies across dozens of cases and pays back the infrastructure.
Build the pipeline once, then just use it
The Case Study Generator is our reference implementation of all seven stages. Implementation for a client takes 3–5 weeks, including team onboarding, tuning to the agency's house style, and integrations with existing tools.
If your agency already has a "manual process baseline" (case studies get written, but at a cost) — let's talk. We'll walk the seven stages above for your team and work out which of them you can realistically automate in the first three months, and which are better left manual.
