TL;DR: You can use LLMs on client work under GDPR — but not by pasting whatever's in front of you into a chat box. The rules: know what's personal data and what's special-category, have a lawful basis, sign a DPA with the vendor, check where the data lives and who the sub-processors are, and pseudonymize before the data leaves your control whenever you can. Self-host only when the data genuinely demands it. Most of this is process, not lawyering.
Every agency owner I talk to is somewhere on the same spectrum. One end: "We banned AI tools because of GDPR." The other: "We paste client briefs into ChatGPT all day, it's fine." Both are wrong, and both are expensive — the first leaves productivity on the table, the second leaves you exposed.
The truth is in the middle and it's manageable. I'm not a lawyer and this isn't legal advice; it's the operating model I'd hand a new account director on day one so they don't do anything that lands the agency in front of a supervisory authority.
What actually counts as personal data
GDPR governs personal data — any information relating to an identified or identifiable living person. That's broader than people expect. A name, obviously. But also an email address, a phone number, an IP address, a job title tied to a named company, or a quote attributed to a specific spokesperson. If you can single out a human from it, directly or by combining it with something else, it's personal data.
Then there's a stricter tier: special-category data. Health, racial or ethnic origin, religious or philosophical beliefs, political opinions, sex life or sexual orientation, genetic data, biometric data used to identify someone, trade-union membership. Processing this is prohibited by default and only allowed under narrow conditions. For an agency, this matters more than you'd think: a healthcare client's patient testimonial, a campaign about a religious holiday with named participants, an advocacy brief revealing political affiliation.
The test before anything goes near a model: Does this text identify a real person? Does it reveal anything special-category about them? If yes to either, you handle it deliberately. If no, your obligations are much lighter.
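A regex can't tell you whether text identifies a person (names, job titles and quotes sail straight through), but it makes a cheap tripwire for the most obvious direct identifiers. Here is a minimal Python sketch of that pre-flight test. The function names, the return labels and the deliberately simple email and phone patterns are my own assumptions, and the two yes/no questions above still need a human answer:

```python
import re

# Deliberately simple patterns: they flag obvious direct identifiers only.
# Names, job titles, quotes and special-category details are NOT detected.
PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "phone": re.compile(r"\+?\d[\d ()-]{7,}\d"),
}


def find_direct_identifiers(text: str) -> dict[str, list[str]]:
    """Return the matches for each pattern that fired; empty dict if none did."""
    hits: dict[str, list[str]] = {}
    for label, pattern in PATTERNS.items():
        found = pattern.findall(text)
        if found:
            hits[label] = found
    return hits


def preflight(text: str, identifies_person: bool, reveals_special_category: bool) -> str:
    """Combine the two human answers with the regex tripwire (hypothetical labels)."""
    if reveals_special_category:
        return "review"        # special-category: stop and get a specific review
    if identifies_person or find_direct_identifiers(text):
        return "pseudonymize"  # personal data: strip identifiers before sending
    return "proceed"           # lighter obligations; a clean scan is not proof
```

Expect false positives (dates and order numbers can look like phone numbers). Treat a hit as a reason to look, and a clean scan as proof of nothing.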
You need a lawful basis
You can't process personal data just because it's convenient. GDPR gives you six lawful bases; for agency work, two come up most:
- Legitimate interest — you have a genuine business reason, it's proportionate, and it doesn't override the individual's rights. This covers a lot of routine processing, but you have to be able to justify it and document the balancing test.
- Consent — freely given, specific, informed, unambiguous. Harder to rely on than people think, and revocable at any time.
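The point isn't to pick the prettiest word. It's that before you feed a client's customer list into a model to draft segmented messaging, you should be able to say which basis applies and why. When the data describes your client's customers, the client is usually the controller and the basis is theirs to establish, so confirm it with them rather than assuming one. If nobody can name the basis, don't process it.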
The DPA question with AI vendors
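When you send personal data to an AI vendor, that vendor becomes a processor acting on your instructions, and GDPR (Article 28) requires a Data Processing Agreement (DPA) governing that processing. No DPA, no compliant processing. Full stop. If you're handling the data on a client's behalf, you're usually their processor, which makes the AI vendor your sub-processor, and GDPR then also requires the client's prior written authorization, typically granted in your own DPA with them.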
The good news: every serious AI vendor now publishes one. Before you put client data through a tool, confirm:
- A DPA exists and you've accepted it. Many are click-through in the account settings; some require signature on business tiers.
- The DPA covers the specific product you're using, not just the company's flagship.
- Training carve-out. Confirm in writing that your inputs and outputs are not used to train the vendor's models. Consumer tiers often reserve this right; business and enterprise tiers usually don't, but check anyway.
The free or personal tier of a consumer AI tool and its enterprise tier are, for compliance purposes, different products. The personal tier frequently has no DPA and may use your inputs for training. Never run client personal data through a personal-tier account. This is the single most common mistake I see.
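If you already track tools in a spreadsheet, the checks above are easy to encode. A minimal sketch, assuming a hypothetical `ToolProfile` record you fill in by hand from the vendor's DPA and account settings; nothing here queries a vendor:

```python
from dataclasses import dataclass


@dataclass
class ToolProfile:
    # Every field is transcribed by hand from the vendor's DPA and account settings.
    name: str
    tier: str                     # e.g. "personal", "business", "enterprise"
    dpa_accepted: bool
    dpa_covers_product: bool      # the DPA names this product, not just the flagship
    no_training_in_writing: bool  # written training carve-out for inputs and outputs


def dpa_blockers(tool: ToolProfile) -> list[str]:
    """Reasons this tool may not receive client personal data (empty = none found)."""
    problems = []
    if tool.tier == "personal":
        problems.append("personal tier: never use for client personal data")
    if not tool.dpa_accepted:
        problems.append("no accepted DPA")
    if not tool.dpa_covers_product:
        problems.append("DPA does not cover this product")
    if not tool.no_training_in_writing:
        problems.append("no written training carve-out")
    return problems
```

An empty list means none of these blockers turned up, not that the tool is compliant; residency and sub-processors still need checking.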
Data residency and sub-processors
Two questions that decide a lot:
Where does the data physically live? Transfers of personal data outside the EEA are restricted unless there's an adequacy decision or appropriate safeguards (Standard Contractual Clauses, mostly). Many vendors now offer EU data residency — processing and storage within the EEA — and for EEA-client work that removes a whole category of headache. Ask, and get it in writing.
Who are the sub-processors? Your vendor almost certainly relies on others — a cloud host, sometimes a model provider behind the API. GDPR makes you responsible for that chain. Reputable vendors publish a sub-processor list and notify you of changes. Read it. If a sub-processor sits somewhere that breaks your residency promise to a client, you need to know before the client does.
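Both questions come down to comparing locations against a promise. A rough sketch, assuming you transcribe the vendor's published sub-processor list into a dict of name to ISO country code (the vendor names in the example are made up). A sub-processor outside the EEA isn't automatically unlawful, since an adequacy decision or SCCs may cover it, but it breaks an EEA-only commitment to a client, so it gets flagged for review:

```python
# EU member states plus Iceland, Liechtenstein and Norway (ISO 3166-1 alpha-2 codes).
EEA = {
    "AT", "BE", "BG", "HR", "CY", "CZ", "DK", "EE", "FI", "FR", "DE", "GR", "HU",
    "IE", "IT", "LV", "LT", "LU", "MT", "NL", "PL", "PT", "RO", "SK", "SI", "ES",
    "SE", "IS", "LI", "NO",
}


def outside_eea(subprocessors: dict[str, str]) -> list[str]:
    """Sub-processors (name -> country code) located outside the EEA."""
    return sorted(name for name, country in subprocessors.items() if country not in EEA)


def changed_entries(old: dict[str, str], new: dict[str, str]) -> list[str]:
    """Sub-processors added, or whose location changed, since the last review."""
    return sorted(name for name, country in new.items() if old.get(name) != country)
```

Run `changed_entries` whenever the vendor notifies you of a change, and `outside_eea` for any client you've promised EEA-only processing.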
Self-host vs managed API
The instinct after reading all this is "let's just self-host a model and avoid the whole problem." Sometimes that's right. Usually it's overkill.
Use a managed API (with a proper DPA and EU residency) when: you're processing ordinary business personal data, you've pseudonymized where practical, and the vendor's contractual and residency posture satisfies your clients. This is the right answer for the large majority of agency work — you get current models without running infrastructure.
Self-host an open-weight model when: you're regularly handling special-category data, a client contract explicitly forbids third-party processing, or you operate in a regulated sector where the data genuinely cannot leave your boundary. Self-hosting moves the compliance burden onto you — you're now responsible for the security, access control, and logging the vendor used to handle. It's a real commitment, not a free pass.
The honest framing: self-hosting is a data-sovereignty decision, not a default. Choose it because the data demands it, not because it sounds safer.
Pseudonymize before it leaves your control
The most useful habit in this entire piece: strip the identifiers before the text goes to the model. GDPR explicitly names pseudonymization as a safeguard (Articles 25 and 32), and data that is genuinely anonymous falls outside its scope altogether (Recital 26).
In practice, most agency tasks don't need real names to work. The model can draft a segmented email, polish a testimonial, or restructure a brief just as well when "Jane Doe, CMO at Acme, jane@acme.com" becomes "[CLIENT_CONTACT], [TITLE] at [COMPANY]." You map the placeholders back in afterward, locally.
- Anonymization (irreversible — the person can never be re-identified) takes the data out of GDPR scope entirely. It's the gold standard and the hardest to achieve genuinely.
- Pseudonymization (reversible with a separately held key) keeps you in scope but materially lowers the risk. It's achievable on almost every task and you should make it the default.
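A minimal sketch of that round trip in Python. It assumes you supply the names, titles and companies yourself (reliable name detection is a harder problem than a regex), catches email addresses with a simple pattern, and returns a key that maps placeholders back to real values. Because that key can reverse the swap, this is pseudonymization, not anonymization: keep the key local and never send it to the model. The function names are illustrative:

```python
import re

# Simple email pattern: catches obvious addresses, not every valid one.
EMAIL = r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"


def pseudonymize(text: str, known: dict[str, str]) -> tuple[str, dict[str, str]]:
    """Replace identifiers with placeholders in a single pass.

    `known` maps real values (names, titles, companies) to placeholders you
    choose; each placeholder must be unique. Emails are numbered automatically.
    Returns the pseudonymized text and the mapping needed to reverse it.
    """
    mapping: dict[str, str] = {}  # placeholder -> real value; keep this local
    emails: dict[str, str] = {}   # real email -> placeholder, so repeats reuse one
    # Email alternative first, then longest values first, so "Acme Ltd" beats "Acme".
    names = sorted(known, key=len, reverse=True)
    pattern = re.compile("|".join([EMAIL] + [re.escape(n) for n in names]))

    def swap(match: re.Match) -> str:
        value = match.group(0)
        if value in known:
            placeholder = known[value]
        else:  # not a known value, so the email alternative matched
            if value not in emails:
                emails[value] = f"[EMAIL_{len(emails) + 1}]"
            placeholder = emails[value]
        mapping[placeholder] = value
        return placeholder

    return pattern.sub(swap, text), mapping


def reidentify(text: str, mapping: dict[str, str]) -> str:
    """Put the real values back into the model's output, locally, in one pass."""
    if not mapping:
        return text
    placeholders = sorted(mapping, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(p) for p in placeholders))
    return pattern.sub(lambda m: mapping[m.group(0)], text)
```

Replacing in one regex pass, rather than chaining `str.replace` calls, means a placeholder inserted early can never be rewritten by a later replacement.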
The practical do / don't list
Do:
- Sign and verify the DPA before the first byte of client data goes through a tool.
- Pseudonymize: replace names, emails, and direct identifiers with placeholders before sending.
- Confirm EU/EEA residency for EEA-client data and get it in writing.
- Keep a short internal register: which tools, which data types, which lawful basis.
- Use business or enterprise tiers with training carve-outs, never personal tiers, for client work.
Don't:
- Paste client customer lists, contact databases, or CRM exports into a consumer chat box.
- Send special-category data to a managed API without a specific legal review first.
- Assume "the vendor is big, so it's compliant" — your obligations are yours, not theirs.
- Rely on consent you can't prove was freely given and informed.
- Forget the outputs: an AI-generated draft naming real people is also personal data and lives under the same rules.
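The register from the Do list doesn't need software, but writing it down as data makes the "no basis, no processing" rule enforceable. A sketch with hypothetical names (`RegisterEntry`, `lawful_basis_for`, the example tool); the refusal cases mirror the Don't list:

```python
from dataclasses import dataclass


@dataclass
class RegisterEntry:
    tool: str                    # exact product and tier, e.g. "ExampleAI Enterprise"
    data_types: set[str]         # data types approved for this tool
    lawful_basis: str            # the basis you (or the client) documented
    special_category_reviewed: bool = False  # specific legal review on file?


def lawful_basis_for(
    register: list[RegisterEntry],
    tool: str,
    data_type: str,
    special_category: bool = False,
) -> str:
    """Return the documented basis, or raise if this use isn't registered."""
    for entry in register:
        if entry.tool != tool:
            continue
        if data_type not in entry.data_types:
            raise ValueError(f"{data_type!r} is not approved for {tool}")
        if special_category and not entry.special_category_reviewed:
            raise ValueError(f"special-category data needs legal review for {tool}")
        return entry.lawful_basis
    raise ValueError(f"{tool} is not in the register")
```

One entry per product and tier keeps the personal-versus-enterprise distinction visible: a personal-tier account simply never appears in the register.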
The bottom line
GDPR isn't a reason to avoid AI. It's a reason to be deliberate about three things: what you send, who you send it to, and whether the person on the other side could be identified. Get a DPA, pseudonymize by default, check residency and sub-processors, and reserve self-hosting for the data that truly needs it.
Do that and you can run a modern, AI-assisted agency without ever losing sleep over a client's data — or a regulator's letter.
