We’ve spent three articles building to this.
The stranger in your house. The industry that hasn’t solved it. The humans who’ll bypass every control you put in place.
We had GDPR compliance in place. Procedures. We understood data sovereignty, client confidentiality, the basics of keeping things locked down. None of that was new.
Then we watched a CLI tool try to read our SOPS file with helpful determination and realised it could breach every single one of them. If it could read that file, it could read anything it could reach. One capability undermined one safeguard, which undermined the next. It broke everything, and we had little control over any of it.
These are the questions that followed.
Where Does Your Data Actually Live?
This was the first thing the SOPS incident made us look at differently.
We knew about GDPR. We knew about data protection. We had safeguards in place to make sure certain data never ended up in certain places. That wasn’t the gap.
The gap was realising that an AI tool doesn’t respect those boundaries. It reads what it can reach. And suddenly data that was never supposed to leave your environment is sitting on someone else’s servers, in someone else’s jurisdiction, governed by someone else’s laws.
There’s a distinction that almost nobody in our industry talks about clearly: the difference between ‘GDPR compliant’ and ‘data sovereign.’ They sound similar. They’re not.
A US company can be GDPR compliant. They sign Standard Contractual Clauses. They implement the technical safeguards. They tick the boxes. AWS has a Frankfurt data centre. Azure has regions across Europe. Your data is physically sitting on a server in Germany.
But the company that operates that server is American. And in 2018, the US passed the Clarifying Lawful Overseas Use of Data Act, the CLOUD Act. It says that US companies must hand over data to US law enforcement on request, regardless of where that data is physically stored.1 It doesn’t matter that the server is in Frankfurt. It doesn’t matter that you signed a DPA with EU-specific clauses. If the US government issues a warrant or a FISA court order, the provider must comply. And FISA court orders come with gag provisions. They can’t tell you it happened.2
This isn’t theoretical. The Schrems II ruling in 2020 invalidated the EU-US Privacy Shield precisely because of this kind of access.3 The current EU-US Data Privacy Framework, adopted in July 2023, is already facing legal challenges.4
When we traced this through properly, it confirmed what we’d suspected: our infrastructure decisions had to change. It’s why we now classify every project by data sensitivity and choose infrastructure accordingly. EU-sovereign providers for client work, with full awareness of the trade-offs when we use US providers for internal projects.
The questions we ask on every project: which company owns the infrastructure? Not which data centre, which company. Is it subject to the CLOUD Act? If a sealed court order was issued for this data, would we ever find out? And critically, did our client consent to their data being subject to US jurisdiction? Does our contract cover it?
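To make that concrete, here is a minimal sketch of the kind of per-project record those questions produce. The field names, sensitivity levels, and the decision rule are illustrative assumptions, not our actual schema.

```python
from dataclasses import dataclass

@dataclass
class ProjectDataProfile:
    """Illustrative per-project record; names and levels are hypothetical."""
    name: str
    sensitivity: str           # e.g. "public", "internal", "client-confidential"
    infrastructure_owner: str  # the company, not the data centre
    cloud_act_exposed: bool    # is that company subject to US jurisdiction?
    client_consented_to_us_jurisdiction: bool
    covered_by_contract: bool

def requires_eu_sovereign_infrastructure(project: ProjectDataProfile) -> bool:
    """Flag projects whose data shouldn't sit on CLOUD Act-exposed infrastructure."""
    if project.sensitivity != "client-confidential":
        return False
    # Client work on US-controlled infrastructure needs explicit consent and
    # contractual cover; otherwise it belongs with an EU-sovereign provider.
    return project.cloud_act_exposed and not (
        project.client_consented_to_us_jurisdiction and project.covered_by_contract
    )

# Example: a client project currently hosted with a US provider.
profile = ProjectDataProfile(
    name="client-portal",
    sensitivity="client-confidential",
    infrastructure_owner="Example US Cloud Inc.",
    cloud_act_exposed=True,
    client_consented_to_us_jurisdiction=False,
    covered_by_contract=True,
)
assert requires_eu_sovereign_infrastructure(profile)
```

The value isn't in the code; it's in being forced to fill in the fields for every project before any data moves.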
These aren’t abstract concerns. When a developer in the UK pastes client code into Claude, that code is on Anthropic’s infrastructure. Anthropic is a US company. The CLOUD Act applies. That’s a jurisdictional decision that deserves to be made consciously, not by accident at 4pm on a Thursday.
What Are Your People Actually Sharing?
This is the question every team skips, because the honest version is harder than the comfortable one.
Not what the policy allows. What’s actually been pasted into AI tools over the past six months. Every debug session. Every ‘help me understand this legacy code.’ Every ‘why isn’t this working’ with a code block attached.
In Part 3, we talked about the 4pm Thursday scenario. The developer who’s stuck, under pressure, and knows that Claude would understand the problem if given enough context. Function signatures lead to calling code. Calling code leads to config. Config leads to test fixtures with customer data someone copied from production two years ago.
Nobody decides to leak data. They decide to paste a function signature. Everything else follows.
We asked ourselves: ‘Can we audit what left our environments last month? Not through the official channels, but through browser tabs, personal devices, mobile apps?’ For a small team, that’s answerable. For a larger organisation, it’s almost certainly not. And if you can’t audit it, you can’t quantify your exposure.
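Even a crude audit is better than none. The sketch below assumes you have egress logs from a proxy or firewall in a simple ‘timestamp host bytes’ format; both the log format and the endpoint list are assumptions, and the list is deliberately incomplete.

```python
from collections import Counter

# Illustrative, incomplete set of AI-provider endpoints to look for.
AI_ENDPOINTS = {
    "api.openai.com",
    "chat.openai.com",
    "api.anthropic.com",
    "claude.ai",
    "gemini.google.com",
}

def summarise_ai_egress(log_lines):
    """Count requests and bytes sent to known AI endpoints.

    Assumes each log line looks like '<timestamp> <destination-host> <bytes-sent>',
    a stand-in for whatever your proxy or firewall actually emits.
    """
    requests, bytes_out = Counter(), Counter()
    for line in log_lines:
        parts = line.split()
        if len(parts) < 3:
            continue
        host, sent = parts[1], parts[2]
        if host in AI_ENDPOINTS:
            requests[host] += 1
            if sent.isdigit():
                bytes_out[host] += int(sent)
    return requests, bytes_out
```

And even with those logs, this tells you volume, not content, and it sees nothing that left through personal devices or mobile apps. Which is exactly the gap.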
Contractors compound the problem. They’re using personal devices, personal AI subscriptions, tools you’ve never seen and can’t control. A contractor using a personal GitHub Copilot account to write your production code creates a data flow that’s completely invisible to you. Free tier Copilot may use that code for training.5 You’d never know.
How much of your codebase was written by people whose AI usage you’ve never audited? We need to be able to answer that question. Can you?
Are Your People Safe Enough to Not Need the Browser Tab?
Part 3 made the case that technical controls fail when humans are incentivised to bypass them. But there’s a harder question underneath that, and it’s one we ask ourselves regularly.
Do people feel safe saying ‘I’m stuck’?
Not safe in the HR policy sense. Actually safe. Safe enough that admitting they don’t understand the system doesn’t affect how they’re perceived. Safe enough that asking for help three times in a week doesn’t mark them as underperforming. Safe enough that they’d choose to pair with a colleague over pasting into a personal AI account, even when the AI would be faster.
Because if they don’t feel safe, no amount of tooling will prevent the behaviour. The approved internal AI will sit unused. The policy will be routed around. The shadow AI statistics from Part 2 (90% of companies, 78% bring-your-own-tools) aren’t about technology gaps. They’re about culture gaps.
We keep our team small partly for this reason. Visibility and trust are security features. But that only works if the culture supports it. We ask ourselves: is our approved tooling good enough that nobody needs to go outside it? If the gap between what’s sanctioned and what’s available for free is too wide, the temptation is constant.
And when something does leak, who’s accountable? The developer who pasted? The manager who created the pressure? The company that didn’t provide adequate tooling? We decided early that accountability sits with the system, not the individual. That changes how people behave.
What Does Your Trust Chain Actually Look Like?
In Part 1, we laid out the trust chain:
Your machine → AI tool → Your network → The provider’s API → Their infrastructure → Their security team → Their employees → Their data retention policies → Their breach response → Every third party they share data with
Every link is a point of failure. For each one, we ask four things: do we have visibility into it, do we have control over it, do we have contractual protection if it fails, and do we have a response plan if it’s breached?
Some links we can answer confidently. Our machine, our network, our sandbox. Those are ours. We built them. We tested them. We know what they do.
Other links, we can’t. When we use a cloud AI provider, even sandboxed, we’re trusting their infrastructure, their staff, their breach response. We’ve shortened the chain where we can. EU-sovereign providers for sensitive work, local models where possible. But the chain still exists. We’ve just made a conscious decision about which links we’re willing to trust and which we’re not.
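One way to keep that exercise honest is to write the chain down and score each link against the four questions. A minimal sketch, with the links taken from the chain above and scores that are illustrative rather than our actual assessment:

```python
from dataclasses import dataclass

@dataclass
class TrustLink:
    """One link in the chain, scored against the four questions."""
    name: str
    visibility: bool     # can we see what happens here?
    control: bool        # can we change what happens here?
    contract: bool       # are we contractually protected if it fails?
    response_plan: bool  # do we know what we'd do if it's breached?

    def is_weak(self) -> bool:
        return not (self.visibility and self.control
                    and self.contract and self.response_plan)

# Illustrative scoring only: the early links are ours, the later ones are not.
chain = [
    TrustLink("your machine", True, True, True, True),
    TrustLink("AI tool", True, False, False, True),
    TrustLink("your network", True, True, True, True),
    TrustLink("provider API", False, False, True, True),
    TrustLink("provider infrastructure", False, False, True, False),
    TrustLink("provider security team", False, False, False, False),
    TrustLink("provider employees", False, False, False, False),
    TrustLink("provider data retention", False, False, True, False),
    TrustLink("provider breach response", False, False, False, False),
    TrustLink("provider's third parties", False, False, False, False),
]

weak = [link.name for link in chain if link.is_weak()]
print(f"{len(weak)} of {len(chain)} links fail at least one of the four questions")
```

The scoring isn’t the point. The point is that every link failing all four questions is a place where you’re trusting blind.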
The trust chain has a jurisdiction dimension too. If your AI provider is a US company, the chain includes the US legal system and its intelligence apparatus. That’s not a conspiracy theory. That’s the CLOUD Act, operating as designed.
If your provider was breached tomorrow, could you quantify your exposure? Or would you be guessing about what was shared, by whom, over how many months?
The Business Risk Nobody Wants to Name
This is the question that shaped our compliance strategy more than any other.
Your business could be damaged by a security incident at a company you’ve never audited, holding data you can’t inventory, governed by policies you haven’t read, in a jurisdiction you didn’t choose.
Not your breach. Theirs.
Your codebase. Your proprietary algorithms. Your client configurations. Sitting on someone else’s servers, subject to someone else’s legal obligations, waiting for their next vulnerability.
How does that conversation go with your clients? ‘We sent your data to a third party who got hacked.’ How does it go with your insurer? Have you disclosed AI tool usage to them? Would a claim be covered, or would they find exclusion language?
What’s your answer when a client asks ‘how do you secure AI in your development process?’ We made sure we had a specific, defensible answer to that question. It drove our classification system, our infrastructure choices, and our decision to pursue ISO 27001 and ISO 42001 certification.
Do you have a confident answer? Or do you hope they don’t ask?
Asking the Right Questions
These questions aren’t comfortable. But they’re the ones that a single afternoon with a CLI tool forced into sharp focus.
We already knew about data protection, jurisdiction, compliance. What the AI tooling did was challenge whether our existing safeguards would survive contact with a technology that shape-shifts around every block. The answer was: not without significant work.
Every infrastructure decision, every classification in our framework, every policy in our compliance documentation exists because one of these questions made us look harder at what we already had in place. Not because we didn’t know. Because we knew exactly what would break if we didn’t act.
Nobody’s got this fully right yet. But you need to know where you stand.
Next in the Series
Part 5: What We Actually Do
We started with hooks and blocklists. We ended up with a classification system, EU-sovereign infrastructure, and a compliance framework built into our tooling.
Not a finished solution. A direction. The trade-offs between capability and privacy, mapped honestly. What we use, what we don’t, and why.
Coming next week.
Sources
1. US CLOUD Act (Clarifying Lawful Overseas Use of Data Act), 2018. Requires US companies to comply with data requests regardless of data storage location. 18 U.S.C. § 2713. https://www.law.cornell.edu/uscode/text/18/2713
2. Foreign Intelligence Surveillance Act (FISA), Section 702. Permits surveillance of non-US persons with gag provisions preventing disclosure to data subjects. https://www.brennancenter.org/our-work/research-reports/foreign-intelligence-surveillance-fisa-section-702-executive-order-12333
3. Court of Justice of the European Union, Case C-311/18 (Schrems II), 16 July 2020. Invalidated EU-US Privacy Shield due to US surveillance programmes. https://eur-lex.europa.eu/legal-content/EN/TXT/?uri=celex:62018CJ0311
4. EU-US Data Privacy Framework, adopted July 2023. Legal challenge by French MP Philippe Latombe dismissed by General Court September 2025 (Case T-553/23), but appeal to CJEU filed October 2025. https://iapp.org/news/a/european-general-court-dismisses-latombe-challenge-upholds-eu-us-data-privacy-framework
5. GitHub Copilot data handling: Business and Enterprise tiers explicitly state code is not used for training. Free tier policies less clear. GitHub Copilot Trust Center. https://github.com/features/copilot