A practical case study on SOC 2 scope for AI platforms, showing how startups define boundaries around models, data, and support access.
An AI startup may have application infrastructure, model pipelines, vector databases, prompt and response logs, cloud storage, customer workspaces, support tooling, admin consoles, third-party model providers, and internal testing environments.
Somewhere in the middle of all that, the team has to answer one deceptively simple question: what exactly is in scope for SOC 2?
This case study shows how one fictional AI startup worked through that question and built a SOC 2 scope that was practical, defensible, and aligned to how its platform actually operated.
Scoping a traditional SaaS product is already hard to get right. AI platforms add another layer of difficulty because the service often depends on a mix of application layers, data pipelines, inference services, model providers, storage systems, human review steps, and support workflows.
That means scoping cannot stop at “our app and cloud environment.”
This example is fictional, but the scoping challenges are real. Let’s call the company SignalForge AI.
SignalForge provides an AI workflow platform for enterprise teams. Customers use it to upload internal documents, run question-answering and summarization workflows, search across knowledge bases, generate draft content, and route support or operations requests through AI-assisted workflows.
As enterprise deals grew, SignalForge started hearing the same message over and over: “We need SOC 2.”
But before the audit could begin, the company had to answer a harder internal question: what exactly are we certifying?
At first, leadership made a common assumption: “We’ll scope the app, the production cloud environment, and company IT. That should be enough.”
Once readiness conversations began, several problems surfaced right away.
| Problem | Why it mattered |
|---|---|
| The model workflow was not clearly separate from the app | Customer prompts moved through orchestration, retrieval, third-party LLMs, logging, and support flows. |
| Data lived in more places than expected | It was not only in the main app database. It also existed in storage, vector systems, logs, traces, and support workflows. |
| Support access was broader than leadership realized | Support staff could see tenant metadata, logs, issue context, and some debugging outputs tied to customer use. |
The company quickly realized that if it scoped too narrowly, the SOC 2 report would not reflect how the platform actually worked.
SignalForge did not just want a report. It wanted a scope that would stand up to customer scrutiny, reflect real data handling, cover the systems that mattered, avoid unnecessary sprawl, support cleaner evidence collection, and clarify ownership internally.
That meant the scoping exercise had to be practical, not theoretical.
The company stopped asking, “Which tools do we use?”
Instead, it started asking, “Which systems, people, and processes materially affect the security, availability, and confidentiality of the customer-facing service?”
Those three pillars gave the team a much clearer way to decide what belonged in scope.
One of the first questions was whether “the model” itself was in scope. The answer required more nuance than a simple yes or no.
SignalForge was not training foundation models from scratch. It used third-party LLM APIs, internal orchestration logic, prompt templates, guardrails, retrieval configuration, model selection rules, and evaluation workflows.
So the real question became: which parts of the model layer materially affect customer data handling and service trust?
The team landed on a clear split: the orchestration logic, prompt templates, guardrails, and retrieval configuration that SignalForge itself controlled belonged in scope, while the third-party LLM providers were treated as subservice organizations covered by their own attestations. This discipline mattered. Without it, the company could have over-scoped every AI-related activity simply because it involved models. Instead, it focused on model-related systems that influenced the live customer service.
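A scoping decision like this is easier to defend when it is written down as an inventory rather than held as tribal knowledge. The sketch below shows one way to record it; the component names, flags, and in/out decisions are illustrative, not SignalForge's actual list.

```python
# Hypothetical model-layer scope inventory. Each entry records whether the
# component touches customer data and whether it sits inside the SOC 2
# boundary, with a note explaining any exclusion.
MODEL_LAYER = [
    {"component": "orchestration-service", "handles_customer_data": True, "in_scope": True},
    {"component": "prompt-templates", "handles_customer_data": True, "in_scope": True},
    {"component": "guardrails-config", "handles_customer_data": True, "in_scope": True},
    {"component": "retrieval-config", "handles_customer_data": True, "in_scope": True},
    {"component": "third-party-llm-api", "handles_customer_data": True, "in_scope": False,
     "note": "subservice organization; relies on the provider's own SOC 2 report"},
    {"component": "offline-eval-sandbox", "handles_customer_data": False, "in_scope": False,
     "note": "no live customer data; synthetic test sets only"},
]

def in_scope_components(inventory):
    """Return the names of components included in the SOC 2 boundary."""
    return [c["component"] for c in inventory if c["in_scope"]]
```

The useful property of this shape is that every exclusion carries its rationale, which is exactly what auditors and customer security teams ask for first.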
Mapping the data landscape became one of the most important parts of the project. At first, leadership thought of “the database” as the main sensitive store. The real data landscape was much broader.
| Data store | What it held | Why it mattered |
|---|---|---|
| Relational production database | Accounts, tenant records, settings, metadata | Core app operation and customer segregation |
| Object storage | Uploaded customer documents | Source content for retrieval and workflows |
| Vector database | Embeddings and retrieval indexes | Key part of AI search and content access |
| Prompt and output logs | Troubleshooting and service diagnostics | Could contain customer-generated content |
| Monitoring and trace systems | Operational telemetry and debugging details | Could reveal workflow or access details |
| Support systems | Case notes, issue summaries, screenshots | Could contain copied customer context |
The key realization was that customer information did not only live in “the app database.” It also lived in retrieval context, storage layers supporting AI workflows, operational logs, support workflows, and temporary troubleshooting evidence.
That meant any serious SOC 2 scope had to account for the full lifecycle of how customer data moved and where it rested.
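One way to make that lifecycle concrete is to model data flows as a small directed graph and ask which stores customer data can reach. The flows below are a hypothetical simplification of the inventory table above, but the point generalizes: scope should follow reachability from customer input, not stop at “the app database.”

```python
# Hypothetical data flows: each key is a source, each value lists the
# stores or systems that customer data can move into from there.
FLOWS = {
    "customer-upload": ["object-storage", "relational-db"],
    "object-storage": ["vector-db"],
    "vector-db": ["prompt-logs"],
    "prompt-logs": ["trace-system", "support-tooling"],
    "relational-db": [],
    "trace-system": [],
    "support-tooling": [],
}

def reachable_stores(source, flows):
    """Return every store customer data can reach from the given source."""
    seen, stack = set(), [source]
    while stack:
        node = stack.pop()
        for nxt in flows.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen
```

Run against these flows, the traversal surfaces every store in the inventory, including support tooling and traces that a database-centric scope would have missed.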
Treating support access as in scope became one of the most valuable decisions in the whole project. At the beginning, support access was treated like a secondary operations issue. In practice, support teams had meaningful access to tenant metadata, logs, account-level settings, troubleshooting context, limited impersonation workflows, screenshots, exports, and escalation paths into engineering.
That made support access directly relevant to customer trust.
This made the company much better prepared for questions like: Can support staff read customer content? Do you log support access? Can support impersonate users? How do you control troubleshooting access? What prevents broad support visibility across tenants?
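Answering “do you log support access?” with a yes requires that every support touch on tenant data leave a record tied to a reason. The sketch below shows one minimal shape for such an audit trail; the function and field names are hypothetical, and a real system would write to an append-only store rather than an in-memory list.

```python
from datetime import datetime, timezone

# Illustrative in-memory audit trail; a production system would persist
# these entries to an append-only, tamper-evident store.
AUDIT_LOG = []

def log_support_access(agent, tenant_id, action, ticket_ref):
    """Record a support touch on tenant data, linked to a ticket."""
    entry = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "agent": agent,
        "tenant_id": tenant_id,
        "action": action,          # e.g. "view-logs", "impersonate-user"
        "ticket_ref": ticket_ref,  # the justification for the access
    }
    AUDIT_LOG.append(entry)
    return entry

def accesses_without_ticket(log):
    """Surface support access with no linked ticket, for review."""
    return [e for e in log if not e["ticket_ref"]]
```

The second function captures the control auditors actually test: access without a documented reason is the exception queue, not business as usual.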
After working through the environment carefully, SignalForge defined a much more defensible boundary: the production application and cloud environment, the orchestration and retrieval layer it controlled, every store that could hold customer content (including object storage, vector indexes, prompt and output logs, and traces), and the support tooling and access paths that touched tenant data, with third-party model providers carved out as subservice organizations.
Once scope was defined properly, several things became easier: evidence collection mapped to real systems instead of abstractions, ownership of each control area was clearer internally, and customer diligence questions about data handling and support access had direct, documented answers.
AI startups often begin SOC 2 scoping like a traditional SaaS exercise and then run into trouble when customer diligence gets more specific. That usually happens when important areas are under-scoped, especially model operations, vector and log storage, support access, and third-party AI provider dependencies.
The strongest AI-platform scopes usually avoid both extremes. They are not so narrow that real trust boundaries are ignored, and they are not so broad that every experiment becomes audit overhead.
Instead, they focus on the systems, data paths, and operational roles that materially affect the live customer service. That is what makes the scope both practical and credible.
For AI startups, SOC 2 scoping should not stop at the application layer. If your service depends on models, multiple data stores, and human support workflows, the scope needs to reflect that reality.
This case study shows that a cleaner scoping approach starts with three questions: which model-related systems materially affect the live service, where does customer data actually live and move, and who can access that data through support, admin, or troubleshooting workflows?
When those questions are answered clearly, the result is a SOC 2 scope that is more useful to auditors, customers, and internal teams alike. Because in the end, a good scope is not just about passing the audit. It is about proving that your control environment matches how your platform really works.