Modern AI‑Driven Security Operations Center (SOC) — Low‑Level Architecture
A single-page, end-to-end logical architecture for a modern SOC designed to handle past, current, and future threats. It shows where LLM/SLM, RAG, and MCP fit; how correlation and detection work; and how response automation stays safe with governance, guardrails, and auditability.
Green — Telemetry + data
Blue — Detection + correlation
Amber — AI/agents
Red — Govern + control
Architecture
Underlying concepts (how this SOC wins)
A modern SOC is a pipeline + a decision system + a safe automation system.
Core objective
High-fidelity decisions at scale
Reduce noise, preserve evidence, explain reasoning, and respond safely. AI improves speed and coverage, but correlation, data quality, and governance are what make it durable.
Key design rule
Separate planes
Data plane (telemetry & storage), Detection plane (correlation & models), AI/Agent plane (LLM/SLM, RAG, MCP), Response plane (SOAR & containment), Governance plane (policy, audit, safety).
Why correlation matters
Attacks are graphs, not alerts
Real threats span identities, endpoints, networks, and cloud. Correlation stitches events into attack stories using rules, state machines, entity graphs, and behavior baselines.
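The "attacks are graphs, not alerts" idea can be sketched with a toy entity graph: each event links two entities (user, host, IP), and events that share entities merge into one attack story via union-find over the entity graph. The names here (`Event`, `correlate`) are illustrative, not any product's API.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class Event:
    """A normalized telemetry event linking two entities."""
    src: str     # e.g. "user:alice"
    dst: str     # e.g. "host:web-01"
    action: str

def correlate(events):
    """Union-find over entities: events sharing an entity merge into one story."""
    parent = {}
    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x
    for ev in events:
        parent[find(ev.src)] = find(ev.dst)
    stories = defaultdict(list)
    for ev in events:
        stories[find(ev.src)].append(ev)
    return list(stories.values())

events = [
    Event("user:alice", "host:web-01", "logon"),
    Event("host:web-01", "ip:10.0.0.9", "beacon"),
    Event("user:bob", "host:db-02", "logon"),   # unrelated activity
]
stories = correlate(events)
# alice's logon and the beacon share host:web-01, so they form one story;
# bob's logon stays a separate story.
```

Production correlation adds time windows, edge weights, and behavior baselines on top of this skeleton, but the graph-merge core is the same.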
Where LLM/SLM fit
Reasoning + language interface
LLMs summarize, hypothesize, and orchestrate tools; SLMs do high-volume classification and enrichment near real-time. Both must be grounded (RAG) and gated (policy + approvals).

Safety defaults (non‑negotiables)

  • Least privilege everywhere: tools and agents only get the permissions they need for their role.
  • Human-in-the-loop for disruptive actions (isolate host, disable account, block egress), with break-glass controls.
  • Full audit trails: every automated decision includes provenance (data sources, prompts, retrieval, tool calls).
  • Data minimization: tokenize/redact sensitive fields before model access; enforce tenant and case boundaries.
  • Detection content is versioned: rules, playbooks, prompts, and retrieval corpora follow CI/CD and testing.
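One way to make the "full audit trails" default concrete: every automated decision carries a provenance record that names its data sources, prompts, retrievals, and tool calls. This is a minimal sketch; the field names are illustrative, not a fixed schema.

```python
import json
import time
from dataclasses import asdict, dataclass, field

@dataclass
class DecisionProvenance:
    """Audit record attached to every automated decision (illustrative schema)."""
    decision: str                                       # e.g. "escalate_to_tier2"
    actor: str                                          # agent or playbook that decided
    data_sources: list = field(default_factory=list)    # systems queried
    prompts: list = field(default_factory=list)         # prompt identifiers/hashes
    retrieved_docs: list = field(default_factory=list)  # RAG citations
    tool_calls: list = field(default_factory=list)      # MCP tool invocations
    ts: float = field(default_factory=time.time)

    def to_log_line(self) -> str:
        """Serialize for an append-only audit log."""
        return json.dumps(asdict(self), sort_keys=True)

rec = DecisionProvenance(
    decision="escalate_to_tier2",
    actor="triage-agent-v3",
    data_sources=["siem:auth-logs"],
    tool_calls=["mcp:edr.get_process_tree"],
)
line = rec.to_log_line()
```

Because the record serializes to one JSON line, it can flow into the same log pipeline as the telemetry it explains.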

Key concepts (what to look for)

  • Two views: logical capabilities vs technical/deployment services, with consistent workflows across both.
  • Separated planes: telemetry, data, detection/correlation, AI/agents, response, and governance/control.
  • Workflows are first-class: boxes are the operating model; edges are the contracts (inputs/outputs + responsibility).
  • Offensive validation is continuous: emulation + bypass testing + regressions produce the scoreboard KPIs.
  • Data trust is engineered: schema contracts + onboarding certification + quarantine/reprocessing + replay/backfill.
  • Multi-tenant safety is explicit: tenant/case scoping, per-tenant keys, quotas, and privacy guardrails.
  • Safe automation is enforced: policy gates + MCP tool gateway + approvals + full audit trails.
  • AI is productionized: runtime budgets, fallbacks, safe modes, and eval gates (no “LLM as root”).
  • Platform hardening is a control plane concern: service mesh/workload identity + egress allow-lists + zero-trust admin.
  • GRC is built-in: controls map to workflows and emit evidence automatically (not manual spreadsheet reconciliation).
  • Enterprise architecture fit: clear systems-of-record, governed data products, and explicit integration contracts.
  • UI is an operator surface: filters + era views + search reflect stateful workflows, not a static diagram.
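The "data trust is engineered" point above (schema contracts + quarantine/reprocessing) can be sketched as a contract check at onboarding: conforming events pass, non-conforming events are quarantined with reasons so they can be reprocessed later. The required fields here are assumed for illustration.

```python
REQUIRED_FIELDS = {"ts", "source", "event_type", "entity"}  # illustrative contract

def validate(event: dict) -> list:
    """Return the list of contract violations for one event."""
    problems = [f"missing:{f}" for f in sorted(REQUIRED_FIELDS - event.keys())]
    if "ts" in event and not isinstance(event["ts"], (int, float)):
        problems.append("ts:not_numeric")
    return problems

def route(events):
    """Split a batch into accepted events and quarantined (event, reasons) pairs."""
    accepted, quarantined = [], []
    for ev in events:
        problems = validate(ev)
        if problems:
            quarantined.append((ev, problems))  # held for fix + reprocessing
        else:
            accepted.append(ev)
    return accepted, quarantined

batch = [
    {"ts": 1700000000, "source": "edr", "event_type": "proc", "entity": "host:a"},
    {"source": "fw", "event_type": "deny"},   # missing ts and entity -> quarantine
]
accepted, quarantined = route(batch)
```

Keeping the violation reasons with the quarantined event is what makes replay/backfill possible once the upstream source is fixed.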
Where to use SLM vs LLM, and what MCP does
Practical routing + MCP tool contract (who calls it, what it returns).
SLM (Small Language Model)
High volume, low latency, structured outputs

Use for large-scale, repetitive work: classification, extraction, scoring, clustering. Keep outputs structured (labels, scores, fields) so correlation and playbooks can use them deterministically.
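"Keep outputs structured" implies a validation step between the SLM and the pipeline: only known labels and bounded scores get through, and anything malformed degrades to a safe value instead of leaking free text into correlation logic. The label set and field names below are assumptions for the sketch.

```python
ALLOWED_LABELS = {"phishing", "malware", "benign", "needs_review"}  # illustrative

def parse_slm_output(raw: dict) -> dict:
    """Coerce an SLM classification into a deterministic record.

    Malformed output falls back to 'needs_review' rather than crashing
    or passing free text downstream.
    """
    label = raw.get("label")
    score = raw.get("score")
    if label not in ALLOWED_LABELS or not isinstance(score, (int, float)):
        return {"label": "needs_review", "score": 0.0}
    return {"label": label, "score": max(0.0, min(1.0, float(score)))}

ok = parse_slm_output({"label": "phishing", "score": 0.93})
bad = parse_slm_output({"label": "probably phishing??", "score": "high"})
```

The same gate applies to extraction and scoring tasks: the SLM may be probabilistic, but its interface to playbooks stays deterministic.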

LLM (Large Language Model)
Reasoning, planning, explanations, narrative

Use for synthesis: explain incident graphs, propose investigation steps, draft reports/comms, translate analyst intent into safe queries, and orchestrate multi-step workflows. Ground with RAG and gate actions via MCP + approvals.

Model routing matrix (typical SOC tasks)
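A routing matrix like this reduces, in code, to a lookup table built from the SLM/LLM split described above. The task names here are examples derived from that split, not a fixed taxonomy.

```python
# Illustrative routing table: task -> model tier. SLM tasks are the
# high-volume structured ones; LLM tasks are synthesis and planning.
ROUTING = {
    "alert_classification": "slm",
    "ioc_extraction": "slm",
    "risk_scoring": "slm",
    "alert_clustering": "slm",
    "incident_narrative": "llm",
    "investigation_planning": "llm",
    "report_drafting": "llm",
    "query_translation": "llm",
}

def route_task(task: str) -> str:
    """Unknown tasks default to the LLM tier, where reasoning can triage them."""
    return ROUTING.get(task, "llm")
```

Defaulting unknowns to the LLM tier trades latency for coverage; a stricter design could instead reject unrouted tasks outright.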

MCP (Model Context Protocol) in this SOC

MCP is the controlled tool gateway between agents/automation and operational systems. Agents don’t call SIEM/EDR/IAM APIs directly; they call MCP tools that enforce policy, validate parameters, require approvals when needed, and emit audit trails.
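A gateway in this spirit can be sketched in a few lines: validate the tool name, gate disruptive tools behind approval, execute, and audit every call (including blocked ones). This is a toy in the spirit of the MCP gateway described above, not the MCP wire protocol itself; the tool names and policy sets are assumptions.

```python
class ApprovalRequired(Exception):
    """Raised when a tool call must wait for human approval."""

AUDIT_LOG = []

# Illustrative policy: disruptive tools need an explicit approval flag.
DISRUPTIVE_TOOLS = {"isolate_host", "disable_account", "block_egress"}

# Stand-in tool implementations; real ones would wrap EDR/IAM/firewall APIs.
TOOLS = {
    "isolate_host": lambda host: f"isolated {host}",
    "lookup_ip": lambda ip: f"reputation for {ip}: clean",
}

def call_tool(agent: str, tool: str, arg: str, approved: bool = False) -> str:
    """Gateway: validate the tool, gate disruptive actions, execute, audit."""
    if tool not in TOOLS:
        raise ValueError(f"unknown tool: {tool}")
    if tool in DISRUPTIVE_TOOLS and not approved:
        AUDIT_LOG.append({"agent": agent, "tool": tool, "status": "pending_approval"})
        raise ApprovalRequired(tool)
    result = TOOLS[tool](arg)
    AUDIT_LOG.append({"agent": agent, "tool": tool, "arg": arg, "status": "done"})
    return result
```

A read-only call like `call_tool("triage-agent", "lookup_ip", "10.0.0.9")` goes straight through, while `call_tool("triage-agent", "isolate_host", "web-01")` raises `ApprovalRequired` and leaves a `pending_approval` audit entry, which is exactly the human-in-the-loop behavior the safety defaults require.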
Threat coverage (detection + mitigation across past, current, future)
This SOC is designed for continuous retro-hunting (past), near-real-time defense (current), and rapid adaptation (future) via detection CI/CD, behavior analytics, and governed agents.
Offensive validation scoreboard
Compact KPIs to prove detection quality and bypass resistance with evidence.
GRC posture (controls ↔ evidence ↔ workflows)
What you can audit: control intent, operating evidence, and ownership tied to SOC workflows.
Enterprise architecture (capabilities ↔ platforms ↔ integration)
How this SOC fits the broader enterprise: SoRs, data products, shared services, and integration contracts.
Completeness review (what’s still missing / needs to be explicit)
A mature SOC needs content lifecycle, safety, and operational resilience in addition to core detection.
The diagrams cover the core pipeline (telemetry → data → correlation → AI/agents → SOAR). The items below are commonly required in a modern SOC and are currently represented only partially (or not explicitly); add them as first-class components to make the design truly complete.