ADR 003: gVisor RuntimeClass for Agent Sandboxes
Author: jomcgi Status: Accepted Created: 2026-05-22
Problem
Agent sandbox pods execute arbitrary code on behalf of LLM-driven agents — Goose recipes, Claude harnesses, and (per agents/014) future AX actors. The current defense-in-depth stack — Cloudflare → Linkerd mTLS → Kyverno admission → non-root + dropped capabilities + seccompProfile: RuntimeDefault (docs/security.md) — does not isolate the host kernel. Every sandbox shares the K3s node's Linux kernel with every other workload, including the monolith, vLLM, knowledge graph, ArgoCD, Linkerd, and SigNoz.
A single kernel CVE reached through a compromised dependency, a poisoned MCP tool response, or a prompt-injection payload that smuggles a syscall sequence is enough to break out of the container and root the node — and from there, the entire cluster. The blast radius does not match the trust level of the workload: untrusted, agent-driven code shares a kernel with the most trusted services in the homelab.
Two pressures sharpen this:
- Routine-job agents in the monolith (monolith-agent-* MCP surface) dispatch agents from cron-style triggers without a human in the loop. The window between "MCP tool returns a malicious payload" and "kernel exploit lands" shrinks to seconds with no checkpoint.
- The AX/Substrate adoption brings Substrate's ateom-gvisor interior helper, which is built on the assumption that agent actors run under gVisor.
We need a second kernel boundary specifically for code paths that execute untrusted input.
Proposal
Adopt Google's gVisor (runsc) as a Kubernetes RuntimeClass on the K3s cluster, install runsc on a designated subset of agent-worker nodes, and opt sandbox pods into it via spec.runtimeClassName: gvisor. The default workload runtime stays runc for everything trusted; gVisor is additive, not a replacement.
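As a minimal sketch of this opt-in, the RuntimeClass and a smoke-test pod could look like the manifests below. It assumes the containerd runtime handler on the agent-worker nodes is registered under the name runsc; the pod name and image are illustrative and not taken from the repo.

```yaml
# RuntimeClass: maps the name that pods reference to a containerd runtime handler.
# Assumption: the handler is registered in containerd as "runsc".
apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: gvisor
handler: runsc
---
# Hypothetical smoke-test pod: opts into gVisor and prints the kernel log.
apiVersion: v1
kind: Pod
metadata:
  name: gvisor-smoke-test        # illustrative name
spec:
  runtimeClassName: gvisor       # the only field that changes for opt-in
  restartPolicy: Never
  containers:
    - name: check
      image: busybox:1.36        # placeholder image
      command: ["dmesg"]
```

If the pod lands on a node with runsc installed, its log should show gVisor's own boot messages rather than the host kernel's. Pods that omit runtimeClassName keep running under runc. On tainted agent-worker nodes the pod would also need a matching toleration.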
runsc is a user-space kernel written in Go that re-implements ~200 Linux syscalls in userland and proxies (or denies) the rest. A container running under runsc cannot directly invoke a host kernel syscall — it hits gVisor first, which decides whether to handle it in-userland, forward it through a strictly mediated path, or fail it. The host kernel attack surface visible to the workload shrinks from ~400 syscalls (RuntimeDefault seccomp) to ~50 host syscalls that gVisor itself uses.
| Aspect | Today (runc only) | Proposed (runc + gvisor) |
|---|---|---|
| Sandbox kernel | Host Linux kernel (full surface) | gVisor user-space kernel (~200 calls) |
| Container escape blast | Node root → cluster pivot | gVisor process root → still sandboxed |
| Syscall filter | seccomp RuntimeDefault on host kernel | seccomp + userland reimplementation |
| Workload opt-in | Implicit (everyone shares runc) | Explicit (runtimeClassName: gvisor) |
| Compatible workloads | All | Most; GPU + some kernel-tight code excluded |
| Perf overhead | None | ~5–15% syscall, ~10–30% network |
The agent-sandbox CRD set already exposes runtimeClassName in the pod spec (projects/agent_platform/chart/agent-sandbox/crds/crds.yaml), so the opt-in surface for sandboxes is a values-file change, not a chart change.
Architecture
graph TB
subgraph "K3s Cluster"
subgraph "Control / Trusted Nodes (runc only)"
Mono[Monolith]
Argo[ArgoCD]
Linkerd[Linkerd Control Plane]
VLLM[vLLM + GPU]
SigNoz[SigNoz]
end
subgraph "Agent Worker Nodes (runc + runsc)"
direction TB
RunC[runc Runtime]
RunSC[runsc Runtime - gVisor]
subgraph "Untrusted Sandboxes (runtimeClassName: gvisor)"
Goose[Goose Recipe Pods]
Claude[Claude Harness Pods]
AX[AX Actor Pods]
end
RunSC -.-> Goose
RunSC -.-> Claude
RunSC -.-> AX
end
end
K8s[Kubernetes API] -->|RuntimeClass=gvisor| RunSC
K8s -->|default RuntimeClass=runc| RunC
style RunSC fill:#326CE5,color:#fff
style Goose fill:#F7B93E,color:#000
style Claude fill:#F7B93E,color:#000
style AX fill:#F7B93E,color:#000
Why K3s makes this cheap
projects/platform/nvidia-gpu-operator/values.yaml already demonstrates that K3s containerd is template-customizable at /var/lib/rancher/k3s/agent/etc/containerd/config.toml.tmpl. Adding gVisor is a runsc binary on the node, a containerd runtime stanza, and a RuntimeClass CR — no kernel modules, no distro extension build, no host modification beyond what the GPU operator already does.
What runs where, and why
| Workload | Node class | Runtime | Reason |
|---|---|---|---|
| Monolith, ArgoCD, Linkerd, SigNoz, etc. | trusted | runc | Operator-controlled, no untrusted input execution; perf + privileged needs |
| vLLM, inference, embedding | trusted (GPU) | runc | Needs /dev/nvidia* mmap; gVisor does not pass GPU device files through by default (its nvproxy GPU support is opt-in and tied to specific driver versions) |
| Goose recipe sandboxes | agent-worker | runsc | Executes code from LLM-generated outputs |
| Claude agent harness pods | agent-worker | runsc | Same |
| AX actor pods | agent-worker | runsc | Substrate's ateom-gvisor assumes gVisor; required by agents/014 |
| MCP servers (Context Forge backends) | trusted | runc | Trusted code path; tool outputs are untrusted, execution is not |
The agent-worker node class is enforced via a node-role: agent-worker label, an agent-only=true:NoSchedule taint, and a Kyverno policy requiring runtimeClassName: gvisor for any pod scheduling there.
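One way to express this boundary is sketched below, under several assumptions. The RuntimeClass scheduling block steers every gvisor pod onto the labeled nodes and adds the toleration for the taint, and a Kyverno rule rejects pods that do not request gVisor. The policy name and the agent-sandboxes namespace are hypothetical. For simplicity the rule matches by namespace rather than by target node. The placement of validationFailureAction varies across Kyverno versions.

```yaml
# RuntimeClass with scheduling constraints: pods that request "gvisor"
# get this nodeSelector and toleration merged in at admission time.
apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: gvisor
handler: runsc                   # assumed containerd handler name
scheduling:
  nodeSelector:
    node-role: agent-worker
  tolerations:
    - key: agent-only
      operator: Equal
      value: "true"
      effect: NoSchedule
---
# Hypothetical Kyverno policy: pods in the sandbox namespace must use gVisor.
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: require-gvisor-for-sandboxes   # hypothetical name
spec:
  validationFailureAction: Enforce     # field placement depends on Kyverno version
  background: false
  rules:
    - name: sandbox-pods-use-gvisor
      match:
        any:
          - resources:
              kinds:
                - Pod
              namespaces:
                - agent-sandboxes      # hypothetical namespace
      validate:
        message: "Pods in agent sandbox namespaces must set runtimeClassName: gvisor."
        pattern:
          spec:
            runtimeClassName: gvisor
```

The two pieces cover opposite directions: the scheduling block keeps gVisor pods on agent-worker nodes, while the taint and the policy keep runc pods from landing alongside them.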
Security
This ADR extends one layer of the model in docs/security.md and leaves the others unchanged. After implementation, Layer 4 (Runtime Security) gains:
Layer 4: Runtime Security (existing)
- readOnlyRootFilesystem, runAsNonRoot, capabilities.drop, seccompProfile
+ RuntimeClass: gvisor ← NEW for agent-worker workloads
What gVisor adds:
- Host kernel CVE protection. A syscall-borne exploit from a sandbox now lands on gVisor's user-space kernel instead of the host kernel, and the gVisor process is itself a non-root, dropped-capabilities, seccomp-filtered userspace process. To reach the node, the attacker must chain a gVisor escape with a host kernel CVE.
- Tighter syscall mediation. Where seccomp can only allow or deny a syscall, gVisor reimplements most of them in userland. The workload gets the same denial of host kernel reach with fewer compatibility breaks.
- No new secret surface. gVisor introduces no new auth boundaries or credentials.
What gVisor does not protect:
- Network-level pivots (covered by Linkerd authz policies)
- LLM prompt injection using legitimate tool calls maliciously (covered by RBAC on MCP, ADR 005)
- Filesystem persistence attacks (covered by ephemeral PVCs in SandboxTemplate)
- Resource exhaustion (covered by ResourceQuota on the namespace)
Risks
| Risk | Likelihood | Impact | Mitigation |
|---|---|---|---|
| Goose recipes fail under runsc (filesystem, /proc, network quirks) | Medium | Medium | Patch the recipe; if genuinely incompatible, allowlist with explicit justification in docs/security.md |
| Network throughput drop hurts MCP latency | Medium | Low | gVisor netstack is actively improving; runsc's --network=host passthrough is available as an opt-out, but it is set per runtime handler rather than per pod, weakens isolation, and must be justified |
| gVisor escape CVE (Google does patch these; see GHSA history) | Low | High | Renovate-tracked upgrades, gvisor-security mailing list subscription; an escape still lands in the gVisor process, which runs non-root under its own host seccomp filter |
| K3s containerd config drift across nodes | Medium | Medium | Bake the config template into K3s install scaffolding (precedent: nvidia-gpu-operator) |
| Operational complexity of two runtimes | Low | Low | Default stays runc; agent-worker taint + Kyverno admission make the boundary explicit |
| GPU-using agent workloads cannot use gvisor | High | Low | Documented exclusion; inference + image gen stay on runc (they're trusted code paths with untrusted inputs — the model can't escape) |
Open Questions
- Should agent-worker be a separate physical node or a labeled subset? Physical separation is the strongest isolation; labels save hardware but lean on scheduler discipline under pressure.
- --platform=systrap (default, most compatible) or --platform=kvm (faster, but needs hardware virtualization exposed to the node, which means nested virt if the nodes are VMs)? Measure on the actual hardware.
- Where do goose_agent image builds run? apko build steps are syscall-heavy; build pods may stay on runc while execution of the built image runs on runsc.
References
| Resource | Relevance |
|---|---|
| gVisor architecture | How runsc reimplements syscalls in userspace |
| gVisor + K3s walkthrough | K3s containerd template extension pattern |
| Kubernetes RuntimeClass | The opt-in mechanism |
| gVisor compatibility docs | Known incompatibilities to check Goose recipes against |
| docs/security.md | Defense-in-depth model this ADR extends |
| agents/014: AX + Substrate | Depends on this ADR; Substrate's ateom-gvisor assumes runsc is live |
| projects/agent_platform/chart/agent-sandbox/crds/crds.yaml | runtimeClassName field already present in CRD schema |
| projects/platform/nvidia-gpu-operator/values.yaml | Precedent for K3s containerd template customization |