
ADR 003: gVisor RuntimeClass for Agent Sandboxes

Author: jomcgi · Status: Accepted · Created: 2026-05-22


Problem

Agent sandbox pods execute arbitrary code on behalf of LLM-driven agents — Goose recipes, Claude harnesses, and (per agents/014) future AX actors. The current defense-in-depth stack — Cloudflare → Linkerd mTLS → Kyverno admission → non-root + dropped capabilities + seccompProfile: RuntimeDefault (docs/security.md) — does not isolate the host kernel. Every sandbox shares the K3s node's Linux kernel with every other workload, including the monolith, vLLM, knowledge graph, ArgoCD, Linkerd, and SigNoz.

A single kernel CVE reached through a compromised dependency, a poisoned MCP tool response, or a prompt-injection payload that smuggles a syscall sequence is enough to break out of the container and root the node — and from there, the entire cluster. The blast radius does not match the trust level of the workload: untrusted, agent-driven code shares a kernel with the most trusted services in the homelab.

Two pressures sharpen this:

  1. Routine-job agents in the monolith (monolith-agent-* MCP surface) dispatch agents from cron-style triggers without a human in the loop. The window between "MCP tool returns a malicious payload" and "kernel exploit lands" shrinks to seconds with no checkpoint.
  2. The AX/Substrate adoption brings in Substrate's ateom-gvisor interior helper, which is built on the assumption that agent actors run under gVisor.

We need a second kernel boundary specifically for code paths that execute untrusted input.


Proposal

Adopt Google's gVisor (runsc) as a Kubernetes RuntimeClass on the K3s cluster, install runsc on a designated set of agent-worker nodes, and opt sandbox pods into it via spec.runtimeClassName: gvisor. The default workload runtime stays runc for everything trusted; gVisor is additive, not a replacement.
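The sketch below shows the two Kubernetes objects involved. They are written as Python dicts that serialize to JSON manifests, and kubectl accepts JSON as well as YAML.

- The handler name runsc is an assumption. It must match the runtime name that the containerd template registers on agent-worker nodes.
- The scheduling block uses RuntimeClass's own nodeSelector/tolerations fields. Any pod that selects gvisor is then steered onto nodes that actually have runsc. The label and taint values come from the node-class rules later in this ADR.
- The pod name and image are hypothetical placeholders.

```python
import json

# RuntimeClass (node.k8s.io/v1). "handler" must match the containerd runtime
# name configured on agent-worker nodes; "runsc" is an assumption.
RUNTIME_CLASS = {
    "apiVersion": "node.k8s.io/v1",
    "kind": "RuntimeClass",
    "metadata": {"name": "gvisor"},
    "handler": "runsc",
    # RuntimeClass.scheduling is merged into every pod that selects this class,
    # so gvisor pods only land on nodes that have runsc installed.
    "scheduling": {
        "nodeSelector": {"node-role": "agent-worker"},
        "tolerations": [
            {"key": "agent-only", "operator": "Equal", "value": "true", "effect": "NoSchedule"}
        ],
    },
}


def opt_into_gvisor(pod: dict) -> dict:
    """Return a copy of a Pod manifest with spec.runtimeClassName set to gvisor."""
    patched = json.loads(json.dumps(pod))  # deep copy of a JSON-shaped dict
    patched.setdefault("spec", {})["runtimeClassName"] = RUNTIME_CLASS["metadata"]["name"]
    return patched


if __name__ == "__main__":
    # Hypothetical sandbox pod; name and image are placeholders.
    pod = {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": "goose-recipe-example"},
        "spec": {"containers": [{"name": "agent", "image": "example.invalid/goose:latest"}]},
    }
    print(json.dumps(RUNTIME_CLASS, indent=2))
    print(json.dumps(opt_into_gvisor(pod), indent=2))
```

Putting the node placement on the RuntimeClass keeps each sandbox's opt-in down to a single field.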

runsc is gVisor's OCI runtime. Its core (the Sentry) is a user-space kernel written in Go that implements ~200 Linux syscalls itself and fails the ones it does not support, rather than passing application syscalls through to the host. A container running under runsc cannot directly invoke a host kernel syscall. Every call hits gVisor first, which either services it in userland or fails it; when it must touch the host, it makes a much smaller set of host syscalls of its own. The host kernel attack surface visible to the workload shrinks from ~400 syscalls (RuntimeDefault seccomp) to the ~50 host syscalls that gVisor itself uses.
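The handle-or-fail decision can be pictured with a deliberately simplified model. This is a conceptual toy, not gVisor's Sentry.

- The three handlers, the fake file table, and the ENOSYS fallback are illustrative assumptions.
- They are chosen only to show the shape of the boundary: the application's call never reaches the host kernel directly, and any host work is a separate, narrower call made by the sandbox kernel.

```python
import errno


class ToyUserspaceKernel:
    """Toy stand-in for a user-space kernel's syscall table (not gVisor code)."""

    def __init__(self):
        self.host_calls = []  # host syscalls the sandbox kernel itself chose to make
        self._files = {}
        self._next_fd = 3
        self.handlers = {
            "getpid": self._getpid,
            "open": self._open,
            "write": self._write,
        }

    def syscall(self, name, *args):
        handler = self.handlers.get(name)
        if handler is None:
            # Unimplemented calls fail inside the sandbox; nothing is
            # passed through to the host kernel.
            return -errno.ENOSYS
        return handler(*args)

    def _getpid(self):
        return 1  # served entirely in userland: the sandbox has its own PID space

    def _open(self, path):
        # Mediated path: the sandbox kernel decides, then makes its own narrow host call.
        self.host_calls.append(("openat", path))
        fd = self._next_fd
        self._next_fd += 1
        self._files[fd] = path
        return fd

    def _write(self, fd, data):
        if fd not in self._files:
            return -errno.EBADF
        self.host_calls.append(("pwrite64", self._files[fd]))
        return len(data)
```

In this model, a call like kexec_load simply returns -ENOSYS. A file open is recorded as a separate host call made on the sandbox's behalf, which mirrors why the host-visible surface is much smaller than the application-visible one.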

| Aspect | Today (runc only) | Proposed (runc + gvisor) |
|---|---|---|
| Sandbox kernel | Host Linux kernel (full surface) | gVisor user-space kernel (~200 syscalls implemented) |
| Container-escape blast radius | Node root → cluster pivot | gVisor process root → still sandboxed |
| Syscall filter | seccomp RuntimeDefault on host kernel | seccomp + userland reimplementation |
| Workload opt-in | Implicit (everyone shares runc) | Explicit (runtimeClassName: gvisor) |
| Compatible workloads | All | Most; GPU + some kernel-tight code excluded |
| Perf overhead | None (baseline) | ~5–15% syscall, ~10–30% network |

The agent-sandbox CRD set already exposes runtimeClassName in the pod spec (projects/agent_platform/chart/agent-sandbox/crds/crds.yaml), so the opt-in surface for sandboxes is a values-file change, not a chart change.


Architecture

```mermaid
graph TB
    subgraph "K3s Cluster"
        subgraph "Control / Trusted Nodes (runc only)"
            Mono[Monolith]
            Argo[ArgoCD]
            Linkerd[Linkerd Control Plane]
            VLLM[vLLM + GPU]
            SigNoz[SigNoz]
        end

        subgraph "Agent Worker Nodes (runc + runsc)"
            direction TB
            RunC[runc Runtime]
            RunSC[runsc Runtime - gVisor]
            subgraph "Untrusted Sandboxes (runtimeClassName: gvisor)"
                Goose[Goose Recipe Pods]
                Claude[Claude Harness Pods]
                AX[AX Actor Pods]
            end
            RunSC -.-> Goose
            RunSC -.-> Claude
            RunSC -.-> AX
        end
    end

    K8s[Kubernetes API] -->|RuntimeClass=gvisor| RunSC
    K8s -->|default RuntimeClass=runc| RunC

    style RunSC fill:#326CE5,color:#fff
    style Goose fill:#F7B93E,color:#000
    style Claude fill:#F7B93E,color:#000
    style AX fill:#F7B93E,color:#000
```

Why K3s makes this cheap

projects/platform/nvidia-gpu-operator/values.yaml already demonstrates that K3s containerd is template-customizable at /var/lib/rancher/k3s/agent/etc/containerd/config.toml.tmpl. Adding gVisor takes the runsc and containerd-shim-runsc-v1 binaries on the node, a containerd runtime stanza, and a RuntimeClass object. It needs no kernel modules, no distro extension build, and no host modification beyond what the GPU operator already does.
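The sketch below generates the template addition from Python so the exact text can be reviewed and tested. It rests on three assumptions:

- The node runs a K3s release whose config.toml.tmpl can extend the built-in template with {{ template "base" . }}.
- Its embedded containerd uses the 1.x CRI plugin section name. containerd 2.x renames that section, so check the K3s version first.
- The runsc and containerd-shim-runsc-v1 binaries are on the node's PATH.

The handler name and the runsc.toml path are also assumptions.

```python
# Sketch: render the containerd runtime stanza that registers runsc.
# The section name is the containerd 1.x CRI layout (an assumption about the
# K3s version); containerd 2.x uses a different plugin section.
CRI_RUNTIMES = 'plugins."io.containerd.grpc.v1.cri".containerd.runtimes'


def runsc_stanza(handler: str = "runsc", config_path: str = "/etc/containerd/runsc.toml") -> str:
    """Return TOML registering a runsc runtime handler for containerd."""
    return "\n".join(
        [
            f"[{CRI_RUNTIMES}.{handler}]",
            '  runtime_type = "io.containerd.runsc.v1"',
            f"[{CRI_RUNTIMES}.{handler}.options]",
            '  TypeUrl = "io.containerd.runsc.v1.options"',
            f'  ConfigPath = "{config_path}"',
        ]
    )


def k3s_template(extra: str) -> str:
    """Extend K3s's base containerd template (assumes a K3s release that supports it)."""
    return '{{ template "base" . }}\n\n' + extra + "\n"


if __name__ == "__main__":
    # Intended destination: /var/lib/rancher/k3s/agent/etc/containerd/config.toml.tmpl
    print(k3s_template(runsc_stanza()))
```

The handler key (runsc here) is the string a RuntimeClass must name in its handler field. Keeping this template under version control, as the GPU operator precedent does, is the drift mitigation listed under Risks.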

What runs where, and why

| Workload | Node class | Runtime | Reason |
|---|---|---|---|
| Monolith, ArgoCD, Linkerd, SigNoz, etc. | trusted | runc | Operator-controlled, no untrusted input execution; perf + privileged needs |
| vLLM, inference, embedding | trusted (GPU) | runc | Needs direct /dev/nvidia* mmap; gVisor's GPU passthrough (nvproxy) supports only specific drivers and is not used here |
| Goose recipe sandboxes | agent-worker | runsc | Executes code from LLM-generated outputs |
| Claude agent harness pods | agent-worker | runsc | Same |
| AX actor pods | agent-worker | runsc | Substrate's ateom-gvisor assumes gVisor; required by ADR 014 |
| MCP servers (Context Forge backends) | trusted | runc | Trusted code path; tool outputs are untrusted, execution is not |

The agent-worker node class is enforced via a node-role: agent-worker label, an agent-only=true:NoSchedule taint, and a Kyverno policy requiring runtimeClassName: gvisor for any pod scheduling there.
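The rule the Kyverno policy has to encode can be stated as a small predicate. This Python sketch illustrates the intended logic only; it is not Kyverno policy syntax.

- Treating "selects node-role=agent-worker or tolerates the agent-only taint" as the trigger is an assumption about how "scheduling there" would be detected.
- nodeAffinity is deliberately ignored for brevity.

```python
# Sketch of the admission rule only; not Kyverno syntax. Label/taint values
# come from this ADR; detection via selector/toleration is an assumption.
AGENT_LABEL_KEY, AGENT_LABEL_VALUE = "node-role", "agent-worker"
AGENT_TAINT = {"key": "agent-only", "value": "true", "effect": "NoSchedule"}


def tolerates_agent_taint(tol: dict) -> bool:
    """Apply Kubernetes toleration matching rules to the agent-only taint."""
    op = tol.get("operator", "Equal")
    # An empty key with operator Exists tolerates every taint.
    key_ok = tol.get("key") == AGENT_TAINT["key"] or (not tol.get("key") and op == "Exists")
    value_ok = op == "Exists" or tol.get("value") == AGENT_TAINT["value"]
    effect_ok = tol.get("effect") in (None, "", AGENT_TAINT["effect"])  # empty effect matches all
    return key_ok and value_ok and effect_ok


def targets_agent_workers(pod_spec: dict) -> bool:
    """True if the pod selects agent-worker nodes or tolerates their taint.

    nodeAffinity is ignored here for brevity.
    """
    if pod_spec.get("nodeSelector", {}).get(AGENT_LABEL_KEY) == AGENT_LABEL_VALUE:
        return True
    return any(tolerates_agent_taint(t) for t in pod_spec.get("tolerations", []))


def admit(pod_spec: dict) -> tuple[bool, str]:
    """Reject agent-worker pods that do not opt into the gvisor RuntimeClass."""
    if targets_agent_workers(pod_spec) and pod_spec.get("runtimeClassName") != "gvisor":
        return False, "pods targeting agent-worker nodes must set runtimeClassName: gvisor"
    return True, "ok"
```

A pod is admitted before the scheduler picks a node. For that reason, the check keys on the pod's own placement fields rather than on the node it eventually lands on. A blanket toleration (operator: Exists with no key) counts as targeting agent-worker nodes, because it would let the pod past the taint.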


Security

This ADR adds a layer to the model in docs/security.md; it does not change anything else. After implementation, Layer 4 (Runtime Security) gains:

```
Layer 4: Runtime Security (existing)
  - readOnlyRootFilesystem, runAsNonRoot, capabilities.drop, seccompProfile
  + RuntimeClass: gvisor    ← NEW for agent-worker workloads
```

What gVisor adds:

  • Host kernel CVE protection. A kernel exploit that would compromise the node via a sandbox now compromises the gVisor process, which is itself a non-root, dropped-capabilities, seccomp-filtered userspace process. The attacker must chain a gVisor escape with a kernel CVE.
  • Tighter syscall mediation. Where plain seccomp can only deny a syscall, gVisor can implement it in userland instead: host kernel reach is still denied, but fewer workloads break.
  • No new secret surface. gVisor introduces no new auth boundaries or credentials.

What gVisor does not protect:

  • Network-level pivots (covered by Linkerd authz policies)
  • LLM prompt injection using legitimate tool calls maliciously (covered by RBAC on MCP, ADR 005)
  • Filesystem persistence attacks (covered by ephemeral PVCs in SandboxTemplate)
  • Resource exhaustion (covered by ResourceQuota on the namespace)

Risks

| Risk | Likelihood | Impact | Mitigation |
|---|---|---|---|
| Goose recipes fail under runsc (filesystem, /proc, network quirks) | Medium | Medium | Patch the recipe; if genuinely incompatible, allowlist with explicit justification in docs/security.md |
| Network throughput drop hurts MCP latency | Medium | Low | gVisor netstack is actively improving; per-pod opt-out with --network=host available but must be justified |
| gVisor escape CVE (Google does patch these; see GHSA history) | Low | High | Renovate-tracked upgrades, gvisor-security mailing list subscription; combined with seccomp so an escape still hits RuntimeDefault |
| K3s containerd config drift across nodes | Medium | Medium | Bake the config template into K3s install scaffolding (precedent: nvidia-gpu-operator) |
| Operational complexity of two runtimes | Low | Low | Default stays runc; agent-worker taint + Kyverno admission make the boundary explicit |
| GPU-using agent workloads cannot use gvisor | High | Low | Documented exclusion; inference + image gen stay on runc (they are trusted code paths with untrusted inputs; the model cannot escape) |

Open Questions

  1. Should agent-worker be a separate physical node or a labeled subset? Physical separation is the strongest isolation; labels save hardware but lean on scheduler discipline under pressure.
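  2. --platform=systrap (default, most compatible) or --platform=kvm (often faster on bare metal; needs /dev/kvm, which means CPU virtualization extensions such as VT-x/AMD-V and, if the node is itself a VM, nested virtualization)? Measure on the actual hardware.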
  3. Where do goose_agent image builds run? apko build steps are syscall-heavy; build pods may stay on runc while execution of the built image runs on runsc.
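For question 2, the platform is a runsc flag rather than a Kubernetes setting. With gVisor's containerd shim (containerd-shim-runsc-v1), runsc flags can be supplied in a TOML file under a [runsc_config] table, which the runtime stanza points to through its ConfigPath option.

The sketch below renders that file. Check both the table format and the intended file location against the installed gVisor release. Checking for a usable /dev/kvm is only a cheap precondition before benchmarking kvm; it says nothing about which platform performs better.

```python
import os

SUPPORTED_PLATFORMS = ("systrap", "kvm")  # the two options weighed in this ADR


def runsc_config(platform: str = "systrap") -> str:
    """Render runsc.toml content selecting the gVisor platform."""
    if platform not in SUPPORTED_PLATFORMS:
        raise ValueError(f"unsupported platform {platform!r}; expected one of {SUPPORTED_PLATFORMS}")
    return f'[runsc_config]\n  platform = "{platform}"\n'


def kvm_available(dev: str = "/dev/kvm") -> bool:
    """KVM platform needs a usable /dev/kvm; existence is a precondition, not a benchmark."""
    return os.path.exists(dev) and os.access(dev, os.R_OK | os.W_OK)


if __name__ == "__main__":
    choice = "kvm" if kvm_available() else "systrap"
    print(runsc_config(choice))
```

Keeping systrap as the fallback matches its role as the default, most compatible platform. Switching a node to kvm then becomes a one-line config change that is easy to revert after measurement.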

References

| Resource | Relevance |
|---|---|
| gVisor architecture | How runsc reimplements syscalls in userspace |
| gVisor + K3s walkthrough | K3s containerd template extension pattern |
| Kubernetes RuntimeClass | The opt-in mechanism |
| gVisor compatibility docs | Known incompatibilities to check Goose recipes against |
| docs/security.md | Defense-in-depth model this ADR extends |
| agents/014: AX + Substrate | Depends on this ADR; Substrate's ateom-gvisor assumes runsc is live |
| projects/agent_platform/chart/agent-sandbox/crds/crds.yaml | runtimeClassName field already present in CRD schema |
| projects/platform/nvidia-gpu-operator/values.yaml | Precedent for K3s containerd template customization |