Agent execution security
This document explains the security model for KiCI agent execution isolation and provides configuration guidance for each backend.
Overview
KiCI agents execute customer workflow code in isolated sandbox processes, never in the agent’s own V8 isolate. This means:
- Customer code cannot access agent-internal credentials (orchestrator URL, API keys, database connections)
- Customer code cannot interfere with the agent process itself
- The agent process only handles job orchestration, IPC, and log forwarding
The isolation boundary is enforced through the ExecutionSandbox interface, which all three backends implement: container, bare-metal, and Firecracker.
Note: This document covers agent execution security (sandbox isolation for customer code). For orchestrator-agent connection security (WS authentication, agent registration trust), the orchestrator requires agent token authentication by default (KICI_AGENT_AUTH=token). Agents authenticate using kat_* bearer tokens stored as SHA-256 hashes in the orchestrator database. See Orchestrator Configuration for setup details.
Process identity per backend
The user identity that spawned processes run as depends on the scaler backend. This is critical for understanding the blast radius of a compromised workflow.
| Backend | Orchestrator runs as | Spawned agent/workflow runs as | Privilege drop? |
|---|---|---|---|
| Container | Any user with container socket access | Container image’s default user (typically root inside container, isolated by namespaces) | Container runtime handles isolation |
| Bare-metal | Any user | Same user as orchestrator (no privilege dropping) | No — no setuid, no su, no sudo |
| Bare-metal + bwrap | Any user | Same user as orchestrator, but namespace-isolated (PID, IPC, filesystem, network) | Partial — bwrap adds namespace isolation but does not change UID |
| Firecracker | Must be root (TAP device, bridge management) | Jailer drops to configured uid:gid before exec’ing Firecracker; inside VM, agent runs as the rootfs image’s default user | Yes — jailer enforces privilege drop |
Implications of running the orchestrator as root
- Container backend: Low risk. The container runtime already provides process/filesystem/network isolation. The orchestrator user identity doesn’t propagate into containers.
- Bare-metal backend (no bwrap): High risk. Spawned agent processes inherit root privileges. Customer workflow code runs as root with full host filesystem and network access. Only acceptable for fully trusted, internal-only workflows.
- Bare-metal backend (with bwrap): Medium risk. bwrap provides namespace isolation (PID, IPC, filesystem read-only mounts, network loopback-only), but the process UID is still root inside the namespace. A bwrap escape would give root on the host.
- Firecracker backend: Expected. Root is required for TAP device and bridge management. The jailer drops privileges to the configured uid:gid before running Firecracker, so the VM process itself does not run as root.
Recommendations
- Never run bare-metal scaler as root for untrusted workloads. Use a dedicated service account with minimal privileges.
- Always enable bwrap for bare-metal if the orchestrator runs as any user with elevated privileges.
- Firecracker requires root — this is by design and safe due to jailer privilege dropping.
- Container backend is safe regardless of orchestrator user, since containers provide their own isolation boundary.
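The risk levels and recommendations above can be condensed into a small lookup. This is a minimal sketch rather than KiCI code: the `Backend` and `Risk` types and both function names are hypothetical, and only the risk levels themselves come from the lists above.

```typescript
// Hypothetical names: Backend, Risk, rootRisk and violatesRootRecommendation
// are illustrative only and are not part of KiCI.
type Backend = "container" | "bare-metal" | "firecracker";
type Risk = "low" | "medium" | "high" | "expected";

// Risk of running the orchestrator as root, following the list above.
function rootRisk(backend: Backend, bwrapEnabled: boolean): Risk {
  switch (backend) {
    case "container":
      return "low"; // the orchestrator identity does not propagate into containers
    case "firecracker":
      return "expected"; // root is required; the jailer drops privileges
    case "bare-metal":
      // bwrap adds namespaces, but the process UID is still root
      return bwrapEnabled ? "medium" : "high";
  }
}

// Assumed policy check for the first recommendation:
// a bare-metal scaler should not run as uid 0 for untrusted workloads.
function violatesRootRecommendation(backend: Backend, uid: number): boolean {
  return backend === "bare-metal" && uid === 0;
}
```

A deployment script could call `rootRisk` with the configured backend and refuse to start when the result is `"high"`.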
Isolation model per backend
Container backend (strongest for standard workloads)
The container backend provides the strongest practical isolation for most deployments.
Architecture:
- Agent runs on the host (or in its own container)
- Each job gets a disposable Docker/Podman container
- The entire job lifecycle (git clone, dependency install, compile, step execution) runs inside the container
- Agent credentials never enter the container environment
- Container is torn down after each job
Security properties:
- Full filesystem isolation (container rootfs)
- Network isolation (container networking)
- Process isolation (container PID namespace)
- Environment isolation (sanitized env only, no KICI_* variables)
- Resource limits via container runtime (CPU, memory, disk)
When to use: Most deployments. Recommended for untrusted or semi-trusted workloads where you need strong isolation without the overhead of microVMs.
Bare-metal backend (trusted environments only)
The bare-metal backend provides process-level isolation with a sanitized environment. It is suitable only for trusted environments where you control all workflow code.
Architecture:
- Agent runs on the host
- Workflow runner is forked as a child process using Node.js child_process.fork()
- The child process receives a sanitized environment (only allowlisted system variables + user-defined env; secrets are delivered over IPC, not through the environment)
- Optional bubblewrap (bwrap) adds PID/IPC/filesystem namespace isolation
Security properties (without bwrap):
- Environment isolation only (KICI_* and agent credentials excluded)
- No filesystem isolation (child process has full host access)
- No network isolation
- No resource limits (CPU/memory not enforced by KiCI — use OS-level cgroups or ulimit if needed)
- No PID/IPC namespace isolation
Security properties (with bwrap):
- Environment isolation (same as above)
- PID and IPC namespace isolation
- Network isolation via --unshare-net (loopback only, no external connectivity)
- Read-only system mounts (/usr, /lib, /bin, /etc/ssl)
- Writable workspace bind mount only
- Private /tmp, /dev, /proc
- Die-with-parent and new-session for process lifecycle safety
When to use: Development environments, internal CI where you trust all workflow authors, or when container overhead is unacceptable. Always enable bwrap for any environment with multiple users.
Firecracker backend (strongest for untrusted workloads)
The Firecracker backend provides VM-level isolation combined with defense-in-depth child process isolation.
Architecture:
- Each job runs inside a dedicated Firecracker microVM (separate kernel, rootfs, network)
- Inside the VM, the agent forks the workflow runner with sanitized environment
- The sandbox prevents customer code from accessing MMDS metadata (orchestrator URL, agent config)
- VM lifecycle is managed by the Firecracker scaler backend
Security properties:
- Full VM isolation (separate kernel, memory, disk)
- Network isolation (VM-level networking with NAT)
- Environment isolation inside the VM (defense-in-depth)
- MMDS metadata not accessible to customer code
- Complete teardown after each job (fresh rootfs per VM)
When to use: Public CI services, running untrusted code from external contributors, maximum security requirements.
Safety mechanisms comparison
| Mechanism | Container | Bare-metal | Bare-metal + bwrap | Firecracker |
|---|---|---|---|---|
| Process isolation | PID namespace (container) | None | PID namespace (bwrap) | VM PID namespace |
| Filesystem isolation | Container rootfs | None | Read-only system mounts, writable workspace only | VM ext4 rootfs (full copy per job) |
| Network isolation | Bridge network + nftables RFC1918 blocking (default: on) | None | Loopback only (--unshare-net) | TAP device + nftables RFC1918 blocking |
| Resource limits | CPU (NanoCpus) + memory (cgroups) via resources: config | None (use OS-level cgroups/ulimit) | None (use OS-level cgroups/ulimit) | vCPU count + memory (MiB) per VM |
| Credential isolation | Environment allowlist; KICI_* excluded | Environment allowlist; KICI_* excluded | Environment allowlist; KICI_* excluded | Environment allowlist + MMDS cleared after boot |
| Secret delivery | IPC (never in env) | IPC (never in env) | IPC (never in env) | IPC (never in env) |
| Process lifecycle | Container auto-remove | Detached process group; SIGTERM→SIGKILL | --die-with-parent, --new-session | Jailer + VM teardown |
| Privilege dropping | Container runtime handles user context | None | None (UID unchanged, but namespace-isolated) | Jailer drops to configured uid:gid |
Environment variables
What enters the sandbox
The sandbox environment is constructed from a 7-layer merge (later overrides earlier):
- System allowlist: only the following host variables are copied.
  - PATH: Required for command execution
  - HOME: User home directory
  - USER: Current user name
  - SHELL: User’s shell
  - LANG: Locale setting
  - LC_ALL: Locale override
  - TERM: Terminal type
  - TMPDIR: Temporary directory path
  - NODE_PATH: Node.js module resolution
  - TZ: Timezone
- Sandbox defaults: FORCE_COLOR=1 and similar defaults to ensure correct tool behavior in non-TTY environments
- KICI_* system vars: Orchestrator-generated variables passed via userEnv
- Org-level environment vars: Variables from the environment configuration (pre-merged by the orchestrator)
- Source-level environment overrides: Per-source overrides merged into the environment vars by the orchestrator
- Job env: SDK-defined env field from the lock file, evaluated by the orchestrator
- setEnv() calls: Runtime calls from step code (applied at step execution time, not during env construction)
Note: Secrets are NOT injected into environment variables. They flow through IPC to ctx.secrets and are only exposed to the process environment when the workflow author explicitly calls ctx.secrets.expose().
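The "later overrides earlier" rule amounts to an ordered object merge. The sketch below illustrates that rule only; `mergeEnvLayers`, the `Env` type, and every variable value (including the `KICI_RUN_ID` name) are hypothetical and not taken from the KiCI code base. Layer 7 (setEnv() calls) is omitted because it is applied at step execution time, and secrets are omitted because they travel over IPC.

```typescript
type Env = Record<string, string>;

// Apply layers in order; a key set by a later layer replaces the earlier value.
function mergeEnvLayers(...layers: Env[]): Env {
  const merged: Env = {};
  for (const layer of layers) {
    Object.assign(merged, layer);
  }
  return merged;
}

// Illustrative values only (all hypothetical).
const sandboxEnv = mergeEnvLayers(
  { PATH: "/usr/bin:/bin", LANG: "C" },      // 1. system allowlist
  { FORCE_COLOR: "1" },                      // 2. sandbox defaults
  { KICI_RUN_ID: "run-1" },                  // 3. KICI_* system vars (name is made up)
  { API_BASE: "https://org.example" },       // 4. org-level environment vars
  { API_BASE: "https://source.example" },    // 5. source-level override wins over layer 4
  { LANG: "en_US.UTF-8" },                   // 6. job env wins over layer 1
);
```

With these inputs, `sandboxEnv.API_BASE` is the source-level value and `sandboxEnv.LANG` is the job-level value, while `PATH` and `FORCE_COLOR` pass through unchanged.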
What is excluded
The following categories are never passed to the sandbox:
- KICI_*: All agent-internal variables (KICI_ORCHESTRATOR_URL, KICI_AGENT_ID, KICI_LABELS, etc.)
- KICI_DATABASE_URL: Agent/orchestrator database connection strings
- KICI_PLATFORM_TOKEN: Platform relay authentication tokens
- Any variable not in the system allowlist above
This is an explicit allowlist approach: adding new environment variables to the host agent will not leak them to customer code.
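An allowlist filter of this kind can be sketched in a few lines. The variable names come from the system allowlist in this document; the function name `pickAllowlisted` and the sample host environment are assumptions made for illustration.

```typescript
// Names taken from the system allowlist described in this document.
const SYSTEM_ALLOWLIST = [
  "PATH", "HOME", "USER", "SHELL", "LANG",
  "LC_ALL", "TERM", "TMPDIR", "NODE_PATH", "TZ",
] as const;

// Hypothetical helper: copy only allowlisted variables that are actually set.
// Anything else on the host, including new variables added later, is dropped.
function pickAllowlisted(
  hostEnv: Record<string, string | undefined>,
): Record<string, string> {
  const picked: Record<string, string> = {};
  for (const name of SYSTEM_ALLOWLIST) {
    const value = hostEnv[name];
    if (value !== undefined) {
      picked[name] = value;
    }
  }
  return picked;
}

// Sample host environment (values are made up).
const filtered = pickAllowlisted({
  PATH: "/usr/bin",
  TZ: "UTC",
  KICI_DATABASE_URL: "postgres://example",
  AWS_SECRET_ACCESS_KEY: "not-forwarded",
});
```

Because the loop iterates over the allowlist rather than over the host environment, a variable has to be named explicitly before it can reach the sandbox.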
KICI_AGENT_ENV_ prefix forwarding
Operators can forward custom environment variables from the orchestrator to spawned agents using the KICI_AGENT_ENV_ prefix. The orchestrator strips the prefix before passing the variable to the agent:

    # On the orchestrator host
    export KICI_AGENT_ENV_HTTP_PROXY=http://proxy:3128
    export KICI_AGENT_ENV_NO_PROXY=localhost,.internal
    export KICI_AGENT_ENV_CUSTOM_FLAG=enabled

The agent receives:

    HTTP_PROXY=http://proxy:3128
    NO_PROXY=localhost,.internal
    CUSTOM_FLAG=enabled

This mechanism is useful for passing proxy settings, custom flags, or other operator-controlled values to agents without modifying the scaler config file. Variables forwarded via KICI_AGENT_ENV_ have lower precedence than env: entries in scalers.yaml — if both define the same variable, the scalers.yaml value wins.
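The prefix stripping and the precedence rule can be sketched as follows. This is an illustration under assumptions: `buildAgentEnv` is a hypothetical name, and the real orchestrator may handle edge cases (such as an empty name after the prefix) differently.

```typescript
const AGENT_ENV_PREFIX = "KICI_AGENT_ENV_";

// Hypothetical helper: strip the prefix from forwarded variables, then let
// env: entries from scalers.yaml override any forwarded value of the same name.
function buildAgentEnv(
  orchestratorEnv: Record<string, string | undefined>,
  scalerYamlEnv: Record<string, string>,
): Record<string, string> {
  const forwarded: Record<string, string> = {};
  for (const [name, value] of Object.entries(orchestratorEnv)) {
    const hasPrefix = name.startsWith(AGENT_ENV_PREFIX);
    // Assumption: a bare "KICI_AGENT_ENV_" with nothing after it is ignored.
    if (hasPrefix && name.length > AGENT_ENV_PREFIX.length && value !== undefined) {
      forwarded[name.slice(AGENT_ENV_PREFIX.length)] = value;
    }
  }
  // Spread order encodes precedence: scalers.yaml wins.
  return { ...forwarded, ...scalerYamlEnv };
}

const agentEnv = buildAgentEnv(
  {
    KICI_AGENT_ENV_HTTP_PROXY: "http://proxy:3128",
    KICI_AGENT_ENV_CUSTOM_FLAG: "enabled",
    UNRELATED: "ignored",
  },
  { CUSTOM_FLAG: "from-scalers-yaml" },
);
```

Here `agentEnv.HTTP_PROXY` carries the forwarded value, while `agentEnv.CUSTOM_FLAG` takes the scalers.yaml value because both sources define it.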
Backend support:
| Backend | KICI_AGENT_ENV_ support | Notes |
|---|---|---|
| Bare-metal | Yes | Prefix stripped, passed to process |
| Container | Yes | Prefix stripped, passed to container env |
| Firecracker | Yes | Prefix stripped, passed via MMDS (per-key under meta-data/kici-env/); per-VM ≤32 KiB total budget enforced on the orchestrator side |
Per-backend environment variable sources
The following table shows which environment variable sources are passed to agents on each backend:
| Source | Bare-metal | Container | Firecracker |
|---|---|---|---|
| System vars (PATH, HOME, …) | Allowlist from orchestrator process.env | Not inherited (container has own) | Not inherited (VM has own) |
| KICI_* agent vars | Explicit values | Explicit values | Via MMDS + register.ack |
| KICI_AGENT_ENV_ forwarded | Yes (prefix stripped) | Yes (prefix stripped) | Yes (prefix stripped, via MMDS, ≤32 KiB) |
| scalers.yaml env: | Yes (highest priority) | Yes (highest priority) | Yes (highest priority, via MMDS, ≤32 KiB) |
| Orchestrator secrets | Never passed | Never passed | Never passed |
Bare-metal trust model
The bare-metal backend runs agent processes directly on the host. At startup, the orchestrator logs a warning when a bare-metal scaler is configured:

    WARN: Bare-metal scaler "gpu-machines" configured. Bare-metal agents run as child processes
    with full host filesystem and network access. This mode is intended for trusted environments only.
    WARN: Consider enabling bubblewrap (bwrap) for process isolation. See docs/operator/agent-security.md

The bare-metal backend provides environment isolation (credentials are not leaked to agents) but does not provide filesystem or network isolation without bubblewrap. Only use bare-metal for environments where you trust all workflow code.
Passing custom variables to workflows
To make custom environment variables available to workflow steps:
- Workflow-level env: Define in the workflow file (.kici/workflows/*.ts)
- Orchestrator-provided env: Set via job dispatch configuration
- KICI_AGENT_ENV_ prefix: Set on the orchestrator host for operator-controlled variables
- Secrets: Pass via the secrets mechanism for sensitive values
Container image requirements
When using the container backend, the container image must have:
- Node.js installed (v24 or later recommended)
- git installed (for repository cloning)
- Standard POSIX utilities (sh, mkdir, rm, etc.)
The workflow runner script is bind-mounted read-only into the container at /opt/kici/workflow-runner.js — it does not need to be baked into the image.
Recommended base images:
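- node:24-alpine: Lightweight; ships Node.js, but git usually has to be added (for example, apk add git)
- node:24-slim: Debian-based, smaller than the full image; git usually has to be added (for example, apt-get install git)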
Bubblewrap (Bare-Metal)
Enabling bubblewrap
Bubblewrap isolation for bare-metal execution is opt-in via the KICI_SANDBOX environment variable:

    KICI_SANDBOX=true

When set, the agent wraps every workflow runner fork in bwrap with the namespaces and mounts described below. Ensure bwrap is installed on the host (see system requirements). The default is false: the bare-metal sandbox runs workflow code as a plain forked Node.js process with only environment sanitization.
The orchestrator validates bwrap availability at startup when KICI_AGENT_ENV_KICI_SANDBOX=true is set: if the binary is missing the orchestrator exits with a clear error rather than failing every job at dispatch time. bwrap is Linux only — there is no equivalent on macOS or Windows, so the option is rejected on those platforms.
Network mode
KICI_SANDBOX_NETWORK controls the network namespace when bwrap is enabled:
| Value | Behavior |
|---|---|
| isolated | Default. bwrap --unshare-net (loopback only, no external connectivity). Strongest isolation; breaks workflows that need to reach package registries (npm, pip, cargo, etc.). |
| host | Keep the host network namespace. Workflows can talk to the network. Use this when workflows need npm install, git clone https://, or other outbound traffic. |
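The two settings can be resolved into a single configuration object. The following is a sketch under assumptions: `resolveSandboxConfig` and `SandboxConfig` are hypothetical names, the exact string parsing (`"true"` only) is assumed, and rejecting unknown network values is a design choice made here rather than documented behavior. The Linux-only rule follows the platform restriction described in this document.

```typescript
type NetworkMode = "isolated" | "host";

interface SandboxConfig {
  enabled: boolean;
  network: NetworkMode;
}

// Hypothetical resolver for KICI_SANDBOX and KICI_SANDBOX_NETWORK.
function resolveSandboxConfig(
  env: Record<string, string | undefined>,
  platform: string, // for example the value of process.platform
): SandboxConfig {
  // Assumption: only the exact string "true" enables bwrap.
  const enabled = env["KICI_SANDBOX"] === "true";
  if (enabled && platform !== "linux") {
    throw new Error("KICI_SANDBOX=true requires Linux (bwrap is Linux only)");
  }
  const raw = env["KICI_SANDBOX_NETWORK"] ?? "isolated"; // documented default
  if (raw === "isolated" || raw === "host") {
    return { enabled, network: raw };
  }
  // Assumption: unknown values are rejected instead of silently defaulting.
  throw new Error(`Unsupported KICI_SANDBOX_NETWORK value: ${raw}`);
}
```

Failing fast on an unknown value avoids silently falling back to a weaker or stricter mode than the operator intended.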

    # Strongest: PID/IPC/filesystem isolation AND no network
    KICI_SANDBOX=true
    KICI_SANDBOX_NETWORK=isolated  # default

    # Host network: PID/IPC/filesystem isolation, network unrestricted
    KICI_SANDBOX=true
    KICI_SANDBOX_NETWORK=host

System requirements
Install bubblewrap and (optionally) slirp4netns:

    # Debian/Ubuntu
    apt install bubblewrap

    # Fedora/RHEL
    dnf install bubblewrap

    # Optional: for network namespace isolation (not currently enabled)
    apt install slirp4netns

What bubblewrap provides
- PID namespace (--unshare-pid): Workflow runner cannot see or signal other host processes
- IPC namespace (--unshare-ipc): Shared memory isolation between the runner and host
- Filesystem isolation: System directories mounted read-only, only workspace is writable
- Process lifecycle safety: --die-with-parent ensures child dies if agent crashes, --new-session prevents terminal signal propagation
What bubblewrap does NOT provide
- Resource limits: CPU/memory limits are not enforced by bwrap. Use cgroups or a container runtime for resource control.
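Note: Network isolation via --unshare-net is enabled by default when bwrap is active (KICI_SANDBOX_NETWORK=isolated). In that mode, customer workflow code has no external network access (loopback only); setting KICI_SANDBOX_NETWORK=host keeps the host network instead. The default is intentionally strict; bare-metal is for trusted environments.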
Filesystem mount details
| Host Path | Container Path | Mode |
|---|---|---|
| /usr | /usr | read-only |
| /lib | /lib | read-only |
| /lib64 (if exists) | /lib64 | read-only |
| /bin | /bin | read-only |
| /sbin | /sbin | read-only |
| /etc/resolv.conf | /etc/resolv.conf | read-only |
| /etc/ssl | /etc/ssl | read-only |
| Node.js binary dir | (same path) | read-only |
| Workspace | /workspace | read-write |
| (new) | /dev | private |
| (new) | /proc | private |
| (new) | /tmp | private (tmpfs) |
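The mount table and namespace flags map naturally onto a bwrap argument list. The sketch below shows one plausible way to assemble it; the function name, its parameters, and the argument order are assumptions, and the actual command line KiCI builds is not shown in this document. All flags used are standard bubblewrap options.

```typescript
// Hypothetical builder for a bwrap argv matching the table above.
function buildBwrapArgs(
  workspace: string,            // host path of the job workspace
  nodeDir: string,              // directory containing the Node.js binary
  network: "isolated" | "host",
): string[] {
  const args: string[] = [
    "--unshare-pid",
    "--unshare-ipc",
    "--die-with-parent",
    "--new-session",
  ];
  if (network === "isolated") {
    args.push("--unshare-net"); // loopback only
  }
  // Read-only system mounts, same path inside and outside the sandbox.
  const readOnly = ["/usr", "/lib", "/bin", "/sbin", "/etc/resolv.conf", "/etc/ssl", nodeDir];
  for (const path of readOnly) {
    args.push("--ro-bind", path, path);
  }
  // /lib64 is mounted only if it exists; --ro-bind-try skips missing sources.
  args.push("--ro-bind-try", "/lib64", "/lib64");
  // The workspace is the only writable bind mount.
  args.push("--bind", workspace, "/workspace");
  // Private /dev, /proc, and a tmpfs-backed /tmp.
  args.push("--dev", "/dev", "--proc", "/proc", "--tmpfs", "/tmp");
  return args;
}
```

The resulting array would be followed by the command to run and passed to a process spawner with `bwrap` as the executable.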
Execution mode selection
The agent selects the sandbox backend using this priority:
- Container config in job dispatch: If the job includes container configuration, uses ContainerSandbox
- KICI_EXECUTION_MODE env var: Explicit backend selection (container, bare-metal, firecracker)
- KICI_SCALER_MANAGED=1 detection: Agents managed by the Firecracker scaler use FirecrackerSandbox
- Default: Falls back to BareMetalSandbox (sandbox=false)
Set KICI_EXECUTION_MODE in the agent’s environment to override automatic detection:

    # Force container mode
    export KICI_EXECUTION_MODE=container

    # Force bare-metal (bwrap is enabled separately via KICI_SANDBOX=true and requires bubblewrap installed)
    export KICI_EXECUTION_MODE=bare-metal

    # Force Firecracker mode (only inside Firecracker VMs)
    export KICI_EXECUTION_MODE=firecracker
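
The selection priority described in this section can be expressed as a short decision function. This is a sketch, not the agent's implementation: the function name and its parameters are hypothetical, and treating an unrecognized KICI_EXECUTION_MODE value as "not set" is an assumption.

```typescript
type SandboxKind = "ContainerSandbox" | "BareMetalSandbox" | "FirecrackerSandbox";

// Hypothetical selector following the documented priority order.
function selectSandbox(
  jobHasContainerConfig: boolean,
  env: Record<string, string | undefined>,
): SandboxKind {
  // 1. Container configuration in the job dispatch takes priority.
  if (jobHasContainerConfig) {
    return "ContainerSandbox";
  }
  // 2. Explicit selection through KICI_EXECUTION_MODE.
  switch (env["KICI_EXECUTION_MODE"]) {
    case "container":
      return "ContainerSandbox";
    case "bare-metal":
      return "BareMetalSandbox";
    case "firecracker":
      return "FirecrackerSandbox";
    // Assumption: any other value falls through to automatic detection.
  }
  // 3. Agents managed by the Firecracker scaler.
  if (env["KICI_SCALER_MANAGED"] === "1") {
    return "FirecrackerSandbox";
  }
  // 4. Default.
  return "BareMetalSandbox";
}
```

Note that in this ordering an explicit KICI_EXECUTION_MODE overrides scaler detection but not a container configuration carried by the job itself.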