Storage layout

The orchestrator writes to three independent object-storage subsystems: cache, logs, and cold-store. Each is configured with its own env vars and key prefix; they can share a bucket or use separate ones depending on retention and access patterns. This doc is the canonical map of which prefix holds what data, which env var names the bucket, and where to look in code if the doc and reality drift.

Doc invariants: any change to a storage prefix, env var, or cold-store table requires updating this doc in the same commit. See .claude/rules/storage.md for the enforced 1:1 rules.

| Subsystem | Bucket env var | Default prefix | Retention | What lives there |
| --- | --- | --- | --- | --- |
| Cache | KICI_STORAGE_BUCKET | kici-cache/ | TTL via KICI_CACHE_TTL_DAYS (default 30 days) | Compiled source tarballs, dependency tarballs, dep-tarball integrity-hash companions |
| Logs | KICI_STORAGE_LOG_BUCKET | kici-logs/ | None on step logs; webhook payloads → cold-store after 30 days | NDJSON step logs, gzipped webhook delivery payloads |
| Cold-store | KICI_COLD_STORE_BUCKET | cold-store/ | Per-table tier (30d / 180d / 1y / 2y / forever) | Append-only archive of execution rows, secret-audit-log rows, access-log rows, event-log rows |

The log bucket falls back to the cache bucket if KICI_STORAGE_LOG_BUCKET is unset; the cold-store bucket is independent and may live in a different account/region.
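The fallback rule above can be sketched as a small resolver. This is a minimal illustration, not the orchestrator's code: `resolveBuckets` and its types are hypothetical, only the env var names come from this doc, and treating an empty string as unset is an assumption.

```typescript
// Hypothetical helper; only the env var names come from this doc.
type BucketEnv = Record<string, string | undefined>;

interface ResolvedBuckets {
  cache: string | undefined;
  logs: string | undefined;
  coldStore: string | undefined;
}

function resolveBuckets(env: BucketEnv): ResolvedBuckets {
  // Assumption: an empty string counts as "unset".
  const cache = env.KICI_STORAGE_BUCKET || undefined;
  return {
    cache,
    // Logs fall back to the cache bucket when no dedicated bucket is named.
    logs: env.KICI_STORAGE_LOG_BUCKET || cache,
    // Cold-store never falls back; it is configured independently.
    coldStore: env.KICI_COLD_STORE_BUCKET || undefined,
  };
}
```

With only KICI_STORAGE_BUCKET set, the sketch yields the same bucket for cache and logs and leaves the cold-store bucket undefined.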

Cache storage holds the compiled source bundles and dependency tarballs the orchestrator hands to execution agents. Two backends ship:

  • s3 — pre-signed URLs against an S3-compatible bucket. Recommended for multi-host / production deployments.
  • filesystem — local files served through the orchestrator’s HMAC-signed /api/v1/cache/blob/<key> HTTP route. Intended for single-host deployments and E2E sandboxes where standing up an S3-compatible service is overkill.
| Key | Description | Source |
| --- | --- | --- |
| source/{contentHash}.tar.gz | Platform-agnostic source tarball (.kici/ minus node_modules/); one entry shared across linux-x64 / linux-arm64 / etc. | packages/orchestrator/src/storage/s3.ts |
| deps/{platform}-{arch}/{lockfileHash}.tar.gz | Platform-specific dependency tarball; one entry per (platform, arch, lockfile-hash) tuple. | packages/orchestrator/src/storage/s3.ts |
| deps/{platform}-{arch}/{lockfileHash}.tar.gz.hash | SHA-256 companion file for dep-tarball integrity verification. | packages/orchestrator/src/storage/s3.ts |
| provenance/{runId}/{jobId}/{subjectDigest}.kici.json | Signed build-provenance bundle produced by ctx.attestProvenance (DSSE envelope + ephemeral public key + identity token). One entry per attested artifact; the attestations DB row points at this key. | packages/engine/src/provenance/bundle.ts (provenanceStorageKey) |
| .kici-cluster-id | Cluster identity sentinel (UUID). Written once on first orch boot, validated on every subsequent boot. Mismatch with cluster_meta.cluster_id blocks startup; see cluster identity. | packages/orchestrator/src/cluster/cluster-identity.ts |

The source / deps keys are namespaced under the configured prefix (default kici-cache/). The .kici-cluster-id sentinel sits at <KICI_STORAGE_PREFIX>/.kici-cluster-id (or at the bucket root when no prefix is set), so two clusters can safely share a physical bucket as long as they use distinct prefixes.
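As an illustration of the key layout above, the following sketch builds prefixed object keys. The helper names (`withPrefix`, `sourceKey`, `depsKey`) are hypothetical, and the handling of a prefix without a trailing slash is an assumption; the real key builders live in the source files listed in the table.

```typescript
// Hypothetical helpers; only the key shapes come from the table above.
function withPrefix(prefix: string | undefined, key: string): string {
  if (!prefix) return key; // no prefix configured: key sits at the bucket root
  // Assumption: tolerate a prefix written without its trailing slash.
  return prefix.endsWith("/") ? prefix + key : `${prefix}/${key}`;
}

function sourceKey(contentHash: string): string {
  return `source/${contentHash}.tar.gz`;
}

function depsKey(platform: string, arch: string, lockfileHash: string): string {
  return `deps/${platform}-${arch}/${lockfileHash}.tar.gz`;
}

// Two clusters sharing one bucket stay apart through distinct prefixes.
const clusterA = withPrefix("team-a/", ".kici-cluster-id");
const clusterB = withPrefix("team-b/", ".kici-cluster-id");
console.log(clusterA, clusterB);
```

Because the sentinel key is derived from the prefix, each cluster validates its own `.kici-cluster-id` object even when the physical bucket is shared.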

| Env var | Required? | Description |
| --- | --- | --- |
| KICI_STORAGE_TYPE | yes (when caching) | s3 or filesystem |
| KICI_STORAGE_BUCKET | yes (when s3) | Bucket name |
| KICI_STORAGE_PREFIX | no (kici-cache/) | Object-key prefix |
| KICI_STORAGE_REGION | no | AWS region |
| KICI_STORAGE_ENDPOINT | no | Custom S3 endpoint the orchestrator uses for its own object operations |
| KICI_STORAGE_EXTERNAL_ENDPOINT | no | Endpoint baked into pre-signed URLs handed to agents (container-routable). Falls back to KICI_STORAGE_ENDPOINT when unset |
| KICI_STORAGE_UPLOAD_ENDPOINT | no | Endpoint baked into the pre-signed upload URL handed to the host CLI running kici run remote. Set when the developer machine reaches the bucket at a different address than the orchestrator. Falls back to KICI_STORAGE_ENDPOINT when unset |
| KICI_STORAGE_FORCE_PATH_STYLE | no | true for S3-compatible services that need path-style addressing |
| KICI_STORAGE_FS_PATH | yes (when filesystem) | Absolute directory where blobs are stored |
| KICI_STORAGE_FS_BASE_URL | no | Base URL the agent uses to reach the orchestrator (e.g., http://orch.local:10143). Defaults to http://127.0.0.1:<KICI_PORT> |
| KICI_STORAGE_LOG_BUCKET | no | Optional separate bucket for log storage (see “Log storage” below) |
| KICI_CACHE_TTL_DAYS | no (30) | Days of inactivity before an entry is evicted (touch-on-read) |
| KICI_CACHE_MAX_TARBALL_BYTES | no (524288000) | Max dep-tarball size; build fails if exceeded |
| KICI_CACHE_BUILD_TIMEOUT_MS | no (600000) | Build job timeout |
| KICI_USER_CACHE_QUOTA_BYTES | no (5368709120) | Cluster-wide default byte quota for the user-facing cache (ctx.cache / declarative job-step cache); least-recently-used entries evicted past quota. Overridable per org via org_settings.user_cache_quota_bytes (kici-admin org-settings user-cache set-quota) |
| KICI_USER_CACHE_TTL_MS | no (604800000) | Cluster-wide default per-entry TTL for the user-facing cache (touch-on-read). Overridable per org via org_settings.user_cache_ttl_ms (kici-admin org-settings user-cache set-ttl) |

The three endpoints (one per vantage point)

kici run remote involves three parties that each reach the bucket from a different place on the network, so the S3 backend exposes three endpoint knobs:

| Endpoint | Used by | Set it to an address reachable from |
| --- | --- | --- |
| KICI_STORAGE_ENDPOINT | the orchestrator’s own object operations (head / copy / metadata) | the orchestrator process |
| KICI_STORAGE_UPLOAD_ENDPOINT | the host CLI’s pre-signed upload URL | the developer machine running kici run remote |
| KICI_STORAGE_EXTERNAL_ENDPOINT | the agent’s pre-signed up/download URLs | the execution agent (often a container) |

When the orchestrator, the developer machine, and the agents all reach the bucket at the same address (e.g. a public AWS S3 endpoint), only KICI_STORAGE_ENDPOINT is needed — the other two fall back to it. They matter when the addresses diverge: the Docker quickstart, for example, runs the orchestrator in a container (so its endpoint is the compose DNS name seaweedfs:8333), the host CLI uses localhost:8333, and spawned agent containers use host.docker.internal:8333.
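The fallback between the three endpoint knobs can be sketched as follows. `resolveEndpoints` is a hypothetical name, and the `http://` scheme on the Docker quickstart addresses is an assumption added so the values read as URLs.

```typescript
// Hypothetical resolver; only the env var names and fallback rule come from this doc.
type EndpointEnv = Record<string, string | undefined>;

interface ResolvedEndpoints {
  orchestrator: string | undefined; // the orchestrator's own object operations
  upload: string | undefined;       // pre-signed upload URL for the host CLI
  agent: string | undefined;        // pre-signed URLs handed to agents
}

function resolveEndpoints(env: EndpointEnv): ResolvedEndpoints {
  const base = env.KICI_STORAGE_ENDPOINT || undefined;
  return {
    orchestrator: base,
    upload: env.KICI_STORAGE_UPLOAD_ENDPOINT || base,
    agent: env.KICI_STORAGE_EXTERNAL_ENDPOINT || base,
  };
}

// Docker quickstart layout from the paragraph above (scheme assumed).
const quickstart = resolveEndpoints({
  KICI_STORAGE_ENDPOINT: "http://seaweedfs:8333",
  KICI_STORAGE_UPLOAD_ENDPOINT: "http://localhost:8333",
  KICI_STORAGE_EXTERNAL_ENDPOINT: "http://host.docker.internal:8333",
});
console.log(quickstart);
```

When all three parties share one address, passing only KICI_STORAGE_ENDPOINT produces the same value in every slot.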

These three endpoints are static connection / topology configuration, set once per deployment based on the network layout. They are not per-tenant org_settings tunables — they describe where each party reaches the bucket, which is a property of the deployment’s network, not of any one organization.

kici run remote against a hidden orchestrator

kici run remote uploads the working-tree overlay directly from the developer machine to the object store via a pre-signed PUT URL minted with KICI_STORAGE_UPLOAD_ENDPOINT (falling back to KICI_STORAGE_ENDPOINT). The run is initiated and its logs are retrieved through the Platform relay over a WebSocket connection — the developer machine never talks to the orchestrator’s HTTP API directly. This means an orchestrator can sit entirely behind a private network: only the object store needs to be reachable from the developer machine for remote runs to work; the orchestrator’s HTTP API does not. Point KICI_STORAGE_UPLOAD_ENDPOINT at a dev-reachable bucket address whenever the developer machine reaches the object store at a different address than the orchestrator does.

kici run remote is offered by the Platform — an orchestrator with no Platform connection cannot serve remote runs (there is no air-gapped, orchestrator-direct remote-run path). Executing workflow steps on the developer machine with no orchestrator at all is kici run local.

When KICI_STORAGE_TYPE=filesystem, the orchestrator stores each blob as a file under KICI_STORAGE_FS_PATH/<key> with a sibling <key>.meta.json carrying the same created-at / last-accessed-at timestamps the S3 backend uses for TTL. The cache infrastructure (source cache, dep cache, build coordinator) treats both backends uniformly — the only difference is how URLs are minted.

Agents fetch blobs over HTTP at <KICI_STORAGE_FS_BASE_URL>/api/v1/cache/blob/<key>?sig=<token>. The sig token is an HMAC-SHA256 over (method, key, expiry) keyed by a process-local secret generated at boot, with a one-hour lifetime. Tokens become invalid on orchestrator restart — fine for the single-host deployment shape the backend targets. Upload completion still flows through the existing cache.upload.complete WebSocket message so the orchestrator can stamp metadata atomically.
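The signing scheme described above can be sketched with Node's built-in crypto module. This is an illustration under stated assumptions, not the orchestrator's implementation: the token encoding (`<expiry>.<hex mac>`), the newline-joined message, and the function names are all hypothetical.

```typescript
import { createHmac, randomBytes, timingSafeEqual } from "node:crypto";

// Process-local secret: regenerated on every boot, so old tokens stop verifying.
const bootSecret = randomBytes(32);
const ONE_HOUR_MS = 60 * 60 * 1000;

// Assumed token layout: "<expiresAtMs>.<hex HMAC-SHA256 over method, key, expiry>".
function signBlobToken(method: string, key: string, expiresAtMs: number): string {
  const mac = createHmac("sha256", bootSecret)
    .update(`${method}\n${key}\n${expiresAtMs}`)
    .digest("hex");
  return `${expiresAtMs}.${mac}`;
}

function verifyBlobToken(method: string, key: string, token: string, nowMs: number = Date.now()): boolean {
  const dot = token.indexOf(".");
  if (dot < 0) return false;
  const expiresAtMs = Number(token.slice(0, dot));
  if (!Number.isFinite(expiresAtMs) || expiresAtMs < nowMs) return false; // malformed or expired
  // Recompute the full token and compare in constant time.
  const expected = Buffer.from(signBlobToken(method, key, expiresAtMs));
  const actual = Buffer.from(token);
  return expected.length === actual.length && timingSafeEqual(expected, actual);
}

const token = signBlobToken("GET", "source/abc.tar.gz", Date.now() + ONE_HOUR_MS);
console.log(verifyBlobToken("GET", "source/abc.tar.gz", token));
```

Binding the method and key into the MAC means a token minted for a download of one blob cannot be replayed as an upload or against a different key.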

The filesystem backend is not appropriate for production: the cache directory is local to one orchestrator host, the URLs are not portable across orchestrator restarts, and there is no shared-bucket lifecycle policy to fall back on for TTL enforcement. Lazy expiry via last-accessed-at is the only eviction path — long-running deployments should still pick s3.

The env var names in the table above are read directly by loadConfig() in packages/orchestrator/src/config.ts and bridged into the storage.* config field. They follow the project-wide KICI_ prefix convention, so a misspelled name is caught at boot by the unknown-env-var typo catcher.

The figures below are estimates derived from the configurable caps and the per-event operations listed. Empirical numbers depend on workload, so verify them against the linked Prometheus metrics on a running deployment.

| Object | Typical size | Hard cap | Write op | Read op |
| --- | --- | --- | --- | --- |
| source/{contentHash}.tar.gz | tens of KB to a few MB | none (raw .kici/ minus node_modules) | PutObject per cache MISS (per unique workflow content) | GetObject per execution job dispatch (cache HIT path) |
| deps/{platform}-{arch}/{...}.tar.gz | 50–200 MB (npm deps) | KICI_CACHE_MAX_TARBALL_BYTES (500 MiB) | PutObject per cache MISS (per unique lockfile + platform/arch) | GetObject per execution job dispatch |
| deps/{...}.tar.gz.hash | 64 bytes (SHA-256 hex) | | PutObject per dep MISS (companion to dep tarball) | GetObject per dep download (integrity check) |

Per-job cost (cache HIT path): 2× GetObject (1 source + 1 dep) on the agent’s pre-signed URL — the orchestrator itself only signs the URLs.

Per-build cost (cache MISS): 2× PutObject (source + dep) + 1× PutObject for the .hash companion + 2× metadata CopyObject (initMeta TTL bookkeeping). Build duration tracked by kici_orch_build_duration_seconds.

Touch-on-read (every cache HIT): 1× CopyObject to refresh the last-accessed-at metadata. This is fire-and-forget; failure to touch does not fail the dispatch.

Hit/miss ratio (verifiable): kici_orch_source_cache_hits_total / kici_orch_source_cache_misses_total and kici_orch_dep_cache_hits_total / kici_orch_dep_cache_misses_total.

| Phase | Trigger | What happens |
| --- | --- | --- |
| Created | Build job completes (cache MISS path) | Build agent uploads via pre-signed PUT URL; orchestrator runs initMeta (CopyObject to set last-accessed-at); writes .hash companion for deps |
| Updated | (immutable content; only metadata refreshed) | S3CacheStorage.get() issues a CopyObject to bump last-accessed-at on every cache HIT (touch-on-read) |
| Deleted | TTL expiry (KICI_CACHE_TTL_DAYS default 30) | S3 bucket lifecycle rule deletes objects whose last-accessed-at is older than the TTL; lazy delete also runs in get() if the object is found stale |

Source: packages/orchestrator/src/storage/s3.ts. See dependency caching for the full cache flow including pre-signed URL exchanges and integrity verification.
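The touch-on-read and lazy-expiry behavior described above can be modeled in memory. This sketch is an assumption-laden stand-in for S3CacheStorage.get(): it uses a Map instead of a bucket, and `readEntry` and `CacheMeta` are hypothetical names.

```typescript
// In-memory model of touch-on-read with lazy expiry; not the real S3 backend.
const DAY_MS = 24 * 60 * 60 * 1000;

interface CacheMeta {
  createdAt: number;      // epoch ms, set once when the entry is created
  lastAccessedAt: number; // epoch ms, refreshed on every hit
}

function isStale(meta: CacheMeta, ttlDays: number, nowMs: number): boolean {
  return nowMs - meta.lastAccessedAt > ttlDays * DAY_MS;
}

function readEntry(store: Map<string, CacheMeta>, key: string, ttlDays: number, nowMs: number): "hit" | "miss" {
  const meta = store.get(key);
  if (!meta) return "miss";
  if (isStale(meta, ttlDays, nowMs)) {
    store.delete(key); // lazy delete: a stale entry is removed when it is next read
    return "miss";
  }
  meta.lastAccessedAt = nowMs; // touch-on-read: the hit restarts the TTL clock
  return "hit";
}
```

An entry that is read at least once per TTL window therefore never expires, while an untouched entry is reported as a miss and removed on its first read after the window closes.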

The user-facing cache backs the SDK’s declarative cache: { key, paths, restoreKeys? } on jobs/steps and the imperative ctx.cache.restore() / ctx.cache.save() API. It reuses the same CacheStorage backend (and the same KICI_STORAGE_* configuration) as the source/dep cache above, but lives under its own cache/ prefix and has its own per-org quota and TTL. See the SDK caching reference for the author-facing surface.

The byte quota and per-entry TTL have cluster-wide defaults and can be overridden per org. The KICI_USER_CACHE_QUOTA_BYTES / KICI_USER_CACHE_TTL_MS env vars set the cluster-wide default; an operator overrides them for a single tenant at runtime by writing org_settings.user_cache_quota_bytes / org_settings.user_cache_ttl_ms (both NULLABLE; NULL means “use the cluster default”). The override is reachable through both the orchestrator admin HTTP route and the CLI:

# Read the current per-org quota + TTL (cluster default shown when unset):
kici-admin org-settings user-cache show --org <customerId>
# Set a per-org override (positive integer; bytes / milliseconds):
kici-admin org-settings user-cache set-quota 10737418240 --org <customerId> # 10 GiB
kici-admin org-settings user-cache set-ttl 1209600000 --org <customerId> # 14 days
# Clear the override and fall back to the cluster default:
kici-admin org-settings user-cache reset-quota --org <customerId>
kici-admin org-settings user-cache reset-ttl --org <customerId>

The orchestrator resolves the effective quota + TTL from org_settings at cache operation time, falling back to the env-var default when the org column is NULL.
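The resolution order can be sketched as a NULL-coalescing lookup. The function and type names below are hypothetical; the column names, env var names, and default values are the ones documented above. Ignoring a malformed or non-positive env value is an assumption.

```typescript
// Hypothetical resolver for the effective per-org limits.
interface OrgUserCacheRow {
  user_cache_quota_bytes: number | null; // NULL means "use the cluster default"
  user_cache_ttl_ms: number | null;
}

type UserCacheEnv = Record<string, string | undefined>;

const DEFAULT_QUOTA_BYTES = 5368709120; // 5 GiB
const DEFAULT_TTL_MS = 604800000;       // 7 days

function envNumber(raw: string | undefined, fallback: number): number {
  const n = Number(raw);
  // Assumption: unset, empty, malformed, or non-positive values fall back.
  return raw !== undefined && Number.isFinite(n) && n > 0 ? n : fallback;
}

function effectiveUserCacheLimits(row: OrgUserCacheRow | null, env: UserCacheEnv) {
  return {
    quotaBytes: row?.user_cache_quota_bytes ?? envNumber(env.KICI_USER_CACHE_QUOTA_BYTES, DEFAULT_QUOTA_BYTES),
    ttlMs: row?.user_cache_ttl_ms ?? envNumber(env.KICI_USER_CACHE_TTL_MS, DEFAULT_TTL_MS),
  };
}
```

An org with only a quota override keeps inheriting the cluster TTL, which matches the independent reset-quota and reset-ttl commands.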

| Key | Description | Source |
| --- | --- | --- |
| cache/<orgId>/<repoId>/shared/<key>.tar.gz | Committed cache entry for a trusted ref (org-shared scope, visible to every run of the repo). Immutable: first save under an exact key wins. | packages/orchestrator/src/cache/user-cache.ts |
| cache/<orgId>/<repoId>/iso/<runId>/<key>.tar.gz | Committed cache entry for an untrusted / fork ref (per-run isolated scope). Reads fall back to the shared scope, but writes can never land in shared/. | packages/orchestrator/src/cache/user-cache.ts |
| cache/<orgId>/<repoId>/<scope>/<key>.tar.gz.hash | SHA-256 companion of the tarball bytes, for integrity verification on download (the presigned upload carries no custom metadata). | packages/orchestrator/src/cache/user-cache.ts |
| cache/<orgId>/<repoId>/<scope>/<key>.tar.gz.size | Tarball byte-size companion, used for per-org quota accounting. | packages/orchestrator/src/cache/user-cache.ts |
| cache/<orgId>/<repoId>/<scope>/.tmp-<uuid>.tar.gz | Transient upload target for an in-flight save. Copied to the final <key>.tar.gz and deleted on commit, so a crashed save never leaves a corrupt final entry. | packages/orchestrator/src/cache/user-cache.ts |

Isolation invariant. Every key is namespaced under cache/<orgId>/ first, so no tenant can read another tenant’s cache — the org segment is the per-tenant boundary and the per-org quota scope. Within an org, <repoId> separates repositories, and the trailing scope segment separates the shared (trusted-ref) scope from per-run isolated (untrusted/fork-ref) scopes. A trusted ref reads and writes shared/; an untrusted ref reads its own iso/<runId>/ then falls back to shared/ on restore, but writes only to iso/<runId>/. This is the cache-isolation model that stops a fork build from poisoning the cache a trusted branch later restores. The orchestrator maps a ref’s trust level to the write scope via the cacheRefScope field on the job dispatch (trusted → shared, otherwise isolated).
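The trust-to-scope mapping above can be sketched as two small key functions. The names (`CacheRef`, `saveKey`, `restoreLookupOrder`) are hypothetical; only the key shapes and the read/write rules come from this section.

```typescript
// Hypothetical key helpers for the shared / isolated cache scopes.
type CacheRefScope = "shared" | "isolated";

interface CacheRef {
  orgId: string;
  repoId: string;
  runId: string;
  scope: CacheRefScope; // trusted ref → "shared", otherwise "isolated"
}

function scopePrefix(ref: CacheRef, scope: CacheRefScope): string {
  const segment = scope === "shared" ? "shared" : `iso/${ref.runId}`;
  return `cache/${ref.orgId}/${ref.repoId}/${segment}`;
}

// Writes always land in the ref's own scope, never anywhere else.
function saveKey(ref: CacheRef, key: string): string {
  return `${scopePrefix(ref, ref.scope)}/${key}.tar.gz`;
}

// Restores try the ref's own scope first; untrusted refs then fall back to shared/.
function restoreLookupOrder(ref: CacheRef, key: string): string[] {
  const own = saveKey(ref, key);
  if (ref.scope === "shared") return [own];
  return [own, `${scopePrefix(ref, "shared")}/${key}.tar.gz`];
}
```

Because `saveKey` never produces a `shared/` key for an isolated ref, a fork build can read what trusted branches cached but cannot overwrite or add to it.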

| Object | Typical size | Hard cap | Write op | Read op |
| --- | --- | --- | --- | --- |
| cache/.../<key>.tar.gz | KB to hundreds of MB (workload) | per-org quota: org_settings.user_cache_quota_bytes if set, else KICI_USER_CACHE_QUOTA_BYTES (5 GiB) | 1× presigned PutObject (to temp) + 1× CopyObject (temp→final) + 1× initMeta CopyObject per save MISS | 1× presigned GetObject per cache HIT (agent downloads directly) |
| cache/.../<key>.tar.gz.hash | 64 bytes (SHA-256 hex) | | PutObject on commit | GetObject per restore (integrity check on download) |
| cache/.../<key>.tar.gz.size | <16 bytes (decimal byte count) | | PutObject on commit | GetObject per quota sweep entry |

Per-restore cost (cache HIT): 1× presigned GetObject (the agent fetches the tarball directly) + 1× GetObject for the .hash companion + 1× CopyObject touch-on-read to refresh the entry’s TTL.

Per-save cost (cache MISS): 1× presigned PutObject to a .tmp-<uuid> object, then on commit 1× CopyObject (temp→final) + 1× DeleteObject (temp) + 1× initMeta CopyObject + 2× PutObject (.hash + .size). A save under an already-existing exact key is skipped entirely (immutable no-op).

Quota sweep (on every commit): 1× ListObjectsV2 over cache/<orgId>/ + 1× GetObject per entry’s .size companion to total the org’s bytes; if over quota, the orchestrator reads each entry’s last-accessed-at metadata and evicts least-recently-used-first (HeadObject / .meta.json read per candidate), DeleteObject-ing the tarball + its .hash + .size companions until back under quota. Each eviction is logged.
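The eviction order in the sweep can be sketched as a pure planning function. `planEvictions` and `SweepEntry` are hypothetical names, and the sketch works on an in-memory list rather than ListObjectsV2 results; it only illustrates the least-recently-used-first rule.

```typescript
// Hypothetical planner: which entries to delete to get an org back under quota.
interface SweepEntry {
  key: string;            // tarball key; .hash and .size companions go with it
  sizeBytes: number;      // from the .size companion
  lastAccessedAt: number; // epoch ms, refreshed on every restore
}

function planEvictions(entries: SweepEntry[], quotaBytes: number): string[] {
  let total = entries.reduce((sum, e) => sum + e.sizeBytes, 0);
  const victims: string[] = [];
  // Oldest lastAccessedAt first; copy the array so the caller's order is untouched.
  const byAge = [...entries].sort((a, b) => a.lastAccessedAt - b.lastAccessedAt);
  for (const entry of byAge) {
    if (total <= quotaBytes) break; // back under quota: stop evicting
    victims.push(entry.key);
    total -= entry.sizeBytes;
  }
  return victims;
}
```

An org already under quota yields an empty plan, so the common-case sweep costs only the listing and the `.size` reads.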

| Phase | Trigger | What happens |
| --- | --- | --- |
| Created | A save under a new exact key commits | Agent uploads via presigned PUT to a .tmp-<uuid> object; orchestrator copies temp→final, runs initMeta, writes .hash + .size companions, deletes the temp |
| Updated | (immutable content; only metadata refreshed) | A restore issues a CopyObject touch-on-read to bump last-accessed-at (TTL refresh). A re-save under an existing exact key is a no-op. |
| Deleted | Per-org quota exceeded on save, or TTL expiry | Quota: least-recently-used entries (oldest lastAccessedAt, refreshed on every restore) evicted on the committing save until the org is under its effective quota (org_settings.user_cache_quota_bytes, else the KICI_USER_CACHE_QUOTA_BYTES default). TTL: entries unused for the org’s effective TTL (org_settings.user_cache_ttl_ms, else the KICI_USER_CACHE_TTL_MS 7-day default) expire lazily on access; the per-org TTL is threaded into the backing CacheStorage access as a per-operation override. |

Source: packages/orchestrator/src/cache/user-cache.ts. The quota and TTL default cluster-wide via KICI_USER_CACHE_QUOTA_BYTES / KICI_USER_CACHE_TTL_MS (see the env vars table above) and are overridable per org via org_settings.user_cache_quota_bytes / org_settings.user_cache_ttl_ms (kici-admin org-settings user-cache). See architecture: data flows for the restore/save protocol and the trust→scope mapping.

Log storage holds step execution logs (NDJSON) and webhook delivery payloads (gzipped). Both object types share one LogStorage instance, configured via KICI_STORAGE_LOG_BUCKET (falling back to the cache bucket when unset).

| Key | Description | Source |
| --- | --- | --- |
| kici-logs/... | NDJSON step logs (one object per step, range-paginated reads). | packages/orchestrator/src/orchestrator-core.ts |
| event-log/{orgId}/{deliveryId}.json.gz | Gzipped webhook delivery payload, hashed and metadata-attached. Capped at KICI_EVENT_LOG_MAX_PAYLOAD_BYTES (default 5 MiB). | packages/orchestrator/src/webhook/event-log.ts |

Both prefixes are hardcoded in source — there is no env var that overrides them. If you need a different layout (e.g., to share the log bucket with another service that already owns one of these prefixes), use a dedicated KICI_STORAGE_LOG_BUCKET rather than trying to relocate the prefix.

| Env var | Required? | Default | Description |
| --- | --- | --- | --- |
| KICI_STORAGE_LOG_BUCKET | no | (cache bucket) | Bucket for step logs + webhook payloads. When unset, log objects co-locate with cache objects. |
| KICI_EVENT_LOG_MAX_PAYLOAD_BYTES | no | 5242880 | Soft cap for stored webhook payload size. Larger payloads are recorded with payload_omitted=true. |
| KICI_WEBHOOK_PAYLOAD_DIR | no | | Local-filesystem fallback when KICI_STORAGE_TYPE is unset (no S3). When set, also the base of the on-disk step-log store (<dir>/logs). |
| KICI_DATA_DIR | no | (auto) | Data root for the filesystem step-log store when KICI_WEBHOOK_PAYLOAD_DIR is unset. Logs land under <KICI_DATA_DIR>/cache/logs. |

Region, endpoint, force-path-style, and credentials for the log bucket are inherited from the cache configuration — there are no separate KICI_STORAGE_LOG_REGION / KICI_STORAGE_LOG_ENDPOINT vars.

When KICI_STORAGE_TYPE is unset (no S3), step logs are written to a local directory resolved in this order: KICI_WEBHOOK_PAYLOAD_DIR/logs if set, otherwise <data-root>/cache/logs where <data-root> is KICI_DATA_DIR if set, else /var/lib/kici if writable, else ${XDG_STATE_HOME:-$HOME/.local/state}/kici. The final fallback means a user-level orchestrator (one that cannot write the root-owned /var/lib/kici) stores logs under its own home directory instead of failing the first job with EACCES.
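The resolution order above can be sketched as a pure function. `resolveStepLogDir` is a hypothetical name, and the writability check is injected as a callback so the sketch stays free of filesystem access; the real orchestrator presumably probes the directory itself.

```typescript
// Hypothetical resolver for the on-disk step-log directory.
type LogDirEnv = Record<string, string | undefined>;

function resolveStepLogDir(env: LogDirEnv, isWritable: (dir: string) => boolean): string {
  // 1. An explicit webhook payload dir wins and hosts logs directly under /logs.
  if (env.KICI_WEBHOOK_PAYLOAD_DIR) return `${env.KICI_WEBHOOK_PAYLOAD_DIR}/logs`;

  // 2. Otherwise pick a data root: KICI_DATA_DIR, then /var/lib/kici, then XDG state.
  let dataRoot: string;
  if (env.KICI_DATA_DIR) {
    dataRoot = env.KICI_DATA_DIR;
  } else if (isWritable("/var/lib/kici")) {
    dataRoot = "/var/lib/kici";
  } else {
    // Mirrors ${XDG_STATE_HOME:-$HOME/.local/state}: unset or empty falls through.
    const stateHome = env.XDG_STATE_HOME || `${env.HOME}/.local/state`;
    dataRoot = `${stateHome}/kici`;
  }
  return `${dataRoot}/cache/logs`;
}
```

For a user-level orchestrator with HOME=/home/dev and no write access to /var/lib/kici, the sketch resolves to /home/dev/.local/state/kici/cache/logs.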

| Object | Typical size | Hard cap | Write op | Read op |
| --- | --- | --- | --- | --- |
| kici-logs/... (step logs) | <1 MB typical | 10 MB per step (KICI_MAX_LOG_SIZE_BYTES on agent) | Append (read-modify-write) per chunk batch from agent: 1× GetObject + 1× PutObject per LogWriter flush per step | Range-paginated GetObject on dashboard log-viewer page navigation |
| event-log/{orgId}/{deliveryId}.json.gz | <100 KB typical (gzipped) | 5 MiB (KICI_EVENT_LOG_MAX_PAYLOAD_BYTES) | PutObject per accepted webhook delivery | GetObject per payload-viewer click (chunked-WS transport: the body is decompressed once on the orchestrator, then sliced into 64 KiB chunks streamed through Platform; Platform never buffers the full body) + 1× per re-run that needs payload replay |

Per-step cost: an S3LogStorage.append() is a read-modify-write — the implementation is acceptable for ≤10 MB step logs but is NOT suitable for high-frequency log lines. Agent-side batching (LogWriter chunk size) controls flush frequency.
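To see why read-modify-write appends scale poorly, the bytes moved can be tallied per flush. This is a back-of-the-envelope model, not a measurement: `appendTransferCost` is a hypothetical helper, and the 64 KiB flush size in the example is an assumed value, not a documented LogWriter setting.

```typescript
// Hypothetical cost model for read-modify-write appends to one step-log object.
interface AppendCost {
  getBytes: number;   // bytes downloaded across all flushes
  putBytes: number;   // bytes uploaded across all flushes
  totalBytes: number; // getBytes + putBytes
}

function appendTransferCost(flushes: number, chunkBytes: number): AppendCost {
  let getBytes = 0;
  let putBytes = 0;
  let stored = 0; // current object size; starts empty
  for (let i = 0; i < flushes; i++) {
    getBytes += stored;   // read the whole existing body
    stored += chunkBytes; // concatenate the new chunk
    putBytes += stored;   // write the whole new body
  }
  return { getBytes, putBytes, totalBytes: getBytes + putBytes };
}

// Assumed example: a 10 MiB log written as 160 flushes of 64 KiB.
const cost = appendTransferCost(160, 64 * 1024);
console.log(cost.totalBytes / (1024 * 1024), "MiB transferred");
```

Total transfer grows with the square of the flush count (chunk size × flushes²), so under these assumptions a 10 MiB log moves about 1600 MiB through S3. Larger agent-side batches reduce the flush count and therefore the cost.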

Oversize webhook payloads: deliveries above KICI_EVENT_LOG_MAX_PAYLOAD_BYTES are NOT stored; the row is recorded with payload_omitted=true, and the metadata, hash, and size remain durable. The 5 MiB default cap is sized for GitHub’s max webhook body; raise it only if the provider regularly exceeds it.

Verifiable counters: kici_orch_log_chunks_received_total, kici_orch_log_bytes_stored_total. Per-org bytes gauge: orgLogBytes (Platform-side aggregate).

| Object | Phase | Trigger | What happens |
| --- | --- | --- | --- |
| Step logs | Created | First chunk arrives from agent for a given step | PutObject with empty NDJSON body; subsequent chunks append via read-modify-write |
| Step logs | Updated | Each agent log-chunk batch | GetObject (existing body) → concatenate → PutObject (new body) |
| Step logs | Deleted | Never (no TTL) | Objects persist indefinitely; deletion only happens via manual kici-admin ops |
| Webhook payloads | Created | processWebhook() accepts a delivery | Gzip + PutObject to event-log/{orgId}/{deliveryId}.json.gz |
| Webhook payloads | Updated | (immutable; never updated) | |
| Webhook payloads | Deleted | Cold-store sweeper after KICI_COLD_STORE_EVENT_LOG_WARM_TTL_DAYS (30d) | Row archived into cold-store/orchestrator/event_log/... chunk; original event-log/... deleted |

Source: packages/orchestrator/src/reporting/s3-log-storage.ts, packages/orchestrator/src/webhook/event-log.ts, packages/orchestrator/src/cold-store/orchestrator-cold-store.ts.

When a workflow step calls ctx.attestProvenance, the agent uploads the signed provenance bundle to the cache storage backend under provenance/{runId}/{jobId}/{subjectDigest}.kici.json (see the cache Prefixes table), and the orchestrator records one row in the attestations table so the dashboard can list and fetch attestations per run/job.

| Column | Description |
| --- | --- |
| id | Random primary key. |
| run_id | KiCI run the attestation belongs to (indexed with job_id). |
| job_id | KiCI job that produced the attestation. |
| subject_name | Caller-supplied artifact name. |
| subject_digest | Primary subject digest (lowercase hex); the storage-key discriminator. |
| storage_key | Object-storage key of the bundle. |
| mode | Signing mode (kici for the KiCI-signed bundle). |
| media_type | Bundle media type. |
| created_at | Insert timestamp. |

Migration: packages/orchestrator/src/db/migrations/036_attestations.ts. The row is written on a provenance.upload.complete from the agent, with run_id resolved server-side from the job’s dispatch state (never the wire). The bundle objects are immutable and follow the same cache-bucket lifecycle as source/dep tarballs.

Cold-store is an append-only archive of warm-table rows that have aged out of the operational tables. It is backed by a versioned S3 bucket so accidental operator deletes are recoverable.

{prefix}orchestrator/{table}/{tenantId}/{YYYY}/{MM}/{DD}/{retention-bucket}/{chunkId}.{jsonl.gz|manifest.json}
  • {prefix}: KICI_COLD_STORE_PREFIX, default cold-store/
  • orchestrator: fixed db segment that distinguishes orchestrator chunks from platform chunks (the two services share one bucket)
  • {table}: the source table name (see “Tables” below)
  • {tenantId}: the customer org id
  • {YYYY}/{MM}/{DD}: chunk write date in UTC
  • {retention-bucket}: one of 30d, 180d, 1y, 2y, forever (defined in packages/shared/src/cold-store/bucket.ts as COLD_BUCKET_NAMES)
  • {chunkId}: random unique chunk identifier
  • Two objects per chunk: the data (*.jsonl.gz) and a manifest (*.manifest.json) that records the row range and integrity hash.

Source: packages/shared/src/cold-store/key.ts.
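The key template above can be sketched as a builder that returns both objects of a chunk. `chunkKeys` and its input type are hypothetical (the real builder is in the source file just cited), and zero-padding the month and day to two digits is an assumption read off the {MM}/{DD} placeholders.

```typescript
// Hypothetical builder for the data + manifest keys of one cold-store chunk.
type RetentionBucket = "30d" | "180d" | "1y" | "2y" | "forever";

interface ChunkKeyParts {
  prefix: string;   // e.g. "cold-store/" (includes its trailing slash)
  table: string;    // source table name
  tenantId: string; // customer org id
  writtenAt: Date;  // chunk write time; only the UTC date is used
  retention: RetentionBucket;
  chunkId: string;
}

function chunkKeys(p: ChunkKeyParts): { data: string; manifest: string } {
  const yyyy = String(p.writtenAt.getUTCFullYear());
  // Assumption: month and day are zero-padded to two digits.
  const mm = String(p.writtenAt.getUTCMonth() + 1).padStart(2, "0");
  const dd = String(p.writtenAt.getUTCDate()).padStart(2, "0");
  const base = `${p.prefix}orchestrator/${p.table}/${p.tenantId}/${yyyy}/${mm}/${dd}/${p.retention}/${p.chunkId}`;
  return { data: `${base}.jsonl.gz`, manifest: `${base}.manifest.json` };
}
```

Keeping the tenant ahead of the date in the key lets a rehydration query list one tenant's chunks for a date range with a single prefix scan per day.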

| Table | Default warm TTL | What lives there |
| --- | --- | --- |
| execution_runs | 30 days | Completed run metadata (status, timings, parent_run_id) for terminal runs |
| execution_jobs | 30 days | Per-job metadata (status, agent, queue/run timings) |
| execution_steps | 30 days | Per-step execution details (status, exit code, duration, log pointer) |
| secret_audit_log | 30 days | Secret access / rotation / write events |
| access_log | 30 days | Audit trail of API and CLI actions on the orchestrator |
| event_log | 30 days | Webhook delivery metadata (paired with the gzipped payload object) |

Tables registered in packages/orchestrator/src/cold-store/orchestrator-cold-store.ts. Adding a new table to that file is a doc trigger — see .claude/rules/storage.md.

| Env var | Required? | Default | Description |
| --- | --- | --- | --- |
| KICI_COLD_STORE_ENABLED | no | false | Master toggle. When unset, archival never runs. |
| KICI_COLD_STORE_BUCKET | yes (when enabled) | | Bucket for archived chunks |
| KICI_COLD_STORE_PREFIX | no | cold-store/ | Object-key prefix |
| KICI_COLD_STORE_REGION | no | | AWS region |
| KICI_COLD_STORE_ENDPOINT | no | | Custom S3 endpoint |
| KICI_COLD_STORE_EXTERNAL_ENDPOINT | no | | Endpoint used when generating pre-signed URLs |
| KICI_COLD_STORE_FORCE_PATH_STYLE | no | | Path-style addressing (true / false) |
| KICI_COLD_STORE_S3_CONCURRENCY | no | 4 | Concurrent S3 PUT/GET cap |

These vars are read directly via process.env in cold-store/orchestrator-cold-store.ts — they are not part of the main defineEnv() Zod schema, so they don’t appear in the auto-generated env reference. They’re listed in COLD_STORE_ENV_VARS (packages/orchestrator/src/config.ts) so the unknown-KICI_* typo catcher allows them.

Six knobs per table, with the table name segment uppercased:

KICI_COLD_STORE_<TABLE>_WARM_TTL_DAYS
KICI_COLD_STORE_<TABLE>_MIN_WARM_TENANT_BYTES
KICI_COLD_STORE_<TABLE>_MIN_CHUNK_BYTES
KICI_COLD_STORE_<TABLE>_MAX_CHUNK_BYTES
KICI_COLD_STORE_<TABLE>_MAX_ROWS_PER_CYCLE
KICI_COLD_STORE_<TABLE>_ENABLED

Where <TABLE> is one of EXECUTION_RUNS, EXECUTION_JOBS, EXECUTION_STEPS, SECRET_AUDIT_LOG, ACCESS_LOG, EVENT_LOG. Defaults: 30 days warm, 5 MiB minimum tenant bytes before archival kicks in, chunk sizes between 1 MiB and 50 MiB, 50000 rows per cycle hard cap, all tables enabled.
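The naming scheme for the per-table knobs can be sketched by generating the names from the table list. The constants and the `knobEnvNames` helper are hypothetical; the table names, knob suffixes, and default values are the ones listed above.

```typescript
// Hypothetical generator for the per-table cold-store env var names.
const COLD_STORE_TABLES = [
  "execution_runs",
  "execution_jobs",
  "execution_steps",
  "secret_audit_log",
  "access_log",
  "event_log",
] as const;

// Defaults as documented above.
const KNOB_DEFAULTS = {
  WARM_TTL_DAYS: 30,
  MIN_WARM_TENANT_BYTES: 5 * 1024 * 1024, // 5 MiB
  MIN_CHUNK_BYTES: 1024 * 1024,           // 1 MiB
  MAX_CHUNK_BYTES: 50 * 1024 * 1024,      // 50 MiB
  MAX_ROWS_PER_CYCLE: 50000,
  ENABLED: true,
} as const;

function knobEnvNames(table: string): string[] {
  // The table name segment is uppercased; underscores are kept as-is.
  return Object.keys(KNOB_DEFAULTS).map((knob) => `KICI_COLD_STORE_${table.toUpperCase()}_${knob}`);
}

const allKnobs = COLD_STORE_TABLES.flatMap((table) => knobEnvNames(table));
console.log(allKnobs.length, "per-table env vars");
```

Six tables with six knobs each give 36 per-table names, which is one reason a generated allow-list is easier to keep in sync with the typo catcher than a hand-written one.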

| Object | Typical size | Hard cap / floor | Write op | Read op |
| --- | --- | --- | --- | --- |
| *.jsonl.gz chunk | 1–50 MB | floor MIN_CHUNK_BYTES (1 MiB), cap MAX_CHUNK_BYTES (50 MiB) | PutObject per chunk per archive cycle (per (table, tenant) with eligible bytes ≥ floor) | GetObject per chunk on rehydration (rare; only when reading archived rows) |
| *.manifest.json | ~1 KB | | PutObject per chunk (paired with the data write) | GetObject per chunk on rehydration / catalog scan |

Archive-cycle scheduling: the cold-store-archive scheduled job runs hourly (0 * * * * cron). Each cycle scans cold_store_chunks candidates per (table, tenant), archives up to MAX_ROWS_PER_CYCLE (50000) rows, splitting into chunks no smaller than MIN_CHUNK_BYTES (1 MiB) and no larger than MAX_CHUNK_BYTES (50 MiB). Tenants below MIN_WARM_TENANT_BYTES (5 MiB) are skipped.

Per-cycle cost (per (table, tenant) with N rows producing K chunks): K × PutObject (data) + K × PutObject (manifest) + 1× ListObjectsV2 to detect prior chunks + database mutations. Concurrency capped at KICI_COLD_STORE_S3_CONCURRENCY (default 4).

Rehydration cost (rare, on dashboard archive query or kici-admin rehydrate): ListObjectsV2 to find chunks in date range → GetObject for matching *.manifest.json files → GetObject for selected *.jsonl.gz chunks.

Purge schedule: cold-store-purge runs hourly (15 * * * * cron) and issues DeleteObjects for chunks whose retention bucket (30d, 180d, 1y, or 2y) has expired.

Steady-state per-tenant write volume (rough): if a tenant produces ~10 MB/day of warm rows across all 6 tables, expect ~10 archive PutObject ops per day per table at the chunk floor, hourly cycles.

| Phase | Trigger | What happens |
| --- | --- | --- |
| Created | cold-store-archive cycle (hourly) finds eligible warm rows | Build chunk → gzip → PutObject data + PutObject manifest. Chunk metadata recorded in cold_store_chunks table. |
| Updated | (immutable; never updated) | Cold-store chunks are append-only. The bucket has S3 versioning enabled so any operator-error overwrite is recoverable. |
| Deleted | cold-store-purge cycle (hourly) finds chunks past retention | DeleteObject data + DeleteObject manifest. Retention bucket (30d / 180d / 1y / 2y / forever) is set at write time based on table policy. |

Source: packages/orchestrator/src/cold-store/orchestrator-cold-store.ts. See security audit log for the rehydration flow.

Two non-KiCI services deployed alongside the orchestrator in staging also write to S3, with their own buckets and credentials:

  • Loki (log aggregation): <deployment-slug>-loki
  • Mimir (metrics TSDB): <deployment-slug>-mimir

These are out of scope for this doc — they’re maintained by the platform deployment, not the orchestrator. See docs/internal/platform/storage-layout.md for the full inventory (internal docs only).