Environments
This guide covers the operational aspects of KiCI’s deployment environment system: database tables, API management, Vault integration, held run lifecycle, monitoring, and troubleshooting.
Database tables
Environments are stored in the orchestrator database. The squashed baseline migration `001_initial.ts` creates the following tables:
| Table | Purpose |
|---|---|
| `environments` | Environment definitions with protection rules |
| `environment_variables` | Key-value pairs per environment (with lock flag) |
| `environment_source_overrides` | Per-source variable overrides |
| `environment_bindings` | Scope-to-environment secret bindings |
| `held_runs` | Runs held by protection gates (pending approval) |
The execution_runs table also gains an environment column (TEXT, nullable) to track which environment each run targeted.
Key constraints
- `environments(org_id, name)`: unique environment name per org
- `environment_variables(environment_id, key)`: unique variable key per environment
- `environment_source_overrides(environment_id, routing_key, key)`: unique override per source and key
- `held_runs(run_id, job_id)`: one held entry per job
Environment management via API
Environments are managed through the dashboard proxy API. All CRUD operations route through Platform -> WebSocket -> Orchestrator.
Creating environments
Environments can be created via the dashboard or seeded directly in the orchestrator database:
```sql
INSERT INTO environments (org_id, name, type, enabled)
VALUES ('my-org', 'production', 'fixed', true);
```
For glob-pattern environments:
```sql
INSERT INTO environments (org_id, name, type, glob_pattern, enabled)
VALUES ('my-org', 'review/*', 'glob', 'review/*', true);
```
Setting variables
```sql
INSERT INTO environment_variables (org_id, environment_id, key, value, locked)
VALUES ('my-org', '<env-id>', 'API_URL', 'https://api.example.com', false);
```
The `locked` flag prevents source-level overrides from changing this variable. Locked variables can only be modified by environment admins.
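The interaction between the `locked` flag and source overrides can be sketched as follows. This is a minimal illustration, not the orchestrator's actual resolver: the `EnvVariable` and `SourceOverride` shapes and the `resolveVariables` function are hypothetical names, and the sketch assumes an override for a key with no base variable is simply added.

```typescript
// Hypothetical row shapes for illustration; the real table types may differ.
interface EnvVariable {
  key: string;
  value: string;
  locked: boolean;
}

interface SourceOverride {
  routingKey: string;
  key: string;
  value: string;
}

// Resolve the variables one source sees: overrides apply only to unlocked keys.
function resolveVariables(
  variables: EnvVariable[],
  overrides: SourceOverride[],
  routingKey: string,
): Record<string, string> {
  const resolved: Record<string, string> = {};
  const lockedKeys = new Set<string>();

  for (const variable of variables) {
    resolved[variable.key] = variable.value;
    if (variable.locked) lockedKeys.add(variable.key);
  }

  for (const override of overrides) {
    if (override.routingKey !== routingKey) continue; // belongs to another source
    if (lockedKeys.has(override.key)) continue; // locked variables ignore overrides
    // Assumption: an override without a base variable is added as a new key.
    resolved[override.key] = override.value;
  }

  return resolved;
}
```

With `API_URL` locked and `LOG_LEVEL` unlocked, an override for both keys from the same source would change only `LOG_LEVEL`.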
Configuring protection rules
Protection rules are columns on the environments table:
```sql
UPDATE environments
SET branch_restrictions = '["main", "release/*"]',
    required_reviewers = '["user-id-1", "user-id-2"]',
    wait_timer_seconds = 300,
    concurrency_limit = 1,
    concurrency_strategy = 'queue',
    hold_expiry_seconds = 86400
WHERE org_id = 'my-org' AND name = 'production';
```
Vault integration for secrets
Secrets can be stored in PostgreSQL (default) or HashiCorp Vault. Backends are managed through the `kici-admin backend` CLI commands, which store the configuration (encrypted) in the `secret_backends` database table, not through environment variables or YAML config.
```bash
# Register a Vault backend
kici-admin backend add my-vault \
  --type vault \
  --vault-url https://vault.example.com \
  --vault-auth-method token \
  --vault-token hvs.xxx \
  --vault-mount-path secret \
  --vault-base-path kici/secrets

# List registered backends
kici-admin backend list

# Remove a backend
kici-admin backend remove my-vault
```
See docs/operator/orchestrator/kici-admin-cli.md for the full backend subcommand reference.
When using Vault:
- Scope paths map directly to Vault KV v2 paths under `{mountPath}/data/{basePath}/`
- The orchestrator reads secrets at dispatch time (not cached)
- Vault connection is operator-managed, not configurable per-scope in the dashboard
- PG-stored secrets and Vault-stored secrets can coexist (backend is per-scope)
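As a rough illustration of the path mapping, the sketch below joins the mount path, the literal `data` segment, the base path, and a scope path into one KV v2 read path. The function name `vaultKvV2Path` and the example scope path `my-org/production` are assumptions made for this example; the actual scope path layout is not specified here.

```typescript
// Trim leading and trailing slashes so segments join with exactly one "/".
function trimSlashes(segment: string): string {
  return segment.replace(/^\/+|\/+$/g, "");
}

// Build {mountPath}/data/{basePath}/{scopePath} for a KV v2 read.
function vaultKvV2Path(mountPath: string, basePath: string, scopePath: string): string {
  return [mountPath, "data", basePath, scopePath]
    .map(trimSlashes)
    .filter((segment) => segment.length > 0)
    .join("/");
}

// Using the mount and base path from the CLI example above,
// with a hypothetical scope path:
const examplePath = vaultKvV2Path("secret", "kici/secrets", "my-org/production");
// examplePath === "secret/data/kici/secrets/my-org/production"
```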
Held run lifecycle
States
| State | Description |
|---|---|
| `pending` | Awaiting reviewer approval or timer expiry |
| `approved` | Reviewer approved; job proceeds to dispatch |
| `rejected` | Reviewer rejected; job is cancelled |
| `expired` | Hold expiry timeout reached; job is cancelled |
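The states in the table above can be modelled as a small state machine. The sketch below is illustrative only: the type and function names are hypothetical, and it assumes that `approved`, `rejected`, and `expired` are terminal, which the table implies but does not state outright.

```typescript
type HeldRunStatus = "pending" | "approved" | "rejected" | "expired";

// Assumption: only pending holds can change state; the other states are terminal.
const ALLOWED_TRANSITIONS: Record<HeldRunStatus, readonly HeldRunStatus[]> = {
  pending: ["approved", "rejected", "expired"],
  approved: [],
  rejected: [],
  expired: [],
};

function canTransition(from: HeldRunStatus, to: HeldRunStatus): boolean {
  return ALLOWED_TRANSITIONS[from].indexOf(to) !== -1;
}

// What happens to the associated job in each state, per the table.
function jobOutcome(status: HeldRunStatus): "waiting" | "dispatch" | "cancelled" {
  switch (status) {
    case "pending":
      return "waiting";
    case "approved":
      return "dispatch";
    case "rejected":
    case "expired":
      return "cancelled";
  }
}
```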
Expiry and cleanup
- Default hold expiry: 3600 seconds (1 hour), configurable per environment via `hold_expiry_seconds`
- The stale run detector (Sub-scan E) periodically calls `heldRunStore.expireOverdue()` to transition expired pending holds to `expired` status
- Expired held runs result in the associated job being cancelled
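To make the expiry rule concrete, here is an in-memory stand-in for what an `expireOverdue()` pass might do. It is a sketch under stated assumptions: the `HeldRun` shape and the `computeExpiresAt` helper are hypothetical, and the real store operates on the `held_runs` table rather than an array.

```typescript
interface HeldRun {
  id: string;
  status: "pending" | "approved" | "rejected" | "expired";
  expiresAt: Date;
}

// Hypothetical helper: derive the expiry time from the hold's creation time.
// The default of 3600 seconds mirrors the documented default hold expiry.
function computeExpiresAt(createdAt: Date, holdExpirySeconds: number = 3600): Date {
  return new Date(createdAt.getTime() + holdExpirySeconds * 1000);
}

// Mark every pending hold whose expiry time has passed as expired
// and return the affected ids so their jobs can be cancelled.
function expireOverdue(holds: HeldRun[], now: Date): string[] {
  const expiredIds: string[] = [];
  for (const hold of holds) {
    if (hold.status === "pending" && hold.expiresAt.getTime() < now.getTime()) {
      hold.status = "expired";
      expiredIds.push(hold.id);
    }
  }
  return expiredIds;
}
```

Holds that are already `approved` or `rejected` are left untouched, even if their expiry time is in the past.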
Approval flow
1. Job targets an environment with `required_reviewers`
2. Orchestrator creates a `held_runs` entry with status `pending`
3. Reviewer approves via dashboard or API (`POST /runs/:id/approve`)
4. Held run transitions to `approved`
5. Job is re-queued for dispatch
Monitoring
Key metrics to watch
| Metric | Description | Alert threshold |
|---|---|---|
| Held runs pending | Count of held_runs WHERE status = 'pending' | > 10 (may indicate stale approvals) |
| Held runs expired | Rate of status = 'expired' transitions | Increasing trend |
| Environment var resolution time | Time to resolve vars in processor | > 100ms |
| Protection pipeline rejections | Rate of branch/concurrency rejections | Depends on workflow |
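The fixed thresholds in the table could be checked with a small evaluator like the one below. This is only a sketch: the `EnvironmentMetrics` shape and function names are hypothetical, and treating "increasing trend" as strictly increasing consecutive samples is an assumption, not a documented rule.

```typescript
interface EnvironmentMetrics {
  pendingHeldRuns: number; // count of held_runs with status = 'pending'
  varResolutionMs: number; // time to resolve environment variables
}

// Apply the two numeric thresholds from the table.
function evaluateAlerts(metrics: EnvironmentMetrics): string[] {
  const alerts: string[] = [];
  if (metrics.pendingHeldRuns > 10) {
    alerts.push("held runs pending > 10 (may indicate stale approvals)");
  }
  if (metrics.varResolutionMs > 100) {
    alerts.push("environment var resolution time > 100ms");
  }
  return alerts;
}

// Assumption: a trend is "increasing" when every sample exceeds the previous one.
function isIncreasing(samples: number[]): boolean {
  let previous: number | undefined;
  for (const value of samples) {
    if (previous !== undefined && value <= previous) return false;
    previous = value;
  }
  return samples.length >= 2;
}
```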
Useful queries
Count pending held runs per environment:
```sql
SELECT e.name, COUNT(*) AS pending_count
FROM held_runs hr
JOIN environments e ON e.id = hr.environment_id
WHERE hr.status = 'pending'
GROUP BY e.name;
```
Recent protection rule rejections:
```sql
SELECT j.job_name, j.error_message, r.created_at
FROM execution_jobs j
JOIN execution_runs r ON r.run_id = j.run_id
WHERE j.error_message LIKE '%branch%'
   OR j.error_message LIKE '%protection%'
ORDER BY r.created_at DESC
LIMIT 20;
```
Runs per environment:
```sql
SELECT environment, status, COUNT(*) AS count
FROM execution_runs
WHERE environment IS NOT NULL
GROUP BY environment, status
ORDER BY environment, status;
```
Troubleshooting
Job rejected unexpectedly
Symptom: Job fails with “Branch ‘X’ not allowed for environment ‘Y’”
Diagnosis: Check the environment’s branch_restrictions column:
```sql
SELECT name, branch_restrictions
FROM environments
WHERE org_id = 'your-org';
```
Fix: Update branch restrictions to include the required branch pattern, or remove restrictions entirely by setting `branch_restrictions = '[]'`.
Job held indefinitely
Symptom: Job stays in pending held state beyond the expected hold expiry.
Diagnosis: Check if the stale detector is running and if the hold has expired:
```sql
SELECT id, status, expires_at, created_at
FROM held_runs
WHERE status = 'pending' AND expires_at < NOW();
```
Fix: Either approve or reject the hold manually via the API, or verify that stale detector Sub-scan E is operational. The stale detector runs `heldRunStore.expireOverdue()` on each scan cycle.
Environment variables not reaching agent
Symptom: Step does not see expected environment variables.
Diagnosis:
- Verify the variable exists in `environment_variables` for the correct environment
- Check if the variable is being overridden by a higher-precedence layer (job `env`, secrets)
- For source overrides, verify the `routing_key` matches the source triggering the job
- Check if the variable is `locked` and a source override exists (locked vars skip source overrides)
```sql
SELECT ev.key, ev.value, ev.locked
FROM environment_variables ev
JOIN environments e ON e.id = ev.environment_id
WHERE e.org_id = 'your-org' AND e.name = 'your-env';
```
Dynamic environment not matching
Symptom: Dynamic environment name (e.g., `review/PR-123`) doesn’t inherit glob pattern config.
Diagnosis: Check that a glob environment exists with a matching pattern:
```sql
SELECT name, glob_pattern
FROM environments
WHERE org_id = 'your-org' AND type = 'glob';
```
The glob matching uses picomatch. Verify the pattern matches the dynamic name:
- `review/*` matches `review/PR-123` (single segment)
- `review/**` matches `review/PR-123` and `review/deep/path`
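The difference between `*` and `**` can be checked with a simplified stand-in matcher. The sketch below is not picomatch and does not reproduce its full behaviour; it only handles `*` (within one path segment) and `**` (across segments), and the function names are hypothetical. For real verification, test the pattern against picomatch itself.

```typescript
// Convert a glob with only "*" and "**" wildcards into an anchored RegExp.
function globToRegExp(pattern: string): RegExp {
  let source = "";
  let i = 0;
  while (i < pattern.length) {
    const ch = pattern.charAt(i);
    if (ch === "*") {
      if (pattern.charAt(i + 1) === "*") {
        source += ".*"; // "**" may cross "/" boundaries
        i += 2;
      } else {
        source += "[^/]*"; // "*" stays inside one segment
        i += 1;
      }
    } else {
      // Escape characters that are special in regular expressions.
      source += ch.replace(/[.+?^${}()|[\]\\]/g, "\\$&");
      i += 1;
    }
  }
  return new RegExp("^" + source + "$");
}

function matchesGlob(pattern: string, name: string): boolean {
  return globToRegExp(pattern).test(name);
}
```

Under this simplified model, `matchesGlob("review/*", "review/deep/path")` is false, which is the usual reason a nested dynamic name fails to pick up a single-star pattern.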
Concurrency queue stuck
Symptom: Jobs queue but never dispatch even when the environment has capacity.
Diagnosis: Check running job count for the concurrency group:
```sql
SELECT COUNT(*) AS running
FROM execution_jobs j
JOIN execution_runs r ON r.run_id = j.run_id
WHERE j.status = 'running' AND r.environment = 'your-env';
```
If the count is below the concurrency limit but jobs are still queued, check for stale running jobs that may have lost their agent connection. The stale detector should catch these, but verify it’s operational.
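The diagnosis above reduces to a comparison of three numbers. The sketch below encodes that decision; the `ConcurrencySnapshot` shape and the result labels are hypothetical, and the queued count would have to come from a separate query that is not shown in this guide.

```typescript
interface ConcurrencySnapshot {
  running: number; // result of the running-jobs query above
  queued: number; // jobs waiting on the environment (source is an assumption)
  limit: number; // the environment's concurrency_limit
}

type QueueDiagnosis = "ok" | "at-capacity" | "stuck";

function diagnoseQueue(snapshot: ConcurrencySnapshot): QueueDiagnosis {
  if (snapshot.queued === 0) return "ok"; // nothing is waiting
  if (snapshot.running >= snapshot.limit) return "at-capacity"; // queueing is expected
  return "stuck"; // capacity is free but jobs still wait
}
```

A `stuck` result is the case worth investigating further, for example by checking whether the stale detector is running.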