Execution lifecycle
Overview
This document describes the runtime lifecycle of a KiCI workflow execution, focusing on cancellation, lifecycle hooks, and concurrency groups. For the underlying state machine, see state-machine.md. For job execution details, see job-execution.md.
State machine: cancelling state
The cancelling state is a transient state between running and cancelled. It represents the grace period during which the agent terminates the active step and runs lifecycle hooks.

    pending -> queued -> running -> cancelling -> cancelled
                            \                        ^
                             \-> cancelled ----------|  (direct, force cancel)

Transitions involving cancelling
| From | Event | To | Description |
|---|---|---|---|
| running | CANCEL | cancelled | Force cancel (immediate, no hooks) |
| running | CANCEL_GRACEFUL | cancelling | Graceful cancel (hooks will run) |
| cancelling | CANCEL_FORCE | cancelled | Force cancel escalation |
| cancelling | COMPLETE | cancelled | Hooks finished normally |
| cancelling | FAIL | failed | A hook failed during cancellation |

The cancelling state is NOT terminal — isTerminal('cancelling') returns false.
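
The transitions in the table can be read as a small lookup table. The following is a minimal sketch, not the engine's implementation: the `RunState` and `RunEvent` type names, the `transition` helper, and the choice of terminal states are assumptions made for illustration. Only the five listed transitions and the fact that `cancelling` is not terminal come from this document.

```typescript
// Hypothetical types; the real definitions live in the engine's state machine.
type RunState = 'pending' | 'queued' | 'running' | 'cancelling' | 'cancelled' | 'failed';
type RunEvent = 'CANCEL' | 'CANCEL_GRACEFUL' | 'CANCEL_FORCE' | 'COMPLETE' | 'FAIL';

// Only the transitions listed in the table above; all others are omitted.
const cancelTransitions: Partial<Record<RunState, Partial<Record<RunEvent, RunState>>>> = {
  running: {
    CANCEL: 'cancelled',           // force cancel, no hooks
    CANCEL_GRACEFUL: 'cancelling', // graceful cancel, hooks will run
  },
  cancelling: {
    CANCEL_FORCE: 'cancelled', // force cancel escalation
    COMPLETE: 'cancelled',     // hooks finished normally
    FAIL: 'failed',            // a hook failed during cancellation
  },
};

// Returns the next state, or undefined when this sketch does not cover the pair.
function transition(from: RunState, event: RunEvent): RunState | undefined {
  return cancelTransitions[from]?.[event];
}

// Assumption: 'cancelled' and 'failed' are terminal in this reduced state set.
// The document itself only states that 'cancelling' is not terminal.
function isTerminal(state: RunState): boolean {
  return state === 'cancelled' || state === 'failed';
}
```

Keeping the transitions in data rather than in branching code makes it easy to check that `cancelling` always has a way out, either to `cancelled` or to `failed`.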
Cancel chain
The cancel chain propagates from the user interface down to the executing agent.
Graceful cancel flow

    Dashboard/CLI/API
        |
        | POST /api/v1/orgs/:customerId/runs/:runId/cancel { force: false }
        v
    Dashboard's API endpoint
        |
        | run.cancel.request (WebSocket)
        v
    Orchestrator
        |
        | Sets run status to 'cancelling'
        | Sends job.cancel { force: false } to agent
        v
    Agent
        |
        1. SIGTERM to running step process
        2. Wait grace period (default: 30s, configurable per-job)
        3. SIGKILL if step hasn't exited
        4. Run onCancel hook (if defined)
        5. Run cleanup hook (if defined)
        6. Report job.status = 'cancelled'
        v
    Orchestrator
        |
        | Transitions run from 'cancelling' to 'cancelled'
        v
    Done

Force cancel flow

    Dashboard/CLI/API
        |
        | POST /api/v1/orgs/:customerId/runs/:runId/cancel { force: true }
        v
    Dashboard API -> Orchestrator
        |
        | Sets run status to 'cancelled' (immediate)
        | Sends job.cancel { force: true } to agent
        v
    Agent
        |
        1. SIGKILL to running step process (immediate)
        2. Skip all hooks (onCancel, cleanup)
        3. Report job.status = 'cancelled'
        v
    Done

Two-level cancel UX
The dashboard and CLI implement a two-level cancel pattern:
- First cancel request: graceful. The run transitions to cancelling (amber badge). The cancel button changes to “Force cancel” (red).
- Second cancel request: force. The run transitions immediately to cancelled. All hooks are skipped.
In the CLI: first Ctrl+C sends graceful cancel, second Ctrl+C sends force cancel.
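
The two-level pattern amounts to a counter that decides the `force` flag of the next cancel request. The sketch below is illustrative only: `createCancelEscalator` and `CancelRequest` are hypothetical names, and no HTTP call is made. The endpoint path and the `{ force }` body are taken from the cancel flows above.

```typescript
// Hypothetical helper: tracks cancel requests for one run and decides
// whether the next request should be graceful or forced.
interface CancelRequest {
  path: string;             // endpoint path from the cancel flows above
  body: { force: boolean }; // false = graceful, true = force
}

function createCancelEscalator(customerId: string, runId: string) {
  let requests = 0;
  return {
    // Call once per cancel click or Ctrl+C.
    next(): CancelRequest {
      requests += 1;
      return {
        path: `/api/v1/orgs/${customerId}/runs/${runId}/cancel`,
        // The first request is graceful; every later request escalates to force.
        body: { force: requests > 1 },
      };
    },
  };
}
```

A dashboard button or a CLI signal handler could call `next()` on each user action and send the returned body to the cancel endpoint.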
Hook execution order
Hooks execute inside-out, like stack unwinding.
On cancellation
1. Step-level hooks (on the cancelled step):
   - onCancel (if step defines one)
   - cleanup (if step defines one)
2. Job-level hooks:
   - onCancel
   - cleanup (always runs)

On success
1. Step-level hooks:
   - afterStep (runs after each step, before next step starts)
2. Job-level hooks:
   - onSuccess
   - cleanup (always runs)

On failure
1. Job-level hooks:
   - onFailure
   - cleanup (always runs)

Key principles
- Hooks are observers; they cannot change execution flow. One mechanism per concern: rules for conditional logic, hooks for lifecycle callbacks.
- Hooks run sequentially after the step exits, not in parallel with the step.
- cleanup always runs regardless of outcome (success, failure, or graceful cancel), but is skipped on force cancel.
- afterStep runs immediately after its step, before the next step starts (not deferred).
- Hook failure changes job status to failed with a compound reason (e.g., “cancelled (onCancel hook failed: timeout)”).
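
The ordering rules above can be expressed as a function that returns the end-of-job hook sequence for a given outcome. This is a sketch under assumptions: the `Outcome` and `HookSet` types and the `hooksToRun` name are hypothetical, hooks are reduced to boolean "is defined" flags, and `afterStep` is left out because it runs between steps rather than at the end of the job.

```typescript
type Outcome = 'success' | 'failure' | 'cancel' | 'force-cancel';

// Hypothetical shape: which hooks a step or job defines.
interface HookSet {
  onCancel?: boolean;
  onSuccess?: boolean;
  onFailure?: boolean;
  cleanup?: boolean;
}

// Returns the hooks to run once the step has exited, in order:
// step level first, then job level (inside-out).
function hooksToRun(outcome: Outcome, step: HookSet, job: HookSet): string[] {
  if (outcome === 'force-cancel') return []; // force cancel skips every hook
  const order: string[] = [];
  if (outcome === 'cancel') {
    if (step.onCancel) order.push('step:onCancel');
    if (step.cleanup) order.push('step:cleanup');
    if (job.onCancel) order.push('job:onCancel');
  } else if (outcome === 'success') {
    if (job.onSuccess) order.push('job:onSuccess');
  } else {
    if (job.onFailure) order.push('job:onFailure');
  }
  if (job.cleanup) order.push('job:cleanup'); // runs for every non-forced outcome
  return order;
}
```

Because the function only returns a list, it mirrors the observer principle: the result describes what runs after the step exits and has no way to alter the outcome itself.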
Hook step protocol
Each hook execution is reported as a separate step in the protocol, with a step_type field distinguishing it from regular steps.

    // Agent sends step.status for hook execution
    {
      type: 'step.status',
      runId: 'run-001',
      jobId: 'deploy',
      stepIndex: 2, // incremented from the last regular step
      stepName: 'onCancel',
      state: 'running', // or 'success', 'failed'
      step_type: 'hook:onCancel', // hook type identifier
    }

Valid step_type values:
- step (default, regular step)
- hook:onCancel
- hook:cleanup
- hook:onSuccess
- hook:onFailure
- hook:beforeStep
- hook:afterStep
Hook steps appear in the dashboard with a distinct visual marker (hook icon) and lighter styling. Each hook gets its own execution_steps row with separate status, timing, and log stream.
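
For consumers of these messages, the `step_type` values can be modelled as a string union. The sketch below is an assumption about how that might look, not the definition in the engine package: the `StepStatusMessage` interface name and both helper functions are hypothetical, and `step_type` is assumed to be optional because `step` is described as the default.

```typescript
// step_type values as listed in this section.
const STEP_TYPES = [
  'step',
  'hook:onCancel',
  'hook:cleanup',
  'hook:onSuccess',
  'hook:onFailure',
  'hook:beforeStep',
  'hook:afterStep',
] as const;
type StepType = (typeof STEP_TYPES)[number];

// Field names follow the example message; the interface name is hypothetical.
interface StepStatusMessage {
  type: 'step.status';
  runId: string;
  jobId: string;
  stepIndex: number;
  stepName: string;
  state: 'running' | 'success' | 'failed';
  step_type?: StepType; // assumed optional, defaulting to 'step'
}

// True when the message reports a hook rather than a regular step.
function isHookStep(msg: StepStatusMessage): boolean {
  return (msg.step_type ?? 'step') !== 'step';
}

// Runtime check for untrusted input, for example a parsed WebSocket payload.
function isValidStepType(value: string): value is StepType {
  return (STEP_TYPES as readonly string[]).indexOf(value) !== -1;
}
```

A dashboard could use `isHookStep` to decide when to render the hook icon and lighter styling described above.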
Concurrency group protocol
Concurrency groups prevent parallel execution of related workflow runs. The evaluation happens agent-side (the group key function needs runtime context), with the orchestrator making the concurrency decision.
Protocol flow

    Agent (evaluates group function)
        |
        | job.concurrency.report { group: 'deploy-main', runId, jobId }
        v
    Orchestrator
        |
        | Checks in-progress runs with same group key
        | Decides: proceed, wait, or cancel
        |
        | job.concurrency.ack { action: 'proceed' | 'wait' | 'cancel', reason? }
        v
    Agent
        |
        | proceed: continue execution
        | wait: release agent back to pool (queued state)
        | cancel: report job cancelled with superseded reason

cancelInProgress mode
When cancelInProgress: true, the orchestrator cancels older runs in the same group:
- New run joins group
- Orchestrator finds older running run with same group key
- Older run receives job.cancel with reason “Superseded by run #N”
- New run receives job.concurrency.ack { action: 'proceed' }
Queue mode
When cancelInProgress: false, the orchestrator holds the new run:
- New run joins group
- Orchestrator finds active run with same group key
- New run receives job.concurrency.ack { action: 'wait', reason: 'Waiting for deploy-main (1 ahead)' }
- Agent releases back to pool
- When prior run completes, orchestrator dispatches the queued run
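
The two modes differ only in what the orchestrator does when it finds an active run with the same group key. The following sketch shows one way that decision could be written. It is not the orchestrator's code: `ActiveRun`, `Decision`, and `decideConcurrency` are hypothetical names, the "N ahead" count is assumed to be the number of active runs in the group, and the `cancel` ack action from the protocol flow is left out.

```typescript
interface ActiveRun {
  runId: string;
  group: string; // concurrency group key reported by the agent
}

type Ack =
  | { action: 'proceed' }
  | { action: 'wait'; reason: string };

interface Decision {
  ack: Ack;               // reply to the new run's job.concurrency.report
  cancelRunIds: string[]; // older runs that should receive job.cancel
}

function decideConcurrency(group: string, active: ActiveRun[], cancelInProgress: boolean): Decision {
  const sameGroup = active.filter((run) => run.group === group);
  if (sameGroup.length === 0) {
    // Nothing is running in this group, so the new run proceeds.
    return { ack: { action: 'proceed' }, cancelRunIds: [] };
  }
  if (cancelInProgress) {
    // cancelInProgress mode: supersede the older runs and let the new run proceed.
    return { ack: { action: 'proceed' }, cancelRunIds: sameGroup.map((run) => run.runId) };
  }
  // Queue mode: hold the new run until the prior runs complete.
  return {
    ack: { action: 'wait', reason: `Waiting for ${group} (${sameGroup.length} ahead)` },
    cancelRunIds: [],
  };
}
```

Returning the cancel list alongside the ack keeps the decision free of side effects, so sending `job.cancel` and `job.concurrency.ack` can be handled by the caller.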
Timeouts
Concurrency group evaluation has a configurable timeout (default: 30s). If the agent doesn’t report the group key within the timeout, the job fails.
Grace period and hook timeout
The total time a cancel can take is bounded by:

    total_cancel_time = gracePeriod + hookTimeout

- gracePeriod: seconds between SIGTERM and SIGKILL. Configured per-job in the SDK (gracePeriod: 60), with an operator-configurable maximum. Default: 30s.
- hookTimeout: maximum time for all hooks to complete. Default: 5 minutes. Configurable per-hook.
Both are enforced by the agent. The orchestrator monitors for stuck jobs via stale detection.
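
As a worked example of the bound, the sketch below computes the worst-case cancel duration from the two settings. The defaults come from this section. Everything else is assumed for illustration: the `CancelTimingConfig` shape, the `maxGracePeriod` field name for the operator maximum, clamping as the way that maximum is applied, and treating `hookTimeout` as a single budget for all hooks.

```typescript
// All values are in seconds. Defaults are taken from the text above.
const DEFAULT_GRACE_PERIOD_S = 30;
const DEFAULT_HOOK_TIMEOUT_S = 5 * 60;

interface CancelTimingConfig {
  gracePeriod?: number;    // per-job setting, e.g. gracePeriod: 60
  hookTimeout?: number;    // budget for all hooks to complete
  maxGracePeriod?: number; // hypothetical name for the operator-configured maximum
}

// Assumption: a requested grace period above the operator maximum is clamped.
function effectiveGracePeriod(cfg: CancelTimingConfig): number {
  const requested = cfg.gracePeriod ?? DEFAULT_GRACE_PERIOD_S;
  return cfg.maxGracePeriod === undefined ? requested : Math.min(requested, cfg.maxGracePeriod);
}

// total_cancel_time = gracePeriod + hookTimeout
function maxCancelTime(cfg: CancelTimingConfig): number {
  return effectiveGracePeriod(cfg) + (cfg.hookTimeout ?? DEFAULT_HOOK_TIMEOUT_S);
}
```

With the defaults, a graceful cancel is bounded by 30 + 300 = 330 seconds. A job that sets `gracePeriod: 60` raises the bound to 360 seconds.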
Cancelled dependent jobs
When a run is cancelled, pending/queued dependent jobs (jobs with needs) are marked cancelled (not skipped). This distinguishes “rule-skipped” from “parent-cancelled” in the UI and reporting.
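
The rule can be illustrated with a small function over a job list. This is a sketch only: the `Job` shape, the `JobState` names other than those mentioned in this document, and the `cancelDependentJobs` helper are hypothetical. Jobs without `needs` are left untouched because this section does not describe them.

```typescript
// Hypothetical job states; 'success' and 'skipped' are assumed names.
type JobState = 'pending' | 'queued' | 'running' | 'success' | 'failed' | 'skipped' | 'cancelled';

interface Job {
  id: string;
  state: JobState;
  needs: string[]; // ids of jobs this job depends on
}

// When the run is cancelled, dependent jobs that have not started yet are
// marked 'cancelled' rather than 'skipped'. Other jobs are returned unchanged.
function cancelDependentJobs(jobs: Job[]): Job[] {
  return jobs.map((job) => {
    const notStarted = job.state === 'pending' || job.state === 'queued';
    return job.needs.length > 0 && notStarted ? { ...job, state: 'cancelled' as const } : job;
  });
}
```

Jobs that a rule already marked as `skipped` keep that state, which preserves the distinction between the two cases in reporting.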
Source: packages/engine/src/state-machine/, packages/engine/src/protocol/messages/orchestrator-agent.ts