Execution state machine

The state machine tracks workflow runs, jobs, and steps through a consistent lifecycle. It is a pure-function implementation with no internal state — given a current state and an event, it deterministically returns the next state or rejects the transition.

The same state machine is used across all tiers: the orchestrator tracks run and job states during dispatch, while the agent transitions job and step states during execution. Both import from @kici-dev/engine.

Source: packages/engine/src/state-machine/

The state machine defines 11 states. Seven are transient (the entity is still in progress) and four are terminal (the entity has reached a final outcome).

| State | Description | Terminal |
| --- | --- | --- |
| pending | Initial state. Waiting for processing. | No |
| queued | Enqueued for execution. Waiting for an agent. | No |
| running | Actively executing. | No |
| recovering | Temporarily disconnected. Waiting for reconnection or timeout. | No |
| cancelling | Graceful cancellation in progress. Running cancel hooks. | No |
| held | Held for approval (protection rule: required reviewers). | No |
| waiting | Waiting for timer expiry (protection rule: wait timer). | No |
| success | Completed successfully. | Yes |
| failed | Completed with failure. | Yes |
| cancelled | Cancelled before or during execution. | Yes |
| skipped | Skipped due to rule evaluation or dependency. | Yes |

Source: packages/engine/src/state-machine/types.ts (ExecutionState type and TERMINAL_STATES constant).
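As a rough illustration, the eleven states could be modelled as a string-literal union with a companion constant for the terminal ones. This is a sketch reconstructed from the table above, not the contents of types.ts. The container type of TERMINAL_STATES (a readonly array here) and the ALL_STATES helper are assumptions made for the example.

```typescript
// Sketch only: reconstructed from the states table, not copied from types.ts.
type ExecutionState =
  | "pending" | "queued" | "running" | "recovering"
  | "cancelling" | "held" | "waiting"                 // transient
  | "success" | "failed" | "cancelled" | "skipped";   // terminal

// Assumption: the real constant may use a different container (for example a Set).
const TERMINAL_STATES: readonly ExecutionState[] = [
  "success", "failed", "cancelled", "skipped",
];

// Hypothetical helper list, used here only to derive the transient states.
const ALL_STATES: readonly ExecutionState[] = [
  "pending", "queued", "running", "recovering", "cancelling", "held", "waiting",
  "success", "failed", "cancelled", "skipped",
];

// Every state that is not terminal is transient.
const transientStates = ALL_STATES.filter((s) => !TERMINAL_STATES.includes(s));
```

Filtering the full list against the terminal constant leaves the seven transient states, which matches the split described above.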

The state machine responds to 16 events. Each event represents an action that moves the entity from one state to another.

| Event | Description |
| --- | --- |
| ENQUEUE | Move from pending to queued. |
| START | Begin execution (or resume from recovering). |
| SUCCEED | Mark as successfully completed. |
| FAIL | Mark as failed. |
| CANCEL | Cancel the execution (immediate, no hooks). |
| CANCEL_GRACEFUL | Begin graceful cancellation (run cancel hooks before stopping). |
| CANCEL_FORCE | Force cancellation of a gracefully-cancelling execution. |
| COMPLETE | Cancel hooks finished; complete the cancellation. |
| SKIP | Skip the execution (rule or dependency). |
| RECOVER | Enter recovery mode (e.g., agent disconnect). |
| HOLD | Hold for approval (protection rule enforcement). |
| APPROVE | Approve a held run (reviewer action). |
| REJECT | Reject a held run (reviewer action). |
| EXPIRE | Hold period expired without approval. |
| WAIT | Enter wait timer (protection rule enforcement). |
| TIMER_DONE | Wait timer expired, proceed to queued. |

Source: packages/engine/src/state-machine/types.ts (ExecutionEvent type).

The table below lists all 25 valid transitions. Applying any event that is not listed for the current state throws an InvalidTransitionError.

| Current State | Event | Next State |
| --- | --- | --- |
| pending | ENQUEUE | queued |
| pending | CANCEL | cancelled |
| pending | SKIP | skipped |
| pending | HOLD | held |
| pending | WAIT | waiting |
| held | APPROVE | queued |
| held | REJECT | cancelled |
| held | EXPIRE | cancelled |
| held | CANCEL | cancelled |
| waiting | TIMER_DONE | queued |
| waiting | CANCEL | cancelled |
| queued | START | running |
| queued | FAIL | failed |
| queued | CANCEL | cancelled |
| running | SUCCEED | success |
| running | FAIL | failed |
| running | CANCEL | cancelled |
| running | CANCEL_GRACEFUL | cancelling |
| running | RECOVER | recovering |
| cancelling | CANCEL_FORCE | cancelled |
| cancelling | COMPLETE | cancelled |
| cancelling | FAIL | failed |
| recovering | START | running |
| recovering | FAIL | failed |
| recovering | CANCEL | cancelled |

Key patterns:

  • CANCEL is valid from six transient states (pending, queued, running, recovering, held, waiting). The cancelling state uses CANCEL_FORCE instead.
  • CANCEL_GRACEFUL is only valid from running — it enters the cancelling state where cancel hooks run before the entity reaches cancelled.
  • From cancelling, CANCEL_FORCE or COMPLETE resolve to cancelled, and FAIL resolves to failed (hook failure).
  • FAIL is valid from queued (e.g., agent crash before execution starts), running, cancelling (hook failure), and recovering (e.g., recovery timeout).
  • SKIP is only valid from pending (before any processing begins).
  • SUCCEED is only valid from running (must have started execution).
  • RECOVER is only valid from running (agent disconnect during execution). From recovering, START resumes execution.
  • HOLD and WAIT are only valid from pending (protection rules are evaluated before dispatch).
  • APPROVE, REJECT, and EXPIRE are only valid from held. TIMER_DONE is only valid from waiting.
  • Both held and waiting resolve to queued on success (APPROVE/TIMER_DONE), re-entering the normal execution flow.
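The patterns above can be checked mechanically if the transitions are held as data. The sketch below transcribes the table into a nested record, with terminal states mapped to empty objects, and derives which states accept a given event. The name TRANSITIONS and the helper statesAccepting are hypothetical; the actual layout in machine.ts may differ.

```typescript
type ExecutionState =
  | "pending" | "queued" | "running" | "recovering" | "cancelling" | "held"
  | "waiting" | "success" | "failed" | "cancelled" | "skipped";

type ExecutionEvent =
  | "ENQUEUE" | "START" | "SUCCEED" | "FAIL" | "CANCEL" | "CANCEL_GRACEFUL"
  | "CANCEL_FORCE" | "COMPLETE" | "SKIP" | "RECOVER" | "HOLD" | "APPROVE"
  | "REJECT" | "EXPIRE" | "WAIT" | "TIMER_DONE";

// Assumed shape: one row per state, mapping each accepted event to the next state.
const TRANSITIONS: Record<ExecutionState, Partial<Record<ExecutionEvent, ExecutionState>>> = {
  pending: { ENQUEUE: "queued", CANCEL: "cancelled", SKIP: "skipped", HOLD: "held", WAIT: "waiting" },
  held: { APPROVE: "queued", REJECT: "cancelled", EXPIRE: "cancelled", CANCEL: "cancelled" },
  waiting: { TIMER_DONE: "queued", CANCEL: "cancelled" },
  queued: { START: "running", FAIL: "failed", CANCEL: "cancelled" },
  running: {
    SUCCEED: "success", FAIL: "failed", CANCEL: "cancelled",
    CANCEL_GRACEFUL: "cancelling", RECOVER: "recovering",
  },
  cancelling: { CANCEL_FORCE: "cancelled", COMPLETE: "cancelled", FAIL: "failed" },
  recovering: { START: "running", FAIL: "failed", CANCEL: "cancelled" },
  // Terminal states accept no events.
  success: {}, failed: {}, cancelled: {}, skipped: {},
};

// Hypothetical helper: list the states from which an event is accepted.
function statesAccepting(ev: ExecutionEvent): ExecutionState[] {
  return (Object.keys(TRANSITIONS) as ExecutionState[]).filter(
    (s) => TRANSITIONS[s][ev] !== undefined,
  );
}

const cancelSources = statesAccepting("CANCEL"); // six states, cancelling excluded
const failSources = statesAccepting("FAIL");     // queued, running, cancelling, recovering
const skipSources = statesAccepting("SKIP");     // pending only

// Total number of (state, event) pairs in the table.
const transitionCount = Object.values(TRANSITIONS).reduce(
  (count, row) => count + Object.keys(row).length,
  0,
);
```

Keeping the table as plain data makes checks like these cheap to write as unit tests, so a change to one row cannot silently contradict the documented patterns.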

Four states are terminal: success, failed, cancelled, and skipped. Once a state machine entity reaches a terminal state, it is immutable — no further transitions are possible.

Terminal states have no entries in the transition table (empty objects). Attempting to apply any event to a terminal state throws an InvalidTransitionError.

The TERMINAL_STATES constant is exported from packages/engine/src/state-machine/types.ts for runtime checks.

The state machine exports 2 public functions via packages/engine/src/state-machine/index.ts. Both are pure — they take inputs and return outputs with no side effects or internal state.

Source: packages/engine/src/state-machine/machine.ts

function transition(state: ExecutionState, event: ExecutionEvent): ExecutionState;

Applies an event to a state and returns the new state. Throws InvalidTransitionError if the transition is not valid.
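A minimal sketch of how a pure transition function of this shape might work, assuming a table lookup. The table is transcribed from this page; the error message text and the replay helper are assumptions for illustration, and the real machine.ts may be structured differently.

```typescript
type ExecutionState =
  | "pending" | "queued" | "running" | "recovering" | "cancelling" | "held"
  | "waiting" | "success" | "failed" | "cancelled" | "skipped";

type ExecutionEvent =
  | "ENQUEUE" | "START" | "SUCCEED" | "FAIL" | "CANCEL" | "CANCEL_GRACEFUL"
  | "CANCEL_FORCE" | "COMPLETE" | "SKIP" | "RECOVER" | "HOLD" | "APPROVE"
  | "REJECT" | "EXPIRE" | "WAIT" | "TIMER_DONE";

const TRANSITIONS: Record<ExecutionState, Partial<Record<ExecutionEvent, ExecutionState>>> = {
  pending: { ENQUEUE: "queued", CANCEL: "cancelled", SKIP: "skipped", HOLD: "held", WAIT: "waiting" },
  held: { APPROVE: "queued", REJECT: "cancelled", EXPIRE: "cancelled", CANCEL: "cancelled" },
  waiting: { TIMER_DONE: "queued", CANCEL: "cancelled" },
  queued: { START: "running", FAIL: "failed", CANCEL: "cancelled" },
  running: {
    SUCCEED: "success", FAIL: "failed", CANCEL: "cancelled",
    CANCEL_GRACEFUL: "cancelling", RECOVER: "recovering",
  },
  cancelling: { CANCEL_FORCE: "cancelled", COMPLETE: "cancelled", FAIL: "failed" },
  recovering: { START: "running", FAIL: "failed", CANCEL: "cancelled" },
  success: {}, failed: {}, cancelled: {}, skipped: {},
};

// Field names follow the class shape documented on this page.
// The message format is an assumption.
class InvalidTransitionError extends Error {
  readonly state: ExecutionState;
  readonly event: ExecutionEvent;

  constructor(state: ExecutionState, event: ExecutionEvent) {
    super(`Invalid transition: ${event} from ${state}`);
    this.name = "InvalidTransitionError";
    this.state = state;
    this.event = event;
  }
}

// Pure lookup: no state is stored between calls.
function transition(state: ExecutionState, event: ExecutionEvent): ExecutionState {
  const next = TRANSITIONS[state][event];
  if (next === undefined) {
    throw new InvalidTransitionError(state, event);
  }
  return next;
}

// Hypothetical helper: fold a sequence of events over a starting state.
function replay(start: ExecutionState, events: readonly ExecutionEvent[]): ExecutionState {
  return events.reduce<ExecutionState>((current, ev) => transition(current, ev), start);
}

// pending -> queued -> running -> success
const happyPath = replay("pending", ["ENQUEUE", "START", "SUCCEED"]);

// pending -> queued -> running -> recovering -> running -> success
const afterReconnect = replay("pending", ["ENQUEUE", "START", "RECOVER", "START", "SUCCEED"]);
```

Because the function holds no state, the caller owns the current state and decides where to persist it, which is what lets both tiers share the same logic.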

function isTerminal(state: ExecutionState): boolean;

Returns true if the state is terminal (success, failed, cancelled, or skipped). Returns false for transient states (pending, queued, running, recovering, cancelling, held, waiting).
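One plausible use of a check like isTerminal is deciding whether a run is finished by looking at its jobs. The sketch below uses a local stand-in for isTerminal rather than the engine import. The JobSnapshot shape and the sample job ids are hypothetical.

```typescript
type ExecutionState =
  | "pending" | "queued" | "running" | "recovering" | "cancelling" | "held"
  | "waiting" | "success" | "failed" | "cancelled" | "skipped";

// Local stand-in for the engine's constant and function.
const TERMINAL_STATES: readonly ExecutionState[] = [
  "success", "failed", "cancelled", "skipped",
];

function isTerminal(state: ExecutionState): boolean {
  return TERMINAL_STATES.includes(state);
}

// Hypothetical record shape, used only for this example.
interface JobSnapshot {
  id: string;
  state: ExecutionState;
}

const jobSnapshots: JobSnapshot[] = [
  { id: "lint", state: "success" },
  { id: "build", state: "running" },
  { id: "deploy", state: "held" },
  { id: "docs", state: "skipped" },
];

// A run is finished only when every job has reached a terminal state.
const runIsFinished = jobSnapshots.every((job) => isTerminal(job.state));

// Jobs that can still receive events.
const activeJobIds = jobSnapshots
  .filter((job) => !isTerminal(job.state))
  .map((job) => job.id);
```

Here runIsFinished is false because build and deploy are still transient, and activeJobIds contains exactly those two ids.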

The following are defined in machine.ts but not re-exported from the module index:

  • canTransition(state, event) — checks whether a transition is valid without throwing
  • validEvents(state) — returns valid events for a given state
  • InvalidTransitionError — thrown when an invalid transition is attempted

class InvalidTransitionError extends Error {
  readonly state: ExecutionState;
  readonly event: ExecutionEvent;
}

Thrown by transition() when an invalid transition is attempted. Carries the state and event that caused the error for programmatic error handling. This class is not re-exported from the public module index — callers should catch generic Error instances instead.
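Since the class itself cannot be imported, one way to handle rejected transitions is to catch Error and inspect the state and event fields by shape. The sketch below uses a local stand-in for transition() so that it runs on its own. The helper names asTransitionFailure and describeAttempt, and the error message text, are hypothetical.

```typescript
type ExecutionState =
  | "pending" | "queued" | "running" | "recovering" | "cancelling" | "held"
  | "waiting" | "success" | "failed" | "cancelled" | "skipped";

type ExecutionEvent =
  | "ENQUEUE" | "START" | "SUCCEED" | "FAIL" | "CANCEL" | "CANCEL_GRACEFUL"
  | "CANCEL_FORCE" | "COMPLETE" | "SKIP" | "RECOVER" | "HOLD" | "APPROVE"
  | "REJECT" | "EXPIRE" | "WAIT" | "TIMER_DONE";

// Stand-in for the engine: same table as documented on this page.
const TRANSITIONS: Record<ExecutionState, Partial<Record<ExecutionEvent, ExecutionState>>> = {
  pending: { ENQUEUE: "queued", CANCEL: "cancelled", SKIP: "skipped", HOLD: "held", WAIT: "waiting" },
  held: { APPROVE: "queued", REJECT: "cancelled", EXPIRE: "cancelled", CANCEL: "cancelled" },
  waiting: { TIMER_DONE: "queued", CANCEL: "cancelled" },
  queued: { START: "running", FAIL: "failed", CANCEL: "cancelled" },
  running: {
    SUCCEED: "success", FAIL: "failed", CANCEL: "cancelled",
    CANCEL_GRACEFUL: "cancelling", RECOVER: "recovering",
  },
  cancelling: { CANCEL_FORCE: "cancelled", COMPLETE: "cancelled", FAIL: "failed" },
  recovering: { START: "running", FAIL: "failed", CANCEL: "cancelled" },
  success: {}, failed: {}, cancelled: {}, skipped: {},
};

// Stand-in that throws an Error carrying state and event, like the documented class.
function transition(state: ExecutionState, event: ExecutionEvent): ExecutionState {
  const next = TRANSITIONS[state][event];
  if (next === undefined) {
    throw Object.assign(new Error(`Invalid transition: ${event} from ${state}`), { state, event });
  }
  return next;
}

interface TransitionFailure {
  state: string;
  event: string;
  message: string;
}

// Recognise a rejected transition by shape, without importing the class.
function asTransitionFailure(err: unknown): TransitionFailure | undefined {
  if (!(err instanceof Error)) return undefined;
  const candidate = err as Error & { state?: unknown; event?: unknown };
  if (typeof candidate.state !== "string" || typeof candidate.event !== "string") {
    return undefined;
  }
  return { state: candidate.state, event: candidate.event, message: err.message };
}

function describeAttempt(from: ExecutionState, ev: ExecutionEvent): string {
  try {
    return `ok: ${transition(from, ev)}`;
  } catch (err) {
    const failure = asTransitionFailure(err);
    if (failure === undefined) throw err; // unrelated error: rethrow unchanged
    return `rejected: ${failure.event} from ${failure.state}`;
  }
}

const accepted = describeAttempt("running", "SUCCEED"); // "ok: success"
const rejected = describeAttempt("success", "START");   // "rejected: START from success"
```

Checking the fields rather than the class keeps unrelated errors from being mistaken for rejected transitions, at the cost of relying on the documented field names staying stable.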

The orchestrator uses the state machine to track run and job states during the dispatch pipeline. When a webhook arrives, the orchestrator creates a run in pending state, transitions matched jobs through queued, and tracks status updates from agents.

The agent uses the state machine to transition job and step states during execution. A job starts as pending, passes through queued, moves to running when execution begins, and reaches a terminal state (success, failed, or cancelled) based on the outcome. Steps follow the same lifecycle independently.

Both tiers import the state machine from @kici-dev/engine, ensuring consistent transition logic across the system.