This guide covers querying, cancelling, replaying, and monitoring jobs across all services using the built-in trellis.jobs@v1 API. The examples use the orders-service fulfillment jobs from Jobs: TypeScript.

Note for Service Authors: Jobs are strictly service-internal execution machinery. They run in the background and are completely invisible to callers. For caller-visible asynchronous workflows, implement Operations instead.

This guide is for operators. If you are a service author, see Jobs: TypeScript or Jobs: Rust.

What the Jobs admin runtime implements

trellis.jobs@v1 is a standard built-in Trellis API. The optional Jobs admin runtime is Trellis infrastructure that implements its admin behavior:

SQL projection — a queryable read-optimized view of current job state derived from the JetStream source of truth
Global RPCs — Jobs.List, Jobs.Get, Jobs.GetKey, Jobs.Cancel, Jobs.Retry, Jobs.ReplayDLQ, Jobs.DismissDLQ, Jobs.ListServices, Jobs.Health
Janitor — background cleanup of expired jobs and stale projection entries

The Jobs admin runtime is stateless and horizontally scalable. It does not participate in job correctness — it is a derived observability layer. The JetStream stream remains the source of truth.

Querying jobs

Use the admin client to query jobs. The client is available from a Trellis CLI tool or any service/app that declares the trellis.jobs@v1 contract in its uses.

List up to 50 pending reserve-inventory fulfillment jobs for the orders service:

const pending = await client.rpc.jobs.list({
  service: "orders-service",
  type: "reserve-inventory",
  state: ["pending"],
  limit: 50,
});

for (const job of pending.entries) {
  console.log(job.id, job.state);
}

Get a specific job:

const job = await client.rpc.jobs.get({
  id: "job_abc123",
});

console.log(job.context.requestId, job.context.traceId);

JobFilter fields are all optional — omit any to widen the query. Projected job records include context.requestId, context.traceId, context.traceparent, and optional context.tracestate so operators can pivot from a job to related RPC, event, operation, and service logs.
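As a minimal sketch of working with a Jobs.List result, the helper below counts entries per state. It assumes only the id and state fields shown in the examples above; the JobEntry type, the countByState helper, and the sample data are hypothetical and stand in for a real response.

```typescript
// Hypothetical shape: only `id` and `state` are taken from the guide's examples.
interface JobEntry {
  id: string;
  state: string;
}

// Count entries per state so one page of Jobs.List results can be summarized.
function countByState(entries: JobEntry[]): Record<string, number> {
  const counts: Record<string, number> = {};
  for (const job of entries) {
    counts[job.state] = (counts[job.state] ?? 0) + 1;
  }
  return counts;
}

// Stand-in for `(await client.rpc.jobs.list({ service: "orders-service" })).entries`.
const sampleEntries: JobEntry[] = [
  { id: "job_1", state: "pending" },
  { id: "job_2", state: "active" },
  { id: "job_3", state: "pending" },
];

const stateCounts = countByState(sampleEntries);
console.log(stateCounts);
```

Omitting state from the filter, as in the stand-in comment, widens the query to every state, which is what makes a per-state summary useful.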

Inspecting keyed concurrency

Jobs that declare keyConcurrency expose their derived key in the admin projection. Use Jobs.GetKey to inspect why work is waiting or rejected for a logical key:

const key = await client.rpc.jobs.getKey({
  service: "zendesk-sync",
  type: "sync-tickets",
  key: "zendesk:foo:tickets",
});

console.log(key.active, key.queued, key.heartbeatAgeMs);

Key status includes the active job identity, queued depth, heartbeat age, lease expiry, stale takeover count, and the latest queue-policy reason such as active-limit, queue-depth, coalesced, or replaced. A stale active job is reported explicitly; it is not inferred from worker-presence heartbeats.
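The following sketch turns a key status into a one-line summary for logs. The active, queued, and heartbeatAgeMs fields come from the example above; the stale and reason field names, the field types, and the summarizeKeyStatus helper are assumptions made for illustration, not the documented response schema.

```typescript
// Hypothetical subset of a Jobs.GetKey response. `stale` and `reason` are
// assumed names for the stale flag and queue-policy reason described above.
interface KeyStatus {
  active: string | null; // identity of the active job, if any (assumed type)
  queued: number;
  heartbeatAgeMs: number | null;
  stale: boolean;
  reason?: "active-limit" | "queue-depth" | "coalesced" | "replaced";
}

// Build a compact, human-readable summary without interpreting the policy.
function summarizeKeyStatus(status: KeyStatus): string {
  const parts: string[] = [];
  parts.push(status.active ? `active=${status.active}` : "no active job");
  parts.push(`queued=${status.queued}`);
  if (status.stale) parts.push("active job reported stale");
  if (status.reason) parts.push(`last policy reason=${status.reason}`);
  return parts.join(", ");
}

const keySummary = summarizeKeyStatus({
  active: "job_abc123",
  queued: 2,
  heartbeatAgeMs: 1500,
  stale: false,
  reason: "active-limit",
});
console.log(keySummary);
```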

Cancelling a job

Jobs.Cancel is valid for jobs in pending, active, or retry states.

await client.rpc.jobs.cancel({
  id: "job_abc123",
});

Important: Cancellation is cooperative and eventual for active jobs. The service worker continues running until it checks job.cancelled and responds, so the state transitions to cancelled only after a delay that depends on how often the handler checks the flag. Jobs in the pending or retry state transition immediately.
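The cancellation rules above can be sketched as a small lookup that tooling might consult before calling Jobs.Cancel. The CancelOutcome type and expectedCancelOutcome helper are hypothetical names; only the state-to-behavior mapping is taken from this section.

```typescript
type CancelOutcome = "immediate" | "eventual" | "not-cancellable";

// Map a job's current state to what an operator should expect from Jobs.Cancel.
function expectedCancelOutcome(state: string): CancelOutcome {
  switch (state) {
    case "pending":
    case "retry":
      return "immediate"; // no worker is running the job right now
    case "active":
      return "eventual"; // the worker must observe job.cancelled first
    default:
      return "not-cancellable"; // Jobs.Cancel is not valid for other states
  }
}

console.log(expectedCancelOutcome("active"));
```

A script that cancels an active job would then re-read the job with Jobs.Get until the state changes, rather than assuming the call has already taken effect.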

Failed vs dead: two different operations

The jobs system distinguishes failed from dead. They require different admin actions.

`failed` jobs

A job enters the failed state when it has exhausted its automatic retries. Use Jobs.Retry to re-enqueue it:

await client.rpc.jobs.retry({
  id: "job_abc123",
});

This transitions the job back to pending for re-processing.

`dead` jobs

A job enters the dead state when it exhausts its maxDeliver retry attempts. Example: a charge-payment job failed 5 consecutive times because the payment provider was down.

The dead state is not terminal — dead jobs can be either replayed or dismissed.

Replay (when the underlying issue is fixed):

await client.rpc.jobs.replayDLQ({
  id: "job_abc123",
});

This re-enqueues the job in the pending state.

Dismiss (when the work is no longer needed):

await client.rpc.jobs.dismissDLQ({
  id: "job_abc123",
});

This transitions the job to dismissed, a terminal state. Dismissed jobs are not re-processed.
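To keep the two recovery paths apart in operator tooling, the distinction can be sketched as a mapping from state to the admin RPCs that apply. The AdminRpc type and recoveryActions helper are hypothetical; the RPC names and the states they apply to are the ones described in this section.

```typescript
type AdminRpc = "Jobs.Retry" | "Jobs.ReplayDLQ" | "Jobs.DismissDLQ";

// Return the recovery RPCs that apply to a job in the given state.
function recoveryActions(state: string): AdminRpc[] {
  switch (state) {
    case "failed":
      return ["Jobs.Retry"];
    case "dead":
      // Replay when the underlying issue is fixed, dismiss when the work
      // is no longer needed.
      return ["Jobs.ReplayDLQ", "Jobs.DismissDLQ"];
    default:
      return []; // no recovery action applies, e.g. dismissed is terminal
  }
}

console.log(recoveryActions("dead"));
```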

Worker presence

Services emit heartbeat messages per job type while workers are running. Use these to confirm workers are active and see what they are processing.

The heartbeat data includes the service name, job type, instance identifier, and a timestamp. Absence of recent heartbeats for a job type means either the service is not running or the worker is not registered for that job type.

Use the trellis CLI or an observability tool subscribed to the heartbeat subjects to monitor worker presence:

trellis jobs workers --service orders-service
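For an observability tool that consumes heartbeats directly, a check for missing workers might look like the sketch below. The WorkerHeartbeat field names, the epoch-millisecond timestamp, and the jobTypesWithoutRecentHeartbeat helper are assumptions for illustration; the guide only states that a heartbeat carries the service name, job type, instance identifier, and a timestamp.

```typescript
// Hypothetical heartbeat record; field names and timestamp unit are assumed.
interface WorkerHeartbeat {
  service: string;
  jobType: string;
  instanceId: string;
  timestampMs: number;
}

// Return the expected job types of a service that have no heartbeat newer
// than `maxAgeMs` relative to `nowMs`.
function jobTypesWithoutRecentHeartbeat(
  service: string,
  expectedTypes: string[],
  heartbeats: WorkerHeartbeat[],
  nowMs: number,
  maxAgeMs: number,
): string[] {
  const fresh: Record<string, boolean> = {};
  for (const hb of heartbeats) {
    if (hb.service === service && nowMs - hb.timestampMs <= maxAgeMs) {
      fresh[hb.jobType] = true;
    }
  }
  return expectedTypes.filter((jobType) => !fresh[jobType]);
}

const missingTypes = jobTypesWithoutRecentHeartbeat(
  "orders-service",
  ["reserve-inventory", "charge-payment"],
  [
    { service: "orders-service", jobType: "reserve-inventory", instanceId: "i-1", timestampMs: 9_000 },
    { service: "orders-service", jobType: "charge-payment", instanceId: "i-2", timestampMs: 1_000 },
  ],
  10_000, // now
  5_000, // heartbeats older than 5 s count as missing
);
console.log(missingTypes);
```

A job type in the result means no instance has reported recently, which matches the two causes named above: the service is down or no worker is registered for that type.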

Janitor behavior

The janitor is a background loop in the Jobs admin runtime that cleans up:

Expired jobs — jobs that exceeded their defaultDeadlineMs and have not completed; moved to expired
Stale SQL projection entries — projection records whose underlying stream messages have been cleaned up

The janitor interval is operator-configurable via the Jobs admin runtime configuration. There is no fixed default — tune it based on your job volume and retention requirements. A shorter interval keeps the projection tighter; a longer interval reduces janitor overhead on high-volume deployments.
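The expiry rule can be illustrated with a small predicate. This is a sketch under stated assumptions, not the janitor's actual implementation: the ProjectedJobRow fields, measuring the deadline from creation time, and the list of settled states are all hypothetical.

```typescript
// Hypothetical projection row. Field names and the choice to measure the
// deadline from creation time are assumptions, not the documented schema.
interface ProjectedJobRow {
  id: string;
  state: string;
  createdAtMs: number;
  deadlineMs: number; // effective deadline, e.g. the job type's defaultDeadlineMs
}

// States assumed to need no further expiry check.
const SETTLED_STATES = ["completed", "cancelled", "dismissed", "expired"];

// True when an unsettled job has outlived its deadline at time `nowMs`.
function isPastDeadline(row: ProjectedJobRow, nowMs: number): boolean {
  if (SETTLED_STATES.indexOf(row.state) !== -1) return false;
  return nowMs - row.createdAtMs > row.deadlineMs;
}

const overdueRow: ProjectedJobRow = { id: "job_abc123", state: "pending", createdAtMs: 0, deadlineMs: 1_000 };
console.log(isPastDeadline(overdueRow, 1_500));
```

Because a check like this only runs once per janitor pass, a job can stay in its previous state for up to one interval after its deadline, which is the trade-off the interval setting controls.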