Design: Service Development
Prerequisites
- trellis-patterns.md - Trellis architecture and communication model
- type-system-patterns.md - schema and Result conventions
- ../operations/trellis-operations.md - caller-visible async workflows
- ../jobs/trellis-jobs.md - service-private jobs
Design
Trellis services share a common development shape: a small bootstrap entrypoint, explicit contract ownership, and a clean separation between caller-visible operations and service-private jobs.
Before choosing a file layout, choose the participant kind and runtime helper.
Participant kind and runtime helper
Repo folder names are local organization only. They do not determine Trellis
contract kind, install behavior, or which connect helper is correct.
| Contract kind | Normal helper | Use when |
|---|---|---|
| `service` | `TrellisService.connect(...)` | The participant owns installable RPCs, operations, events, or service-owned resources and runs as a deployment service principal |
| `device` | `TrellisDevice.connect(...)` | The participant authenticates through device activation using a preregistered device root secret |
| `app`, `agent` | `TrellisClient.connect(...)` | The participant is a user-facing app, CLI, native app, or delegated tool rather than an installed service |
Rules:
- choose `kind` from the participant’s identity and auth flow, not from the repo folder that contains the code
- code under `services/` may still correctly be `kind: "device"` when it is a repo-local demo, simulator, or utility that authenticates as a device principal
- a participant with no owned RPCs, operations, events, or resources is normal; do not invent owned APIs just to fit a service template
- only `kind: "service"` participants should use `TrellisService.connect(...)`, service deployment flows, and service-owned runtime handles such as `service.kv`, `service.store`, and `service.jobs`
- resolved service resource bindings are runtime internals; service authors use the handles returned by `TrellisService.connect(...)` rather than fetching, constructing, or passing binding payloads themselves
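
The kind-to-helper mapping above can be sketched as a small lookup. This is an illustrative sketch only: `ParticipantKind`, `helperFor`, and `mayUseServiceHandles` are hypothetical names invented for this example and are not claimed to exist in the Trellis SDK. Only the helper names in the string values come from the table.

```typescript
// Illustrative only: the union type and functions below are assumptions made
// for this sketch; Trellis does not necessarily export anything like them.
type ParticipantKind = "service" | "device" | "app" | "agent";

// Helper names are copied from the table above.
const CONNECT_HELPER: Record<ParticipantKind, string> = {
  service: "TrellisService.connect",
  device: "TrellisDevice.connect",
  app: "TrellisClient.connect",
  agent: "TrellisClient.connect",
};

// The repo folder is deliberately not an input: only the participant's
// identity and auth flow (its kind) decide which helper is correct.
function helperFor(kind: ParticipantKind): string {
  return CONNECT_HELPER[kind];
}

// Only kind: "service" participants use service-owned runtime handles.
function mayUseServiceHandles(kind: ParticipantKind): boolean {
  return kind === "service";
}
```

Under this model, a simulator that lives under `services/` but authenticates as a device principal still resolves to `TrellisDevice.connect`.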
Directory structure
services/<name>/
├── main.ts # Bootstrap, handlers, shutdown
├── contract.ts # Local contract definition
├── contracts/ # Optional contract module directory
├── config.ts # Environment configuration
├── globals.ts # Shared runtime state
├── deno.json # Tasks, imports
└── <domain>.ts # Business logic

The full template above is common for installable services. Smaller repo-local
participants such as demos or utilities may only need main.ts, deno.json,
and one contract module.
For TypeScript service contract source files, use a top-level contract.ts for
single-contract services and contracts/*.ts only when the service owns
multiple contract modules. In either layout, the contract module should default
export the defineServiceContract(...) result so prepare and generation can
resolve it directly.
Lifecycle
For kind: "service" participants:
import { TrellisService } from "@qlever-llc/trellis/service/deno";
import { myService } from "./contract.ts";

const service = await TrellisService.connect({
  trellisUrl: config.trellisUrl,
  contract: myService,
  name: "<name>",
  sessionKeySeed: config.sessionKeySeed,
  server: {},
});

const itemsKV = service.kv.items;
const uploadsStore = (await service.store.uploads.open()).take();
const stagedUpload = (await uploadsStore.waitFor("incoming/report.pdf", {
  timeoutMs: 10_000,
})).take();

await service.handle.rpc.some.method(handler);
await service.event.some.event.listen(eventHandler, {});

const catalog = await service.rpc.trellis.catalog({});
if (catalog.isErr()) {
  throw catalog.error;
}

const shutdown = async () => {
  try {
    await service.stop();
  } finally {
    Deno.removeSignalListener("SIGTERM", shutdown);
  }
};
Deno.addSignalListener("SIGTERM", shutdown);

Rules:
- service code MUST bootstrap through `TrellisService.connect(...)`; do not import the core SDK to recreate service bootstrap or call `Trellis.Bindings.Get` from application code
- service code MUST NOT construct `TrellisService`, `StoreHandle`, or resource handles directly, and MUST NOT pass resolved binding or resource data into `Trellis` constructors
- service shutdown handlers SHOULD release runtime resources, remove registered signal listeners, and let successful shutdown terminate naturally so `deno run --watch` can restart the program instead of exiting the watcher
- failed or timed-out shutdown paths MAY call `Deno.exit(1)` after logging the failure
- if a service also owns an HTTP listener, its shutdown path SHOULD bound the wait for listener drain before exiting rather than waiting indefinitely on long-lived keep-alive or streaming connections
Application dependency binding
Services MAY bind application-owned dependencies once with service.with(deps) and register handlers through the returned wrapper:
const app = service.with({ db, logger });

await app.handle.rpc.entity.list(async ({ input, context, client, deps }) => {
  deps.logger.info({ caller: context.caller }, "listing entities");
  return Result.ok(await listEntities(deps.db, input));
});

Bound dependencies are passed as `args.deps` in service-owned handler contexts,
including RPC, feed, operation, job, event listener, and health check/info
handlers. Trellis runtime context remains separate from application
dependencies: do not merge app dependencies into context, and do not pass
dependency bags as handler registration options.
Rules:
- `service.with(deps)` is optional; unbound service handler registration remains valid
- Trellis passes the dependency object through but does not own its lifecycle, clone it, initialize it, or dispose it
- multiple wrappers created from the same service keep independent dependency bindings
- the second and third arguments to registration methods remain surface-specific Trellis options such as event subject data and event listener options, not app dependency slots
- registration settings such as handler `timeoutMs`, registration-level cancellation, `onError`, middleware, or custom validation are deferred until the runtime has clear enforcement and interception semantics; do not expose fake settings that have no runtime behavior
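
The pass-through and independence rules can be illustrated with a toy model. This is not the Trellis implementation: `ToyService`, its `handle` and `call` methods, and the handler names are all hypothetical and exist only to show how a wrapper can capture one dependency object by reference.

```typescript
// Toy model of the binding pattern; not the Trellis implementation.
type ToyHandler<I, O, D> = (args: { input: I; deps: D }) => O;

class ToyService {
  // Registered handlers, keyed by a hypothetical method name.
  private handlers = new Map<string, (input: unknown) => unknown>();

  // Bind application dependencies once; each wrapper keeps its own binding.
  with<D>(deps: D) {
    return {
      handle: <I, O>(name: string, handler: ToyHandler<I, O, D>): void => {
        // The same object reference is passed through on every call:
        // no clone, no initialization, no disposal.
        this.handlers.set(name, (input) =>
          handler({ input: input as I, deps })
        );
      },
    };
  }

  call(name: string, input: unknown): unknown {
    const handler = this.handlers.get(name);
    if (!handler) throw new Error(`no handler registered for ${name}`);
    return handler(input);
  }
}

const toy = new ToyService();
const dbA = { name: "a" };
const dbB = { name: "b" };

// Two wrappers over one service keep independent dependency bindings.
toy.with({ db: dbA }).handle("first", ({ deps }) => deps.db);
toy.with({ db: dbB }).handle("second", ({ deps }) => deps.db);
```

Because the wrapper only closes over `deps`, the application remains responsible for creating and disposing whatever it binds.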
Service-local storage
Most services should keep durable domain storage behind their own service boundary and expose behavior through contract-owned RPCs, operations, events, and resource declarations. The Trellis control-plane service uses local SQLite for its own durable runtime records.
Rules:
- service-local storage is an implementation detail unless the contract exposes a public API over it
- the Trellis control-plane SQLite database defaults to `/var/lib/trellis/trellis.sqlite` and is configurable as `storage.dbPath`
- Trellis service bootstrap owns opening the database, creating the schema, and constructing concrete storage modules
- prefer concrete storage modules for the service’s actual record types rather than generic repository abstractions
- app-generated ULID row primary keys are used for SQL table identity; public and domain identifiers remain separate columns
- when a service uses an outbox to couple event publication to local durable state, commit the local state and outbox row in the same transaction, then signal any process-local dispatcher only after the transaction commits
- outbox dispatcher wakeups should be debounced and single-flight, but they are not the source of durability; services should retain explicit dispatch or recovery scans for missed signals and restarts
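
The outbox rules can be sketched with an in-memory stand-in for a SQL database. Everything here is hypothetical (`ToyDb`, `recordRefund`, `dispatchPending`, the `Refund.Recorded` event name, and the `row-N` keys that stand in for ULIDs). The sketch shows the commit ordering and the recovery scan only; debouncing and single-flight wakeups are left out.

```typescript
// In-memory model of the outbox rules above; a real service would use SQL
// transactions. Every name here is hypothetical.
interface RefundRow { rowId: string; refundId: string }
interface OutboxRow { rowId: string; eventName: string; dispatched: boolean }
interface Tables { refunds: RefundRow[]; outbox: OutboxRow[] }

class ToyDb {
  private tables: Tables = { refunds: [], outbox: [] };
  private nextRow = 1;

  // Stand-in for an app-generated ULID row key, kept separate from public ids.
  newRowId(): string {
    return `row-${this.nextRow++}`;
  }

  // Run `fn` against a copy and swap it in only on success, so the domain row
  // and the outbox row commit together or not at all.
  transaction(fn: (draft: Tables) => void): void {
    const draft: Tables = {
      refunds: this.tables.refunds.map((row) => ({ ...row })),
      outbox: this.tables.outbox.map((row) => ({ ...row })),
    };
    fn(draft);
    this.tables = draft;
  }

  refundCount(): number {
    return this.tables.refunds.length;
  }

  pendingOutbox(): OutboxRow[] {
    return this.tables.outbox.filter((row) => !row.dispatched);
  }

  markDispatched(rowId: string): void {
    this.transaction((draft) => {
      for (const row of draft.outbox) {
        if (row.rowId === rowId) row.dispatched = true;
      }
    });
  }
}

function recordRefund(db: ToyDb, refundId: string, signal: () => void): void {
  db.transaction((draft) => {
    draft.refunds.push({ rowId: db.newRowId(), refundId });
    draft.outbox.push({
      rowId: db.newRowId(),
      eventName: "Refund.Recorded",
      dispatched: false,
    });
  });
  // Signal only after the commit; the signal is a hint, not the durable record.
  signal();
}

// Explicit recovery scan: publishes whatever is still pending, so a missed
// signal or a restart does not lose events.
function dispatchPending(
  db: ToyDb,
  publish: (eventName: string) => void,
): number {
  const pending = db.pendingOutbox();
  for (const row of pending) {
    publish(row.eventName);
    db.markDispatched(row.rowId);
  }
  return pending.length;
}
```

Even if the post-commit signal is dropped, a later call to `dispatchPending` still finds the committed outbox row, which is why the scan rather than the wakeup carries durability.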
Minimal installable service example
import { defineServiceContract, Result } from "@qlever-llc/trellis";
import type { RpcArgs, RpcResult } from "@qlever-llc/trellis";
import { TrellisService } from "@qlever-llc/trellis/service/deno";
import {
  HealthResponseSchema,
  HealthRpcSchema,
} from "@qlever-llc/trellis/health";

const schemas = {
  HealthRequest: HealthRpcSchema,
  HealthResponse: HealthResponseSchema,
} as const;

export const serviceContract = defineServiceContract(
  { schemas },
  (ref) => ({
    id: "acme.echo@v1",
    displayName: "Echo Service",
    description: "A minimal installable Trellis service example.",
    rpc: {
      "Echo.Health": {
        version: "v1",
        input: ref.schema("HealthRequest"),
        output: ref.schema("HealthResponse"),
        capabilities: { call: [] },
        errors: [ref.error("UnexpectedError")],
      },
    },
  }),
);

export default serviceContract;

type Args = RpcArgs<typeof serviceContract, "Echo.Health">;
type Return = RpcResult<typeof serviceContract, "Echo.Health">;

const service = await TrellisService.connect({
  trellisUrl,
  contract: serviceContract,
  name: "echo",
  sessionKeySeed,
  server: {},
});

export async function health({ client }: Args): Promise<Return> {
  return Result.ok({
    status: "healthy",
    service: "echo",
    timestamp: new Date().toISOString(),
    checks: [],
  });
}
await service.handle.rpc.echo.health(health);

Rules:
- a minimal installable service should own at least one public surface such as an RPC, operation, or event rather than existing only to call other services
- installable service code uses `TrellisService.connect(...)` and mounts only names from its owned contract surface
- service resource handles come from the connected runtime; do not call `Trellis.Bindings.Get` or manually construct service, KV, store, or jobs handles in service-author code
- the optional `server` block configures service-runtime concerns such as logging, default request timeout, event-consumer stream selection, no-responder retry behavior, and extra health checks
- `server.log` defaults to the package server logger; set it to `false` to disable runtime logging or provide a pino-compatible logger to use your own
- service runtime NATS lifecycle logging is explicit rather than generic; disconnect, reconnect attempts, reconnect success, stale connections, and connection errors should each log a distinct message so operators can tell whether the service is recovering or stuck
- when the connected contract uses the shared `Health.Heartbeat` event, `TrellisService.connect(...)` publishes baseline heartbeats automatically and service code may enrich them through `service.health.setInfo(...)` and `service.health.add(...)`
- health heartbeats refresh freshness and operator-visible status only for the already accepted matching implementation offer; they must not create offers or change the offered digest
- graceful `service.stop()` marks the accepted offer stale for the same short grace window used after unplanned disconnects
- mounted RPC handlers should rely on Trellis-provided payload typing and validation rather than re-parsing the mounted payload just to recover types
- extracted service RPC handler aliases should come from `@qlever-llc/trellis/service` so handlers use the canonical object argument shape and receive the narrow injected `trellis` service runtime facade rather than the full `TrellisService`
- mounted RPC handlers may be synchronous when they do not need `await`
- mounted RPC handlers may return declared local `TrellisError` subclasses directly when those errors are listed in the contract RPC `errors: [...]`
- service-local transportable RPC errors should be declared in the contract’s top-level `errors` map through `defineError(...)` generated classes rather than by overloading shared built-in errors for domain-specific failures
- if the service later needs remote APIs, add SDK `use(...)` helper results under `uses.required` or `uses.optional`; aliases directly under `uses` are invalid, and services must not hand-write remote contract ids or raw method strings
- if the service needs durable event processing, declare an explicit `eventConsumers` group. Use `eventConsumers.<group>.uses` to select subscribed dependency events by top-level `uses.required` or `uses.optional` alias, and use `eventConsumers.<group>.self` to select events owned by the same contract. A bare `uses.events.subscribe` grant authorizes live/ephemeral listening only; it does not create a durable cursor, but dependency durable consumption remains authority-backed by the top-level `uses` declaration.
Behavior:
- `TrellisService.connect(...)` performs bootstrap, auth handshake, contract verification, runtime connection setup, and eager binding resolution
- if Trellis does not know the requested digest, service bootstrap asks the runtime for the full manifest; the runtime retries with the canonical contract emitted by `defineServiceContract(...)` or the generated SDK module
- service bootstrap validates and analyzes the presented manifest as a contract proposal; invalid manifests fail immediately, while unknown required `uses` dependencies produce targeted dependency blockers unless deployment authority supplies an accepted dependency shape. Bootstrap does not derive authority from historical manifests.
- optional `uses` dependencies that are missing or whose requested surfaces are missing do not fail bootstrap planning and do not grant runtime authority; when they later resolve as active, they require an authority update or authority migration before a fresh reconnect receives that authority
- Trellis derives requested needs from the contract proposal and compares them to deployment authority desired state
- if desired authority is missing, bootstrap records an authority update or authority migration proposal for the delta and asks the service runtime to wait and retry until an admin accepts or rejects the proposal
- service-originated pending authority proposals are durable and deduplicated by the requested boundary so repeated starts with the same missing boundary coalesce into one pending authority update or migration
- if the service presents a different digest for the same `contractId` as the deployment’s latest accepted digest or offer, Trellis validates same-lineage compatibility. Incompatible replacement is an authority migration. In `strict` mode, bootstrap records a pending migration plan and asks the service runtime to wait and retry until an admin accepts or rejects it. In `mutable-dev` mode, Trellis records and auto-accepts the same migration plan for unreleased iteration, then continues through normal desired-state and materialization checks.
- compatibility mode controls whether an incompatible same-contract migration requires manual approval or is auto-approved for development; it does not make contract history an authority source
- once deployment authority desired state covers the requested needs, bootstrap verifies that required `uses` dependencies resolve against effective active contracts or accepted dependency shapes. If a required dependency has neither, bootstrap returns a dependency-not-active blocker and the runtime waits and retries.
- if desired authority exists but materialization is incomplete, bootstrap returns reconciliation pending and the runtime waits and retries; bootstrap never provisions resources
- if a service presents a contract that no longer fits enabled deployment authority, bootstrap returns `contract_changed` rather than refreshing an old offer or issuing credentials for stale authority
- after the dependency closure is active or accepted and all required materialized resource bindings are present, bootstrap accepts or refreshes the implementation offer, persists instance runtime state, and returns transport and binding details to the service runtime
- all declared `resources.kv`, `resources.store`, top-level `jobs`, and top-level `eventConsumers` bindings are materialized authority resources. A service must not become ready with a silently skipped declared resource; `required: false` only makes the generated service handle optional.
- schema-backed KV handles such as `service.kv.<alias>` resolve during bootstrap as direct typed stores, while store handles such as `service.store.<alias>` are opened explicitly before use
- transfer-capable operations receive runtime-owned transfer contexts while service code continues to access staged files through `service.store.*`
- when a contract declares top-level `jobs`, `TrellisService.connect(...)` resolves a typed `service.jobs` facade for job creation, handler registration, and worker startup
- when a contract declares `eventConsumers`, `TrellisService.connect(...)` receives the reconciled event-consumer bindings during bootstrap. Register listeners during startup through `service.event.<group>.<leaf>.listen(..., { group })`; handler-injected clients are outbound-only and cannot register long-lived listeners. Service code must not choose or create a JetStream `durableName` for contract event processing; runtime durable consumers are Trellis-provisioned only.
- grouped durable event consumers start only after every event in the group has a registered handler, preserving the contract-declared group as the unit of ordering and replay.
- the shared jobs streams are Trellis-owned infrastructure; reconciliation creates or adopts all declared job bindings before jobs-enabled services become ready. Bootstrap resolves those materialized bindings. Jobs admin projections are internal to the Jobs admin runtime.
- the latest presented contract is not the ongoing source of truth for already accepted resources; deployment authority owns desired state until an authority update or authority migration changes it
- when an RPC needs to start caller-visible follow-up work after a transfer, prefer a transfer-capable operation over an RPC-started workflow
- the `trellis` control-plane service is the one bootstrap exception and may use Trellis-internal bootstrap paths; that exception is not part of the public service-author surface
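
The wait-and-retry behavior described above can be summarized as a small decision table. This is a hedged sketch rather than the runtime's actual logic: the outcome labels other than `contract_changed`, the `RuntimeAction` values, and both function names are hypothetical, and treating `contract_changed` as a stop condition is an assumption, since the text only says no credentials are issued for stale authority.

```typescript
// Hypothetical labels for the bootstrap cases described above; only
// "contract_changed" is a name that appears in the text.
type BootstrapOutcome =
  | "ready"
  | "invalid_manifest"
  | "authority_pending"
  | "dependency_not_active"
  | "reconciliation_pending"
  | "contract_changed";

type RuntimeAction = "proceed" | "wait-and-retry" | "stop";

function nextAction(outcome: BootstrapOutcome): RuntimeAction {
  switch (outcome) {
    case "ready":
      return "proceed";
    // Blockers that an admin decision or reconciliation can clear.
    case "authority_pending":
    case "dependency_not_active":
    case "reconciliation_pending":
      return "wait-and-retry";
    // Invalid manifests fail immediately.
    case "invalid_manifest":
      return "stop";
    // Assumption: the runtime stops instead of retrying with stale authority.
    case "contract_changed":
      return "stop";
  }
}

// Compatibility mode decides who accepts an incompatible same-contract
// migration; in both modes the migration plan is recorded.
type CompatibilityMode = "strict" | "mutable-dev";

interface MigrationPlan {
  recorded: true;
  acceptance: "awaits-admin" | "auto-accepted";
}

function planIncompatibleMigration(mode: CompatibilityMode): MigrationPlan {
  return {
    recorded: true,
    acceptance: mode === "strict" ? "awaits-admin" : "auto-accepted",
  };
}
```

In this model, none of the waiting outcomes provisions anything; they only tell the runtime to come back later, which matches the rule that bootstrap never provisions resources.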
Jobs and operations
Use operations for caller-visible asynchronous workflows and jobs for service-private execution.
Behavior:
- if a user or peer service needs to observe async work, expose an operation from the owning service contract
- if work is only an internal execution detail, use a job and keep it behind the service boundary
- operation APIs should expose `OperationRef`-style handles with `get()`, `wait()`, and optional `watch()`
- service-local jobs APIs should expose per-job-type handles with `create()` returning `JobRef`, synchronous handler registration through `service.jobs.<queue>.handle(...)`, and service-owned worker lifecycle through `service.wait()` / `service.stop()`
- public APIs must not expose weak raw wire types except in explicit raw/debug/admin surfaces
- public service APIs should hang off connected runtime objects such as `service.jobs`, `service.operation.<group>.<leaf>`, and `service.handle.operation.<group>.<leaf>`
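
The split between a caller-visible handle and service-side control can be modeled in memory. This is a toy sketch, not the Trellis `OperationRef` type: `ToyOperation`, `OperationState`, and the method shapes are hypothetical, and `watch()` is omitted.

```typescript
// Toy, in-memory model of an OperationRef-style handle; names are hypothetical.
type OperationState<R> =
  | { status: "running"; step?: string }
  | { status: "completed"; result: R };

class ToyOperation<R> {
  private state: OperationState<R> = { status: "running" };
  private waiters: Array<(result: R) => void> = [];

  // Caller-visible handle: read-only observation of the async work.
  ref() {
    return {
      get: (): OperationState<R> => this.state,
      wait: (): Promise<R> =>
        new Promise<R>((resolve) => {
          const current = this.state;
          if (current.status === "completed") resolve(current.result);
          else this.waiters.push(resolve);
        }),
    };
  }

  // Service-side control, for example driven from a service-private job.
  progress(step: string): void {
    if (this.state.status === "running") {
      this.state = { status: "running", step };
    }
  }

  complete(result: R): void {
    if (this.state.status === "completed") return;
    this.state = { status: "completed", result };
    for (const resolve of this.waiters) resolve(result);
    this.waiters = [];
  }
}
```

The caller only ever holds the object returned by `ref()`, so the job that drives `progress` and `complete` stays behind the service boundary.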
Files and transfer
Services should treat Files as the public interface to service-owned store resources.
Behavior:
- metadata and control actions such as list/head/delete remain ordinary contract-owned RPCs
- byte transfer belongs on transfer-capable operations rather than separate initiation RPCs
- raw byte movement is executed through Trellis runtime helpers rather than hand-written service-specific chunk protocols
- service code uses `service.store.<alias>` plus operation transfer contexts to back those public file APIs
Example:
const op = await billing.operation.billing.refund.start(input);
const done = await op.wait();

const job = await service.jobs.refundCharge.create({
  operationId: op.id,
  ...payload,
});
return op.defer();

The job handler resumes the caller-visible operation through the operation-scoped service control helper. It must not reach into private runtime fields.
service.jobs.refundCharge.handle(async ({ job }) => {
  const op = await service.handle.operation.billing.refund
    .control(job.payload.operationId)
    .orThrow();
  await op.progress({ step: "capturing", message: "Capturing refund" })
    .orThrow();
  await op.complete({ refundId: "rf_123" }).orThrow();
  return Result.ok({ completed: true });
});