Design: Auth Operations
Prerequisites
- trellis-auth.md - auth architecture and trust model
- auth-protocol.md - internal state and auth-callout protocol
Scope
This document defines the operational and deployment guidance for Trellis auth.
It covers:
- configuration defaults
- deployment checklist
- HA and availability concerns
- secrets handling
- rate limiting
- key rotation
- accepted operational risks
Configuration
TTL Defaults
| Config key | Default | Description |
|---|---|---|
| ttlMs.sessions | 24h | Session expires after inactivity |
| ttlMs.natsJwt | 1h | NATS JWT expiry; triggers reconnect |
Relationship: ttlMs.natsJwt < ttlMs.sessions.
Reducing ttlMs.natsJwt increases reconnect frequency but does not change RPC
request-id replay-cache retention.
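
The relationship above can be checked at startup. The following is a minimal sketch, assuming the TTLs are supplied as millisecond numbers; the TtlConfig type and validateTtls function are hypothetical names for illustration, not part of the Trellis config surface.

```typescript
// Hypothetical config shape; only the ttlMs.sessions and ttlMs.natsJwt keys
// come from the table above.
interface TtlConfig {
  sessions: number; // milliseconds
  natsJwt: number; // milliseconds
}

const HOUR_MS = 60 * 60 * 1000;

// Documented defaults: 24h sessions, 1h NATS JWT.
const DEFAULT_TTL_MS: TtlConfig = {
  sessions: 24 * HOUR_MS,
  natsJwt: 1 * HOUR_MS,
};

// Returns a list of problems; an empty list means the TTLs are consistent.
function validateTtls(ttlMs: TtlConfig): string[] {
  const problems: string[] = [];
  if (!(ttlMs.sessions > 0)) problems.push("ttlMs.sessions must be positive");
  if (!(ttlMs.natsJwt > 0)) problems.push("ttlMs.natsJwt must be positive");
  if (!(ttlMs.natsJwt < ttlMs.sessions)) {
    problems.push("ttlMs.natsJwt must be less than ttlMs.sessions");
  }
  return problems;
}
```

Returning a list rather than throwing on the first problem lets a deployment report every misconfiguration in one pass.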
Per-service Secrets
| Config key | Description |
|---|---|
| sessionKeySeedFile | Base64url Ed25519 seed file |
| client.natsServers | NATS server URL(s) |
| nats.sentinelCredsPath | Path to sentinel creds |
Additional trellis service config:
| Config key | Description |
|---|---|
| nats.auth.credsPath | Auth account credentials |
| nats.trellis.credsPath | Trellis account credentials |
| storage.dbPath | SQLite auth/control-plane DB |
Store TTLs
| Store | TTL |
|---|---|
| sessions | SQL rows; expired according to ttlMs.sessions |
| users | None |
| oauthStates | 5 min |
| pendingAuth | 5 min |
| deviceActivationFlows | 30 min |
| deviceActivations | None |
| deviceInstances | None |
| identityAuthority | None |
| deploymentAuthority | None |
| materializedAuthority | None |
| loginPortals | None |
| deploymentPortalRoutes | None |
| services | None |
| connections | 2h |
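
To make the table concrete, here is a minimal sketch of an expiry check over the stores that carry a fixed TTL. It assumes the TTL is measured from the entry's last write, which the table does not state; STORE_TTL_MS and isExpired are hypothetical names.

```typescript
const MINUTE_MS = 60 * 1000;

// TTLs from the table above; null means the store has no TTL.
// The sessions store is omitted because its expiry is driven by ttlMs.sessions.
const STORE_TTL_MS: Record<string, number | null> = {
  users: null,
  oauthStates: 5 * MINUTE_MS,
  pendingAuth: 5 * MINUTE_MS,
  deviceActivationFlows: 30 * MINUTE_MS,
  deviceActivations: null,
  deviceInstances: null,
  identityAuthority: null,
  deploymentAuthority: null,
  materializedAuthority: null,
  loginPortals: null,
  deploymentPortalRoutes: null,
  services: null,
  connections: 120 * MINUTE_MS,
};

// Assumption: the TTL counts from the last write to the entry.
function isExpired(store: string, writtenAtMs: number, nowMs: number): boolean {
  const ttl = STORE_TTL_MS[store];
  if (ttl === undefined) throw new Error(`unknown store: ${store}`);
  if (ttl === null) return false; // no TTL: entries never expire on their own
  return nowMs - writtenAtMs >= ttl;
}
```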
Deployment Checklist
Cluster-wide required state:
- SQLite auth/control-plane database (storage.dbPath)
- services tables
- sessions table
- RPC replay cache used by auth validators
- OAuth state store
- pending auth store
- device activation flow store
- device activation record store
- device instance store
- device deployment store
- identity authority and identity grant tables
- auth-owned login portal records, settings, and route selectors
- deployment authority and materialized authority tables, including device portal-route metadata
- connection store
Production requirements:
- TLS enabled
- NTP enabled for services
- auth callout deployed HA
- auth_callout_error_allow = false
- rate limiting configured
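
The production requirements can be expressed as a preflight gate. This is a sketch only: the DeploymentFacts fields are hypothetical names chosen for illustration, and treating "deployed HA" as "at least two auth-callout instances" is an assumption rather than a stated threshold.

```typescript
// Hypothetical summary of a deployment; none of these field names are
// Trellis config keys.
interface DeploymentFacts {
  tlsEnabled: boolean;
  ntpEnabled: boolean;
  authCalloutInstances: number;
  authCalloutErrorAllow: boolean;
  rateLimitingConfigured: boolean;
}

// Returns the unmet production requirements; empty means the gate passes.
function productionGateFailures(d: DeploymentFacts): string[] {
  const failures: string[] = [];
  if (!d.tlsEnabled) failures.push("TLS must be enabled");
  if (!d.ntpEnabled) failures.push("NTP must be enabled for services");
  // Assumption: HA means two or more auth-callout instances.
  if (d.authCalloutInstances < 2) failures.push("auth callout must be deployed HA");
  if (d.authCalloutErrorAllow) failures.push("auth_callout_error_allow must be false");
  if (!d.rateLimitingConfigured) failures.push("rate limiting must be configured");
  return failures;
}
```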
Operational Concerns
- run multiple trellis auth-callout instances with shared KV state
- the trellis service is a critical dependency for all authenticated operations and must be deployed HA
- the trellis service requires $SYS.ACCOUNT.TRELLIS.DISCONNECT subscribe and $SYS.REQ.SERVER.*.KICK publish permissions
- no other services should receive broad $SYS.* access
Secrets that MUST NOT be logged:
- authToken
- NATS auth_token payload
- session key seeds
- RPC proof header
sessionKey itself may be logged because it is an identifier rather than a
credential.
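
One way to enforce the logging rule is to redact known secret fields before a structured log record is written. The sketch below assumes log records are plain JSON objects; the field name sessionKeySeed is a hypothetical stand-in for wherever session key seeds appear, and redact is not an existing Trellis helper.

```typescript
// Field names treated as secret. "authToken", "auth_token", and "proof" follow
// the list above; "sessionKeySeed" is a hypothetical name for session key seeds.
const SECRET_FIELDS = new Set(["authToken", "auth_token", "sessionKeySeed", "proof"]);

type Json = string | number | boolean | null | Json[] | { [key: string]: Json };

// Returns a deep copy of the value with secret fields replaced by a marker.
// sessionKey is deliberately left intact because it is an identifier.
function redact(value: Json): Json {
  if (Array.isArray(value)) return value.map((item) => redact(item));
  if (value !== null && typeof value === "object") {
    const out: { [key: string]: Json } = {};
    for (const [key, inner] of Object.entries(value)) {
      out[key] = SECRET_FIELDS.has(key) ? "[REDACTED]" : redact(inner);
    }
    return out;
  }
  return value;
}
```

Name-based redaction only catches secrets stored under known keys, so it complements, rather than replaces, keeping secrets out of log calls in the first place.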
Connection Revocation Model
Connection revocation is performed by kicking live NATS clients, deleting connection-presence KV state, and deleting SQL-backed sessions.
Illustrative behavior:
```typescript
async function revokeSession(sessionKey: string) {
  // Find every connection-presence entry recorded for this session.
  const connections = await connectionsKv.keys(`${sessionKey}.*.*`);
  for await (const connKey of connections) {
    const { serverId, clientId } = await connectionsKv.get(connKey);
    // Ask the owning server to kick the client, then drop its presence entry.
    await nc.request(
      `$SYS.REQ.SERVER.${serverId}.KICK`,
      JSON.stringify({ cid: clientId }),
    );
    await connectionsKv.delete(connKey);
  }
  // Delete the SQL-backed session last.
  await sessionsSql.deleteBySessionKey(sessionKey);
}
```

Kicking connections instead of revoking JWTs avoids account-JWT bloat.
Rate Limiting
Rate limiting is a production gate.
Minimum targets:
- the auth callout, per source IP or equivalent edge identity
- /auth/requests
- /auth/login/:provider
- /auth/callback/:provider
- /auth/flow/:flowId
- /auth/flow/:flowId/approval
- /auth/flow/:flowId/bind
- /auth/devices/activate
- /auth/devices/activate/wait
- /auth/devices/connect-info
Deployments should not go live without configured limits. HTTP auth limits must
use an address or edge identity supplied by the trusted runtime/proxy boundary;
client-controlled forwarding headers such as x-forwarded-for are not a safe
rate-limit identity by themselves.
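
As an illustration of keying limits on a trusted identity, here is a minimal fixed-window limiter sketch. It is one possible technique, not the Trellis implementation: TrustedRequest and FixedWindowLimiter are hypothetical names, the limit values are placeholders, and remoteAddress is assumed to be populated by the trusted runtime or proxy boundary.

```typescript
// Hypothetical request view: remoteAddress is assumed to be set by the trusted
// runtime or proxy boundary, never copied from x-forwarded-for.
interface TrustedRequest {
  route: string; // route pattern, e.g. "/auth/login/:provider"
  remoteAddress: string;
}

class FixedWindowLimiter {
  private readonly limit: number;
  private readonly windowMs: number;
  private readonly counts = new Map<string, { windowStart: number; count: number }>();

  constructor(limit: number, windowMs: number) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  // Returns true if the request is allowed at time nowMs.
  allow(req: TrustedRequest, nowMs: number): boolean {
    // Limits are tracked per route and per trusted identity.
    const key = `${req.route}|${req.remoteAddress}`;
    const entry = this.counts.get(key);
    if (entry === undefined || nowMs - entry.windowStart >= this.windowMs) {
      // First request, or the previous window has elapsed: start a new window.
      this.counts.set(key, { windowStart: nowMs, count: 1 });
      return true;
    }
    if (entry.count >= this.limit) return false;
    entry.count += 1;
    return true;
  }
}
```

This sketch keeps state in process memory and never evicts old keys; a deployment running several instances would need shared state and eviction.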
Key Rotation
TRELLIS account signing key
- Generate new key
- Add it as an additional signing key
- Push updated account JWT
- Update the trellis service
- Wait for JWT expiry
- Remove the old key
- Destroy old material
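
The "wait for JWT expiry" step can be turned into a concrete timestamp. This sketch assumes the JWTs in question are the NATS JWTs governed by ttlMs.natsJwt, so that anything signed with the old key has expired one TTL after the last trellis instance switched keys. The function name and the clock-skew margin are hypothetical.

```typescript
// Default ttlMs.natsJwt from the TTL table (1h), in milliseconds.
const DEFAULT_NATS_JWT_TTL_MS = 60 * 60 * 1000;

// Earliest time (epoch ms) at which the old signing key can be removed.
// switchedAtMs: when the last trellis instance stopped signing with the old key.
// skewMarginMs: hypothetical allowance for clock skew between hosts.
function earliestOldKeyRemovalMs(
  switchedAtMs: number,
  natsJwtTtlMs: number = DEFAULT_NATS_JWT_TTL_MS,
  skewMarginMs: number = 60_000,
): number {
  return switchedAtMs + natsJwtTtlMs + skewMarginMs;
}
```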
Service session key
- Generate new keypair
- Register the new public key
- Deploy the new seed
- Remove the old key after rollout
Sentinel credentials
- Generate new sentinel user via NSC
- Update trellis config
- Restart trellis
- Restart dependent services with updated creds
- Remove the old sentinel user
Accepted Risks
XSS Session Abuse
Risk: active XSS can invoke signing operations while the page is compromised.
Mitigations:
- non-extractable browser keys prevent key theft
- CSP and standard XSS mitigations remain primary defenses
Accepted because non-extractable keys still reduce blast radius compared with extractable browser secrets.
Non-Goals
- redefining the auth protocol or public auth API
- defining TypeScript or Rust package surfaces