route(signals, fleet) -> RouteDecision picks a host via a deterministic cascade:
explicit host > capability-pin (uses hosts_with_capability) > sticky
(the subject's session/task already runs on a host, per sessions+assignments)
> default-local. Pure and auditable (each decision surfaces its reason and
candidates); the LLM classify step and cross-host execution are separate
layers. 13 tests.
Part of task 13764f2f.
(manual commit via ALLOW_COMMIT — autocommit LLM still down on claire)
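The cascade above can be sketched roughly as follows. This is a minimal illustration, not the committed code: the `Signals` fields, the dict-shaped `fleet`, the `local` parameter, the alphabetical tie-break, and the fall-through when a capability pin matches no host are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Signals:
    # Hypothetical signal fields; the real structure is not shown in this log.
    explicit_host: Optional[str] = None
    capability: Optional[str] = None
    sticky_host: Optional[str] = None  # host already running the subject's session/task


@dataclass(frozen=True)
class RouteDecision:
    host: str
    reason: str
    candidates: Tuple[str, ...] = ()


def hosts_with_capability(fleet: Dict[str, List[str]], tag: str) -> List[str]:
    """Hosts whose capability list has `tag` exactly or as a `tag:` prefix."""
    return sorted(
        host
        for host, caps in fleet.items()
        if any(c == tag or c.startswith(tag + ":") for c in caps)
    )


def route(signals: Signals, fleet: Dict[str, List[str]], local: str = "local") -> RouteDecision:
    """Pure cascade: explicit > capability-pin > sticky > default-local."""
    if signals.explicit_host:
        return RouteDecision(signals.explicit_host, "explicit", (signals.explicit_host,))
    if signals.capability:
        candidates = tuple(hosts_with_capability(fleet, signals.capability))
        if candidates:
            # Assumption: first candidate by name wins; an empty match falls through.
            return RouteDecision(candidates[0], "capability-pin", candidates)
    if signals.sticky_host:
        return RouteDecision(signals.sticky_host, "sticky", (signals.sticky_host,))
    return RouteDecision(local, "default-local", (local,))
```

Because the function takes only values and returns a value, each rung of the cascade can be unit-tested without a live fleet, and the `reason` field records which rung fired.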
When a local worker pane dies (crash, OOM, host power-cycle), its JSONL persists
and is resumable. The agent supervisor now detects dead-but-recent local
sessions and resumes them with `claude --resume <uuid>`, then sends a re-orient
kick so each session re-determines its OWN state (done vs pending vs finished)
before acting. This mirrors the orchestrator's rehydrate-on-startup.
- rclaude.Rclaude.resume(): spawn `claude --resume <uuid>` via RCLAUDE_RESUME_ID
(verified empirically against a real dead session on apricot).
- supervisor.select_resume_candidates(): pure, guarded selection — recency
window, supersession (skip if a LIVE session shares the cwd), orchestrator-
workspace exclusion, per-session retry cap, per-tick global ceiling (the
first-wake token-storm guard). 7 unit tests.
- AgentConfig.auto_resume off|dry-run|on (default off) + max/per_tick/window.
Ships off; roll out via dry-run, then on — same pattern as auto_continue.
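The guards listed for select_resume_candidates() could look roughly like the sketch below. The `Session` fields, parameter names, and newest-first ordering are assumptions for illustration; only the five guards themselves come from the description above.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Session:
    # Hypothetical session record; field names are illustrative.
    uuid: str
    cwd: str
    alive: bool
    last_activity: float  # epoch seconds


def select_resume_candidates(
    sessions: List[Session],
    now: float,
    *,
    window_s: float,
    max_retries: int,
    per_tick: int,
    orchestrator_cwd: str,
    attempts: Dict[str, int],
) -> List[Session]:
    """Pure selection of dead-but-recent sessions that are safe to resume."""
    live_cwds = {s.cwd for s in sessions if s.alive}
    picked: List[Session] = []
    # Assumption: newest first, so the per-tick ceiling keeps the freshest sessions.
    for s in sorted(sessions, key=lambda s: s.last_activity, reverse=True):
        if len(picked) >= per_tick:
            break  # per-tick global ceiling (first-wake token-storm guard)
        if s.alive:
            continue  # only dead sessions are resume candidates
        if now - s.last_activity > window_s:
            continue  # outside the recency window
        if s.cwd in live_cwds:
            continue  # superseded: a LIVE session shares the cwd
        if s.cwd == orchestrator_cwd:
            continue  # orchestrator workspace is excluded
        if attempts.get(s.uuid, 0) >= max_retries:
            continue  # per-session retry cap reached
        picked.append(s)
    return picked
```

In a dry-run mode the caller would presumably log the returned list instead of spawning anything, which makes the rollout pattern described above cheap to verify.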
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
known_hosts gains a `capabilities` tag list (e.g. media, transmission,
cores:64, gpu), plus ClaireConfig.hosts_with_capability(tag) (exact match or
`key:` prefix match) and capabilities_for(host) (alias-resolved). This lets
routing (location-transparent Claire, task 13764f2f) and dispatch pick a host
by what it CAN do, not just by load. Seeded black={media,transmission}.
Prereq task a5453fb8. 351 tests green.
(manual commit via ALLOW_COMMIT — autocommit LLM still timing out on claire)
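One plausible reading of the matching rule is sketched below: a query tag matches a capability that is identical to it or that starts with the tag followed by a colon (so `cores` matches `cores:64`). The dict layout, the `aliases` key, the `blk` alias, and apricot's capabilities are hypothetical; only black's two tags come from the seed described above.

```python
from typing import Dict, List

# Hypothetical known_hosts shape; only black's capabilities are from the commit.
KNOWN_HOSTS: Dict[str, dict] = {
    "black": {"aliases": ["blk"], "capabilities": ["media", "transmission"]},
    "apricot": {"aliases": [], "capabilities": ["cores:64", "gpu"]},
}


def _matches(capability: str, tag: str) -> bool:
    """Exact match, or `tag` is the key part of a `key:value` capability."""
    return capability == tag or capability.startswith(tag + ":")


def hosts_with_capability(known_hosts: Dict[str, dict], tag: str) -> List[str]:
    """Canonical names of hosts advertising `tag`."""
    return [
        name
        for name, entry in known_hosts.items()
        if any(_matches(c, tag) for c in entry.get("capabilities", []))
    ]


def capabilities_for(known_hosts: Dict[str, dict], host: str) -> List[str]:
    """Capabilities for a canonical host name or one of its aliases."""
    for name, entry in known_hosts.items():
        if host == name or host in entry.get("aliases", []):
            return list(entry.get("capabilities", []))
    return []
```

Requiring the colon in the prefix rule keeps `core` from accidentally matching `cores:64`.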
Wire the rounds timer to a pure-Python skip gate so claire-serve only wakes
the orchestrator model when worker fleet state changed (not every tick):
- web/rounds.py: fleet_fingerprint() over worker sessions (minus the
orchestrator's own) + open tasks; should_skip_round() with heartbeat floor.
- web/app.py: _rounds_loop tracks last fingerprint + consecutive skips.
- excludes the orchestrator's own session/chat so a round's self-side-effects
can't defeat the gate.
Add scripts/release-fleet.sh (test -> deploy apricot+black -> restart plum
services) and harden deploy-agent.sh's cosmetic status check against a SIGPIPE
false-abort. 3 new discriminating tests; 349 pass.
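A rough sketch of the skip gate, under stated assumptions: the dict-shaped sessions and tasks, the SHA-256 digest, and the reading of "heartbeat floor" as "force a round after `max_skips` consecutive skips" are illustrative guesses, not the contents of web/rounds.py.

```python
import hashlib
import json
from typing import List, Optional


def fleet_fingerprint(
    sessions: List[dict], open_tasks: List[dict], orchestrator_session_id: str
) -> str:
    """Stable digest of worker state, ignoring the orchestrator's own session."""
    workers = sorted(
        (s["id"], s["state"]) for s in sessions if s["id"] != orchestrator_session_id
    )
    tasks = sorted((t["id"], t["status"]) for t in open_tasks)
    payload = json.dumps({"workers": workers, "tasks": tasks}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def should_skip_round(
    current: str, last: Optional[str], consecutive_skips: int, max_skips: int
) -> bool:
    """Skip only when nothing changed and the heartbeat floor has not been hit."""
    if last is None:
        return False  # first round always runs
    if consecutive_skips >= max_skips:
        return False  # assumed heartbeat floor: run periodically even if idle
    return current == last
```

Leaving the orchestrator's own session out of the digest is what stops a round from changing the fingerprint it will be compared against on the next tick.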
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
discover_session polled `rclaude list sessions` for the freshly spawned
session but filtered rows with `r.host == host` where host is the
canonical name (e.g. "plum"), while rclaude labels the calling machine's
own sessions "local". "local" == "plum" is always False, so discovery
matched nothing and timed out even though the session's JSONL was already
on disk (observed: 18s after spawn, inside the 30s window). dispatch then
falsely returned "spawned but not discovered", orphaning the live session
until a manual pull.
The root cause is a missing host-label normalization that the pull loop
already performs. Fix discover_session to canonicalize both sides via
resolve_host_label, and key local-path symlink resolution on the ROW's
raw label. Apply the same normalization to dispatch_task's pre_uuids
filter (identical mismatch left it empty, risking a stale-sibling match
at a shared cwd). 2 regression tests reproduce rclaude's "local" labeling
(the old fake echoed the dispatch host, masking the bug). 310 tests pass.
Committed manually with ALLOW_COMMIT=1 per user authorization: the
auto-commit service's message LLM was timing out on this repo.
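The mismatch and the fix can be reproduced in a few lines. This is an illustrative reduction, not the project's code: `Row`, the `discover` helper, and this `resolve_host_label` signature are hypothetical stand-ins for the real functions named above.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass(frozen=True)
class Row:
    uuid: str
    host: str  # raw label as reported by the listing, e.g. "local"


def resolve_host_label(label: str, local_host: str) -> str:
    """Stand-in normalizer: map the "local" label to the canonical host name."""
    return local_host if label == "local" else label


def discover(rows: List[Row], host: str, local_host: str, pre_uuids: Set[str]) -> List[Row]:
    """Match rows on canonical host, skipping sessions that predate the spawn."""
    want = resolve_host_label(host, local_host)
    return [
        r
        for r in rows
        if resolve_host_label(r.host, local_host) == want and r.uuid not in pre_uuids
    ]


rows = [Row("u1", "local"), Row("u2", "black")]

# Old behaviour: comparing the raw label to the canonical name never matches.
buggy = [r for r in rows if r.host == "plum"]

# Fixed behaviour: both sides are canonicalized before comparison.
fixed = discover(rows, "plum", local_host="plum", pre_uuids=set())
```

A test fake that echoes the dispatch host back as the row label would make `buggy` non-empty, which is why the regression tests need to emit "local" as the real tool does.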
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
ChatRole.CLAIRE now persists as "claire" everywhere. Migration 0010
rewrites both the chat_messages.role column and the chat_message_posted
event payloads in one transaction so a future replay reconstructs the
same projection.
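The one-transaction property can be illustrated with a small sketch. Everything structural here is assumed for the example: SQLite as the engine, an `events` table with `type` and JSON `payload` columns, and especially the previous role value `"assistant"`, which is a placeholder because the old value is not stated in this log.

```python
import json
import sqlite3

OLD_ROLE = "assistant"  # hypothetical placeholder for the previous stored value
NEW_ROLE = "claire"


def migrate(conn: sqlite3.Connection) -> None:
    """Rewrite the projection column and the event payloads atomically."""
    # `with conn` commits on success and rolls back if any statement raises,
    # so the column and the event log cannot diverge.
    with conn:
        conn.execute(
            "UPDATE chat_messages SET role = ? WHERE role = ?", (NEW_ROLE, OLD_ROLE)
        )
        rows = conn.execute(
            "SELECT id, payload FROM events WHERE type = 'chat_message_posted'"
        ).fetchall()
        for event_id, payload in rows:
            data = json.loads(payload)
            if data.get("role") == OLD_ROLE:
                data["role"] = NEW_ROLE
                conn.execute(
                    "UPDATE events SET payload = ? WHERE id = ?",
                    (json.dumps(data), event_id),
                )
```

If only the column were rewritten, replaying the events later would rebuild the projection with the old role value; updating both inside one transaction avoids that drift.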
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>