diff --git a/CLAUDE.md b/CLAUDE.md index 4b47ff3..706f6d2 100644 --- a/CLAUDE.md +++ b/CLAUDE.md @@ -87,6 +87,15 @@ Stream back to client frontend (text + audio) companion-api orchestrates the pipeline. @ai owns all personality mechanics. +### GPU / VRAM + +companion-api holds zero VRAM. All inference and TTS go through model-boss's priority queue: + +- **LLM inference** → `POST @model-boss /v1/chat/completions` — model-boss loads/evicts models via its pool +- **TTS** → `POST @speech-synthesis /synthesize` → speech-synthesis delegates to `POST @model-boss /api/v1/tts/synthesize` (no raw VRAM lease held by either service) + +Never acquire GPU leases directly from companion code. + --- ## Version Roadmap diff --git a/companion-load.png b/companion-load.png new file mode 100644 index 0000000..65bca17 Binary files /dev/null and b/companion-load.png differ