diff --git a/CLAUDE.md b/CLAUDE.md
index 4b47ff3..706f6d2 100644
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -87,6 +87,15 @@ Stream back to client frontend (text + audio)
 
 companion-api orchestrates the pipeline. @ai owns all personality mechanics.
 
+### GPU / VRAM
+
+companion-api holds zero VRAM. All LLM inference and TTS requests go through model-boss's priority queue:
+
+- **LLM inference** → `POST @model-boss /v1/chat/completions` — model-boss loads/evicts models via its pool
+- **TTS** → `POST @speech-synthesis /synthesize`, which delegates to `POST @model-boss /api/v1/tts/synthesize` (no raw VRAM lease held by either service)
+
+Never acquire GPU leases directly from companion code.
+
 ---
 
 ## Version Roadmap
diff --git a/companion-load.png b/companion-load.png
new file mode 100644
index 0000000..65bca17
Binary files /dev/null and b/companion-load.png differ