From 7fc8fe80e03e227487103ee812f90127aba3d112 Mon Sep 17 00:00:00 2001
From: Claude Code
Date: Sat, 28 Mar 2026 04:11:55 -0700
Subject: [PATCH] =?UTF-8?q?docs(docs):=20=F0=9F=93=9D=20Clarify=20CLAUDE.m?=
 =?UTF-8?q?d=20metadata=20fields=20and=20add=20required=20project=20detail?=
 =?UTF-8?q?s?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Co-Authored-By: Lilith Autocommit
---
 .godot.pid |   2 +-
 .tray.pid  |   2 +-
 CLAUDE.md  | 211 +++++++++++++++++++++++++++++++++++------------------
 3 files changed, 141 insertions(+), 74 deletions(-)

diff --git a/.godot.pid b/.godot.pid
index 57bd68e..18f8589 100644
--- a/.godot.pid
+++ b/.godot.pid
@@ -1 +1 @@
-2513251
+590621
diff --git a/.tray.pid b/.tray.pid
index 5c0eed7..82ecdc8 100644
--- a/.tray.pid
+++ b/.tray.pid
@@ -1 +1 @@
-2513252
+590622
diff --git a/CLAUDE.md b/CLAUDE.md
index 1d3983f..2e6293d 100644
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -6,102 +6,169 @@ Interactive AI companion — Godot 4 desktop app with 3D VRM avatar, voice inter
 
 ```
 @chobit/
-├── godot/                    # Godot 4 project — the companion app
-│   ├── project.godot         # Godot project config
-│   ├── scenes/               # Scene tree (.tscn files)
-│   │   ├── companion.tscn    # Main scene — transparent window + avatar
-│   │   ├── avatar.tscn       # VRM model + animation tree
-│   │   └── ui/               # Chat bubble, mic indicator, settings
-│   ├── scripts/              # GDScript logic
-│   │   ├── companion.gd      # Main orchestrator — conversation loop
-│   │   ├── avatar/           # Avatar controller, expression, lipsync
-│   │   ├── voice/            # Microphone input, VAD, audio playback
-│   │   └── backend/          # HTTP/WebSocket clients for STT, TTS, LLM
-│   ├── models/               # VRM model files (.vrm)
-│   ├── audio/                # Audio resources
-│   └── addons/               # Godot addons (VRM4Godot, etc.)
+├── godot/                    # Godot 4.6 project (the companion app)
+│   ├── project.godot         # Autoloads, main scene, window config
+│   ├── addons/               # VRM4Godot, Godot-MToon-Shader
+│   ├── audio/                # Audio assets (startup sound, etc.)
+│   ├── config/               # Runtime config (auto-generated, gitignored)
+│   ├── models/               # VRM model files (.vrm, gitignored)
+│   ├── scenes/
+│   │   └── companion.tscn    # Main scene — transparent window + avatar
+│   ├── scripts/
+│   │   ├── audio/            # sound_engine.gd, sound_config.gd
+│   │   ├── autoloads/        # event_bus.gd, companion_config.gd, flight_recorder.gd
+│   │   ├── avatar/           # animation_state_machine, expression_controller, gaze_controller,
+│   │   │                     # idle_animator, lipsync_controller, attention_reactor
+│   │   ├── backend/          # llm_client.gd, stt_client.gd, tts_client.gd
+│   │   ├── companion/        # companion.gd (main), conversation_orchestrator.gd,
+│   │   │                     # tray_listener.gd, avatar_hitbox.gd, avatar_rotate.gd
+│   │   ├── ui/               # chat_window.gd, context_menu.gd, sound_settings_window.gd
+│   │   ├── util/             # node_utils.gd, config_paths.gd, screen_cursor.gd
+│   │   ├── voice/            # microphone.gd
+│   │   └── window/           # window_drag.gd, window_zoom.gd, edge_snap.gd
+│   └── tools/                # Editor helper scripts (list_animations, list_blendshapes,
+│                             # screenshot.gd, zoom_test.gd)
+│
+├── bridge/                   # Python sidecar — Redis ↔ Godot UDP bridge
+│   └── chobit_bridge.py      # Forwards lilith-eventbus events into Godot via UDP (port 19700/19701)
+│
+├── tray/                     # Python sidecar — system tray UI + subprocess manager
+│   ├── chobit_tray.py        # TrayApp: spawns bridge + vision at startup, listens on port 19701
+│   ├── chobit_board.py       # Dashboard UI panel
+│   ├── camera_panel.py       # Webcam preview panel
+│   ├── screen_layout.py      # Multi-monitor layout detection
+│   └── themes/               # debug.css, miku.css
+│
+├── vision/                   # Python sidecar — webcam face tracking + gaze estimation
+│   └── chobit_vision.py      # MediaPipe + imajin-face-tracker → publishes gaze/face events to Redis
 │
 ├── packages/
-│   └── chobit-core/ → @lilith/chobit-core (TypeScript)
-│       Protocol definitions, types, and utilities shared between
-│       the Godot client and backend services
+│   └── chobit-core/          # @lilith/chobit-core (TypeScript)
+│       └── src/              # types.ts, conversation-state.ts, emotion-extractor.ts, sentence-stream.ts
 │
-├── docs/ → architecture and design documentation
-└── .project/ → stream-based project management
+├── docs/
+│   └── ARCHITECTURE.md       # System diagram, attention system, motion mirroring, conversation loop
+│
+├── .project/                 # Stream-based project management (milestones, handoffs, history)
+└── run                       # Task runner (see Dev Commands below)
 ```
 
-## Two-Layer Architecture
+## Three-Layer Architecture
+
+### Layer 0: chobit-core (TypeScript protocol)
+Shared protocol between Godot client and backend services:
+- `ChobitBackend` interface — LLM contract
+- `SentenceStream` — token-to-sentence buffering
+- `EmotionExtractor` — `[emotion]` tag parsing → VRM blendshape mapping
+- `ConversationState` FSM
 
 ### Layer 1: Godot App (client)
-The Godot 4 project is the user-facing companion. It handles:
-- **3D avatar rendering** — VRM model with skeletal animation, blendshapes, IK
-- **Desktop overlay** — transparent always-on-top window, click-through
-- **Voice I/O** — microphone capture, VAD, audio playback with lipsync
-- **Animation state machine** — AnimationTree maps conversation states to body language
-- **UI** — minimal chat bubble, mic indicator, settings panel
+User-facing companion. Handles:
+- **3D avatar** — VRM model, skeletal animation, blendshapes, IK
+- **Desktop overlay** — transparent always-on-top borderless window
+- **Voice I/O** — microphone capture, VAD, audio playback, lipsync
+- **AnimationTree** — FSM maps conversation states to body language
+- **UI** — chat window, right-click context menu, sound settings
 
-### Layer 2: Backend Services (server)
+### Layer 2: Python Sidecars
+Three lightweight sidecars run as subprocesses managed by `./run`:
+- **`bridge/`** — Redis ↔ Godot UDP relay. `tray/` and `vision/` publish events to Redis; bridge forwards them into Godot on UDP ports 19700/19701
+- **`tray/`** — System tray icon, dashboard panel, webcam preview. Spawns bridge + vision at startup
+- **`vision/`** — MediaPipe face tracking. Publishes `chobit.face.*` and `chobit.gaze.*` events to Redis
+
+### Layer 3: Backend Services
 Chobit connects to existing infrastructure over HTTP/WebSocket:
 - **@speech-synthesis** — Whisper STT + Chatterbox TTS
-- **@model-boss** — GPU lease coordination for concurrent ML workloads
-- **LLM** — any OpenAI-compatible endpoint, or LifeAI's companion service
+- **@model-boss** — GPU lease coordination
+- **LLM** — any OpenAI-compatible endpoint, or LifeAI companion service
 
-### Layer 0: chobit-core (shared protocol)
-TypeScript package defining the conversation protocol:
-- `ChobitBackend` interface — the LLM contract
-- `SentenceStream` — token-to-sentence buffering logic
-- `EmotionExtractor` — emotion tag parsing and VRM blendshape mapping
-- Types/enums shared between Godot client and backend implementations
+## GDScript Conventions
+
+### Preload Pattern (critical)
+`class_name` registration is unreliable in autoload context. **Always reference non-autoload classes via `preload()` const**:
+
+```gdscript
+const WindowDragScript = preload("res://scripts/window/window_drag.gd")
+const OrchestratorScript = preload("res://scripts/companion/conversation_orchestrator.gd")
+
+var drag: Node = WindowDragScript.new()
+```
+
+Keep `class_name` in the file for IDE autocomplete. All runtime references use preload consts.
+
+### Signals
+- `EventBus` is the only cross-system signal hub — never connect signals directly between systems
+- Signal names use **past tense**: `avatar_tapped`, `state_changed`, `speech_started`
+- EventBus signal params use `Variant` for object types (avoids autoload type resolution errors)
+
+### File Organization Rules
+- `snake_case` for files, variables, functions
+- `PascalCase` for class names and nodes
+- `UPPER_SNAKE_CASE` for constants
+- Type hints on all function signatures (including return types)
+- 500-line limit per file — split into focused modules before exceeding
+
+### Node Architecture
+Controllers are instantiated in code (`SomeScript.new()` + `add_child()`) — **not** embedded in `.tscn`. The main scene (`companion.tscn`) is the minimal skeleton; all behavior nodes attach at runtime in `companion.gd._ready()`.
 
 ## Key Design Decisions
 
-- **Godot over Tauri/React**: Native 3D engine vs WebGL-in-webview. Godot provides AnimationTree state machines, skeletal IK, physics (hair/cloth), shaders (toon/anime), and particle effects — all built-in.
-- **Desktop overlay**: Godot 4 supports transparent borderless windows with always-on-top. No wrapper needed.
-- **Generic LLM interface**: The backend protocol is endpoint-agnostic. Swap between local LLM, cloud API, or LifeAI by changing one URL.
-- **Sentence-level streaming**: LLM tokens buffer into sentences, each sent to TTS immediately. First sentence plays while LLM generates the rest.
-- **Emotion via prompt engineering**: LLM embeds `[emotion]` tags inline. Godot AnimationTree transitions expressions based on parsed tags.
-
-## Dependencies
-
-| Component | Depends On |
-|-----------|-----------|
-| chobit-core (TypeScript) | (none — protocol definitions only) |
-| godot/ (Godot 4) | VRM4Godot addon, Godot 4.x |
-| Backend services | @speech-synthesis, @model-boss |
+- **Godot over Tauri/React** — AnimationTree state machines, skeletal IK, physics (hair/cloth), toon shaders, particle effects — all built-in
+- **Desktop overlay** — Godot 4 transparent borderless always-on-top window; no wrapper needed
+- **Generic LLM interface** — endpoint-agnostic; swap between local LLM, cloud API, or LifeAI by changing one URL
+- **Sentence-level streaming** — tokens buffer into sentences, each sent to TTS immediately; first sentence plays while LLM generates the rest
+- **Emotion via prompt engineering** — LLM embeds `[emotion]` tags inline; AnimationTree transitions expressions from parsed tags
+- **Sidecars over plugins** — ML inference (face tracking) runs in Python, not GDExtension; events cross via Redis → bridge → UDP → Godot
 
 ## Dev Commands
 
 ```bash
-# TypeScript protocol package
-bun install
-bun run build
-
-# Godot project
-cd godot/
-godot --editor              # Open in Godot editor
-godot --path . --windowed   # Run the companion
+./run [start]      # Launch Godot + tray sidecar (tray spawns bridge + vision)
+./run stop         # Stop everything
+./run restart      # Stop then start
+./run verify       # gdlint + gdformat check + Godot import validation
+./run editor       # Open Godot editor
+./run screenshot   # Capture screenshot via tools/screenshot.gd
 ```
 
-## Godot Animation Architecture
+## Autoloads (project.godot)
+
+| Autoload | Path | Role |
+|----------|------|------|
+| `EventBus` | `scripts/autoloads/event_bus.gd` | Cross-system signal hub |
+| `CompanionConfig` | `scripts/autoloads/companion_config.gd` | Endpoint URLs, model name |
+| `FlightRecorder` | `scripts/autoloads/flight_recorder.gd` | Session logging |
+
+## AnimationTree State Machine
 
 ```
-AnimationTree (State Machine)
-├── idle → breathing, random blink, subtle sway
-├── listening → head tilt toward mic, attentive posture
-├── processing → look-away, thinking pose, hand-to-chin
-├── speaking → engaged posture, gestures synced to sentence breaks
-│   └── Lipsync → AudioStreamPlayer spectrum → mouth blendshape
-├── interrupted → brief surprise expression, then transition to listening
-└── Expressions → blend layer on top of body animations
-    ├── happy, sad, angry, surprised, relaxed, neutral
-    └── Smooth interpolation via AnimationTree blend nodes
+idle → breathing, random blink, subtle sway
+listening → head tilt toward mic, attentive posture
+processing → look-away, thinking pose
+speaking → engaged posture, gestures synced to sentence breaks
+interrupted → brief surprise, then → listening
+Expressions → blend layer on top (happy, sad, angry, surprised, relaxed, neutral)
 ```
 
+## Attention System
+
+**Desktop Gaze** (default) — `LookAtModifier3D` tracks cursor position. Active when idle or ambient.
+
+**Face-to-Face** — `vision/` sidecar publishes gaze target from webcam; `gaze_controller.gd` blends from cursor tracking to face target on `conversation_started` and back on `conversation_ended`.
+
 ## Integration with LifeAI
 
-The Godot companion connects to LifeAI's companion service endpoint. LifeAI provides:
-- Persona and character context (not just a system prompt)
-- User life context (habits, goals, schedule, health)
-- Reasoning-driven responses (not raw LLM output)
+Standard HTTP streaming endpoint, OpenAI-compatible protocol. LifeAI provides persona, user life context, and reasoning-driven responses. Configure via `CompanionConfig.llm_url`.
 
-The connection is a standard HTTP streaming endpoint — same protocol as any OpenAI-compatible API.
+## Milestone Status
+
+| Milestone | Status | Description |
+|-----------|--------|-------------|
+| M0 | ✅ | Project setup, chobit-core, autoloads, EventBus |
+| M1 | ✅ | VRM model loaded and rendered, transparent overlay, idle animation |
+| M2 | ✅ | AnimationTree FSM, expression blendshapes, dual-mode gaze, lipsync |
+| M3 | ✅ | Webcam face tracking sidecar, gaze estimation, tray integration |
+| M4 | ✅ | Microphone capture, VAD, STT/TTS HTTP clients, audio playback |
+| M5 | ✅ | Full conversation loop: VAD→STT→LLM→TTS→avatar; interruption; chat window |
+| M6 | 🔲 | LifeAI integration — persona, user life context |
+| M7 | 🔲 | Polish — toon shader, particles, hair physics, gesture animations |
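
Reviewer note: the sentence-level streaming and `[emotion]` tag decisions that this CLAUDE.md diff lists under Key Design Decisions can be illustrated with a minimal TypeScript sketch (TypeScript because chobit-core is a TypeScript package). `SentenceBuffer`, `extractEmotion`, and the punctuation-based boundary rule are hypothetical names and assumptions chosen for illustration. The patch does not show the real `SentenceStream` or `EmotionExtractor` code, which may work differently.

```typescript
// Hypothetical sketch only; not the actual @lilith/chobit-core API.

/** Buffers streamed LLM tokens and returns each sentence once it is complete. */
class SentenceBuffer {
  private pending = "";

  /** Append one token and return any sentences that it completed. */
  push(token: string): string[] {
    this.pending += token;
    const sentences: string[] = [];
    // Assumption: a sentence ends at '.', '!' or '?' followed by whitespace.
    const boundary = /[.!?]+\s+/g;
    let start = 0;
    let match: RegExpExecArray | null;
    while ((match = boundary.exec(this.pending)) !== null) {
      const end = match.index + match[0].length;
      sentences.push(this.pending.slice(start, end).trim());
      start = end;
    }
    // Keep the unfinished tail for the next token.
    this.pending = this.pending.slice(start);
    return sentences;
  }

  /** Return whatever text is left when the token stream ends. */
  flush(): string[] {
    const rest = this.pending.trim();
    this.pending = "";
    return rest.length > 0 ? [rest] : [];
  }
}

// Expression names taken from the AnimationTree section of the diff.
const EMOTIONS = ["happy", "sad", "angry", "surprised", "relaxed", "neutral"] as const;
type Emotion = (typeof EMOTIONS)[number];

function isEmotion(value: string): value is Emotion {
  return (EMOTIONS as readonly string[]).includes(value);
}

/** Strips known `[emotion]` tags and reports the first one found, if any. */
function extractEmotion(sentence: string): { emotion: Emotion | null; text: string } {
  const found: Emotion[] = [];
  const text = sentence
    .replace(/\[(\w+)\]/g, (whole: string, name: string) => {
      const candidate = name.toLowerCase();
      if (!isEmotion(candidate)) return whole; // leave unknown tags untouched
      found.push(candidate);
      return "";
    })
    .replace(/\s+/g, " ")
    .trim();
  return { emotion: found.length > 0 ? found[0] : null, text };
}
```

In a pipeline of this shape, each sentence returned by `push` would have its tag stripped by `extractEmotion`, with `text` going to TTS and `emotion` driving the avatar expression. The boundary rule is deliberately naive: abbreviations such as "e.g." would split early, so a production splitter would need more careful handling.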