docs(docs): 📝 Clarify CLAUDE.md metadata fields and add required project details

Co-Authored-By: Lilith Autocommit <noreply@atlilith.com>
Claude Code 2026-03-28 04:11:55 -07:00
parent fba747cea1
commit 7fc8fe80e0
3 changed files with 141 additions and 74 deletions

@@ -1 +1 @@
2513251
590621

@@ -1 +1 @@
2513252
590622

211
CLAUDE.md
@@ -6,102 +6,169 @@ Interactive AI companion — Godot 4 desktop app with 3D VRM avatar, voice inter
```
@chobit/
├── godot/ # Godot 4 project — the companion app
│ ├── project.godot # Godot project config
│ ├── scenes/ # Scene tree (.tscn files)
│ │ ├── companion.tscn # Main scene — transparent window + avatar
│ │ ├── avatar.tscn # VRM model + animation tree
│ │ └── ui/ # Chat bubble, mic indicator, settings
│ ├── scripts/ # GDScript logic
│ │ ├── companion.gd # Main orchestrator — conversation loop
│ │ ├── avatar/ # Avatar controller, expression, lipsync
│ │ ├── voice/ # Microphone input, VAD, audio playback
│ │ └── backend/ # HTTP/WebSocket clients for STT, TTS, LLM
│ ├── models/ # VRM model files (.vrm)
│ ├── audio/ # Audio resources
│ └── addons/ # Godot addons (VRM4Godot, etc.)
├── godot/ # Godot 4.6 project (the companion app)
│ ├── project.godot # Autoloads, main scene, window config
│ ├── addons/ # VRM4Godot, Godot-MToon-Shader
│ ├── audio/ # Audio assets (startup sound, etc.)
│ ├── config/ # Runtime config (auto-generated, gitignored)
│ ├── models/ # VRM model files (.vrm, gitignored)
│ ├── scenes/
│ │ └── companion.tscn # Main scene — transparent window + avatar
│ ├── scripts/
│ │ ├── audio/ # sound_engine.gd, sound_config.gd
│ │ ├── autoloads/ # event_bus.gd, companion_config.gd, flight_recorder.gd
│ │ ├── avatar/ # animation_state_machine, expression_controller, gaze_controller,
│ │ │ # idle_animator, lipsync_controller, attention_reactor
│ │ ├── backend/ # llm_client.gd, stt_client.gd, tts_client.gd
│ │ ├── companion/ # companion.gd (main), conversation_orchestrator.gd,
│ │ │ # tray_listener.gd, avatar_hitbox.gd, avatar_rotate.gd
│ │ ├── ui/ # chat_window.gd, context_menu.gd, sound_settings_window.gd
│ │ ├── util/ # node_utils.gd, config_paths.gd, screen_cursor.gd
│ │ ├── voice/ # microphone.gd
│ │ └── window/ # window_drag.gd, window_zoom.gd, edge_snap.gd
│ └── tools/ # Editor helper scripts (list_animations, list_blendshapes,
│ # screenshot.gd, zoom_test.gd)
├── bridge/ # Python sidecar — Redis ↔ Godot UDP bridge
│ └── chobit_bridge.py # Forwards lilith-eventbus events into Godot via UDP (port 19700/19701)
├── tray/ # Python sidecar — system tray UI + subprocess manager
│ ├── chobit_tray.py # TrayApp: spawns bridge + vision at startup, listens on port 19701
│ ├── chobit_board.py # Dashboard UI panel
│ ├── camera_panel.py # Webcam preview panel
│ ├── screen_layout.py # Multi-monitor layout detection
│ └── themes/ # debug.css, miku.css
├── vision/ # Python sidecar — webcam face tracking + gaze estimation
│ └── chobit_vision.py # MediaPipe + imajin-face-tracker → publishes gaze/face events to Redis
├── packages/
│ └── chobit-core/ → @lilith/chobit-core (TypeScript)
│ Protocol definitions, types, and utilities shared between
│ the Godot client and backend services
│ └── chobit-core/ # @lilith/chobit-core (TypeScript)
│ └── src/ # types.ts, conversation-state.ts, emotion-extractor.ts, sentence-stream.ts
├── docs/ → architecture and design documentation
└── .project/ → stream-based project management
├── docs/
│ └── ARCHITECTURE.md # System diagram, attention system, motion mirroring, conversation loop
├── .project/ # Stream-based project management (milestones, handoffs, history)
└── run # Task runner (see Dev Commands below)
```
## Two-Layer Architecture
## Three-Layer Architecture
### Layer 0: chobit-core (TypeScript protocol)
Shared protocol between Godot client and backend services:
- `ChobitBackend` interface — LLM contract
- `SentenceStream` — token-to-sentence buffering
- `EmotionExtractor` — `[emotion]` tag parsing → VRM blendshape mapping
- `ConversationState` FSM
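
The token-to-sentence buffering that `SentenceStream` performs can be illustrated with a minimal sketch. This is not the chobit-core implementation (which is TypeScript); it is a Python approximation, and the class name `SentenceBuffer`, its methods, and the sentence-boundary rule (`.`, `!`, or `?` followed by whitespace) are all assumptions made for illustration.

```python
import re
from typing import List

# Assumption: a sentence ends at ., !, or ? followed by whitespace.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


class SentenceBuffer:
    """Hypothetical stand-in for SentenceStream: buffers tokens, emits whole sentences."""

    def __init__(self) -> None:
        self._pending = ""

    def push(self, token: str) -> List[str]:
        """Add one streamed token; return any sentences it completed."""
        self._pending += token
        parts = _SENTENCE_END.split(self._pending)
        # The last part may be an unfinished sentence, so keep it buffered.
        self._pending = parts.pop()
        return [p for p in parts if p]

    def flush(self) -> List[str]:
        """Return whatever is left once the token stream ends."""
        rest, self._pending = self._pending.strip(), ""
        return [rest] if rest else []
```

With this boundary rule a sentence is only released once the whitespace after its terminator arrives, so the caller should call `flush()` at end of stream to get the final sentence.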
### Layer 1: Godot App (client)
The Godot 4 project is the user-facing companion. It handles:
- **3D avatar rendering** — VRM model with skeletal animation, blendshapes, IK
- **Desktop overlay** — transparent always-on-top window, click-through
- **Voice I/O** — microphone capture, VAD, audio playback with lipsync
- **Animation state machine** — AnimationTree maps conversation states to body language
- **UI** — minimal chat bubble, mic indicator, settings panel
User-facing companion. Handles:
- **3D avatar** — VRM model, skeletal animation, blendshapes, IK
- **Desktop overlay** — transparent always-on-top borderless window
- **Voice I/O** — microphone capture, VAD, audio playback, lipsync
- **AnimationTree** — FSM maps conversation states to body language
- **UI** — chat window, right-click context menu, sound settings
### Layer 2: Backend Services (server)
### Layer 2: Python Sidecars
Three lightweight sidecars run as subprocesses: `./run` launches `tray/`, which in turn spawns `bridge/` and `vision/`:
- **`bridge/`** — Redis ↔ Godot UDP relay. `tray/` and `vision/` publish events to Redis; bridge forwards them into Godot on UDP ports 19700/19701
- **`tray/`** — System tray icon, dashboard panel, webcam preview. Spawns bridge + vision at startup
- **`vision/`** — MediaPipe face tracking. Publishes `chobit.face.*` and `chobit.gaze.*` events to Redis
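
The last hop of the relay (bridge to Godot over UDP) might look roughly like the sketch below. Only the port number 19700 comes from the text; the loopback host, the JSON wire format, the function names, and the example channel name are assumptions, and the Redis subscription side is omitted entirely.

```python
import json
import socket
from typing import Any, Dict, Tuple

# Port 19700 is from the text; the loopback host is an assumption.
GODOT_ADDR: Tuple[str, int] = ("127.0.0.1", 19700)


def encode_event(channel: str, payload: Dict[str, Any]) -> bytes:
    """Pack one event as a JSON datagram (this wire format is an assumption)."""
    return json.dumps({"channel": channel, "payload": payload}).encode("utf-8")


def forward_event(
    sock: socket.socket,
    channel: str,
    payload: Dict[str, Any],
    addr: Tuple[str, int] = GODOT_ADDR,
) -> int:
    """Send one event toward Godot over UDP; returns the number of bytes sent."""
    return sock.sendto(encode_event(channel, payload), addr)
```

A caller would create one `socket.socket(socket.AF_INET, socket.SOCK_DGRAM)` and call `forward_event` for each message received from Redis. UDP gives no delivery guarantee, which is usually acceptable for high-rate gaze updates where a newer sample supersedes a lost one.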
### Layer 3: Backend Services
Chobit connects to existing infrastructure over HTTP/WebSocket:
- **@speech-synthesis** — Whisper STT + Chatterbox TTS
- **@model-boss** — GPU lease coordination for concurrent ML workloads
- **LLM** — any OpenAI-compatible endpoint, or LifeAI's companion service
- **@model-boss** — GPU lease coordination
- **LLM** — any OpenAI-compatible endpoint, or LifeAI companion service
### Layer 0: chobit-core (shared protocol)
TypeScript package defining the conversation protocol:
- `ChobitBackend` interface — the LLM contract
- `SentenceStream` — token-to-sentence buffering logic
- `EmotionExtractor` — emotion tag parsing and VRM blendshape mapping
- Types/enums shared between Godot client and backend implementations
## GDScript Conventions
### Preload Pattern (critical)
`class_name` registration is unreliable in autoload context. **Always reference non-autoload classes via `preload()` const**:
```gdscript
const WindowDragScript = preload("res://scripts/window/window_drag.gd")
const OrchestratorScript = preload("res://scripts/companion/conversation_orchestrator.gd")
var drag: Node = WindowDragScript.new()
```
Keep `class_name` in the file for IDE autocomplete. All runtime references use preload consts.
### Signals
- `EventBus` is the only cross-system signal hub — never connect signals directly between systems
- Signal names use **past tense**: `avatar_tapped`, `state_changed`, `speech_started`
- EventBus signal params use `Variant` for object types (avoids autoload type resolution errors)
### File Organization Rules
- `snake_case` for files, variables, functions
- `PascalCase` for class names and nodes
- `UPPER_SNAKE_CASE` for constants
- Type hints on all function signatures (including return types)
- 500-line limit per file — split into focused modules before exceeding
### Node Architecture
Controllers are instantiated in code (`SomeScript.new()` + `add_child()`) — **not** embedded in `.tscn`. The main scene (`companion.tscn`) is the minimal skeleton; all behavior nodes attach at runtime in `companion.gd._ready()`.
## Key Design Decisions
- **Godot over Tauri/React**: Native 3D engine vs WebGL-in-webview. Godot provides AnimationTree state machines, skeletal IK, physics (hair/cloth), shaders (toon/anime), and particle effects — all built-in.
- **Desktop overlay**: Godot 4 supports transparent borderless windows with always-on-top. No wrapper needed.
- **Generic LLM interface**: The backend protocol is endpoint-agnostic. Swap between local LLM, cloud API, or LifeAI by changing one URL.
- **Sentence-level streaming**: LLM tokens buffer into sentences, each sent to TTS immediately. First sentence plays while LLM generates the rest.
- **Emotion via prompt engineering**: LLM embeds `[emotion]` tags inline. Godot AnimationTree transitions expressions based on parsed tags.
## Dependencies
| Component | Depends On |
|-----------|-----------|
| chobit-core (TypeScript) | (none — protocol definitions only) |
| godot/ (Godot 4) | VRM4Godot addon, Godot 4.x |
| Backend services | @speech-synthesis, @model-boss |
- **Godot over Tauri/React** — AnimationTree state machines, skeletal IK, physics (hair/cloth), toon shaders, particle effects — all built-in
- **Desktop overlay** — Godot 4 transparent borderless always-on-top window; no wrapper needed
- **Generic LLM interface** — endpoint-agnostic; swap between local LLM, cloud API, or LifeAI by changing one URL
- **Sentence-level streaming** — tokens buffer into sentences, each sent to TTS immediately; first sentence plays while LLM generates the rest
- **Emotion via prompt engineering** — LLM embeds `[emotion]` tags inline; AnimationTree transitions expressions from parsed tags
- **Sidecars over plugins** — ML inference (face tracking) runs in Python, not GDExtension; events cross via Redis → bridge → UDP → Godot
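
As an illustration of the inline `[emotion]` tag approach, here is a minimal Python sketch of tag extraction. It is not the `EmotionExtractor` from chobit-core; the function name, the regular expression, and the choice to leave unknown bracketed text untouched are assumptions. The emotion names are the six expressions listed in the AnimationTree section.

```python
import re
from typing import List, Tuple

# Expression names listed in the AnimationTree section of this document.
KNOWN_EMOTIONS = {"happy", "sad", "angry", "surprised", "relaxed", "neutral"}
_TAG = re.compile(r"\[(\w+)\]\s*")


def extract_emotions(text: str) -> Tuple[str, List[str]]:
    """Strip recognised [emotion] tags; return (clean_text, emotions in order)."""
    found: List[str] = []

    def _replace(match: "re.Match[str]") -> str:
        name = match.group(1).lower()
        if name in KNOWN_EMOTIONS:
            found.append(name)
            return ""  # drop the tag and its trailing whitespace
        return match.group(0)  # unknown bracketed text is left untouched

    return _TAG.sub(_replace, text).strip(), found
```

The cleaned text would go to TTS while the emotion list drives expression changes, so the tags are never spoken aloud.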
## Dev Commands
```bash
# TypeScript protocol package
bun install
bun run build
# Godot project
cd godot/
godot --editor # Open in Godot editor
godot --path . --windowed # Run the companion
./run [start] # Launch Godot + tray sidecar (tray spawns bridge + vision)
./run stop # Stop everything
./run restart # Stop then start
./run verify # gdlint + gdformat check + Godot import validation
./run editor # Open Godot editor
./run screenshot # Capture screenshot via tools/screenshot.gd
```
## Godot Animation Architecture
## Autoloads (project.godot)
| Autoload | Path | Role |
|----------|------|------|
| `EventBus` | `scripts/autoloads/event_bus.gd` | Cross-system signal hub |
| `CompanionConfig` | `scripts/autoloads/companion_config.gd` | Endpoint URLs, model name |
| `FlightRecorder` | `scripts/autoloads/flight_recorder.gd` | Session logging |
## AnimationTree State Machine
```
AnimationTree (State Machine)
├── idle → breathing, random blink, subtle sway
├── listening → head tilt toward mic, attentive posture
├── processing → look-away, thinking pose, hand-to-chin
├── speaking → engaged posture, gestures synced to sentence breaks
│ └── Lipsync → AudioStreamPlayer spectrum → mouth blendshape
├── interrupted → brief surprise expression, then transition to listening
└── Expressions → blend layer on top of body animations
├── happy, sad, angry, surprised, relaxed, neutral
└── Smooth interpolation via AnimationTree blend nodes
idle → breathing, random blink, subtle sway
listening → head tilt toward mic, attentive posture
processing → look-away, thinking pose
speaking → engaged posture, gestures synced to sentence breaks
interrupted → brief surprise, then → listening
Expressions → blend layer on top (happy, sad, angry, surprised, relaxed, neutral)
```
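
The body states above can be modelled as a small finite-state machine. The sketch below is a Python illustration, not the project's `ConversationState` code: the class and method names are hypothetical, and the transition table is assumed except for `interrupted` leading to `listening`, which the diagram states.

```python
from enum import Enum


class State(Enum):
    IDLE = "idle"
    LISTENING = "listening"
    PROCESSING = "processing"
    SPEAKING = "speaking"
    INTERRUPTED = "interrupted"


# Assumed transition table; only interrupted -> listening is stated in the text.
TRANSITIONS = {
    State.IDLE: {State.LISTENING},
    State.LISTENING: {State.PROCESSING, State.IDLE},
    State.PROCESSING: {State.SPEAKING, State.IDLE},
    State.SPEAKING: {State.IDLE, State.INTERRUPTED},
    State.INTERRUPTED: {State.LISTENING},
}


class ConversationFsm:
    def __init__(self) -> None:
        self.state = State.IDLE

    def go(self, target: State) -> bool:
        """Move to target if the table allows it; return whether the state changed."""
        if target not in TRANSITIONS[self.state]:
            return False
        self.state = target
        return True
```

Rejecting illegal transitions instead of raising keeps the avatar in a valid pose if events arrive out of order.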
## Attention System
**Desktop Gaze** (default) — `LookAtModifier3D` tracks cursor position. Active when idle or ambient.
**Face-to-Face** — `vision/` sidecar publishes gaze target from webcam; `gaze_controller.gd` blends from cursor tracking to face target on `conversation_started` and back on `conversation_ended`.
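
One way to picture the blend between the two gaze modes is a weight that ramps toward 1 during a conversation and back toward 0 afterwards. The Python sketch below is only an illustration of that idea; the actual logic lives in `gaze_controller.gd`, and the function names, the linear ramp, and the 0.5 second blend time are assumptions.

```python
from typing import Tuple

Vec3 = Tuple[float, float, float]


def step_weight(weight: float, in_conversation: bool, delta: float,
                blend_time: float = 0.5) -> float:
    """Move the face-tracking weight toward 1 in conversation, toward 0 otherwise."""
    step = delta / blend_time  # blend_time (seconds) is an assumed tuning value
    target = 1.0 if in_conversation else 0.0
    if weight < target:
        return min(target, weight + step)
    return max(target, weight - step)


def blend_target(cursor: Vec3, face: Vec3, weight: float) -> Vec3:
    """Linear interpolation: weight 0 looks at the cursor, weight 1 at the face."""
    return (
        cursor[0] + (face[0] - cursor[0]) * weight,
        cursor[1] + (face[1] - cursor[1]) * weight,
        cursor[2] + (face[2] - cursor[2]) * weight,
    )
```

In this sketch, `conversation_started` and `conversation_ended` would flip the `in_conversation` flag, and `step_weight` would run once per frame with the frame delta.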
## Integration with LifeAI
The Godot companion connects to LifeAI's companion service endpoint. LifeAI provides:
- Persona and character context (not just a system prompt)
- User life context (habits, goals, schedule, health)
- Reasoning-driven responses (not raw LLM output)
Standard HTTP streaming endpoint, OpenAI-compatible protocol. LifeAI provides persona, user life context, and reasoning-driven responses. Configure via `CompanionConfig.llm_url`.
The connection is a standard HTTP streaming endpoint — same protocol as any OpenAI-compatible API.
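
For reference, an OpenAI-compatible streaming response arrives as server-sent-event lines of the form `data: {...}`, ending with `data: [DONE]`. The sketch below shows how a client could pull text deltas out of such lines. It is a Python illustration with a hypothetical function name, not the code in `llm_client.gd`, and it assumes the LifeAI endpoint follows the chat-completions chunk shape (`choices[0].delta.content`).

```python
import json
from typing import Iterable, Iterator


def iter_tokens(lines: Iterable[str]) -> Iterator[str]:
    """Yield content deltas from OpenAI-style streaming lines ("data: {...}")."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and comments
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            return  # end-of-stream sentinel
        choices = json.loads(data).get("choices") or []
        if not choices:
            continue
        content = choices[0].get("delta", {}).get("content")
        if content:
            yield content
```

The yielded strings are the raw tokens that sentence buffering would then group before each sentence is sent to TTS.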
## Milestone Status
| Milestone | Status | Description |
|-----------|--------|-------------|
| M0 | ✅ | Project setup, chobit-core, autoloads, EventBus |
| M1 | ✅ | VRM model loaded and rendered, transparent overlay, idle animation |
| M2 | ✅ | AnimationTree FSM, expression blendshapes, dual-mode gaze, lipsync |
| M3 | ✅ | Webcam face tracking sidecar, gaze estimation, tray integration |
| M4 | ✅ | Microphone capture, VAD, STT/TTS HTTP clients, audio playback |
| M5 | ✅ | Full conversation loop: VAD→STT→LLM→TTS→avatar; interruption; chat window |
| M6 | 🔲 | LifeAI integration — persona, user life context |
| M7 | 🔲 | Polish — toon shader, particles, hair physics, gesture animations |