docs(docs): 📝 Clarify CLAUDE.md metadata fields and add required project details
Co-Authored-By: Lilith Autocommit <noreply@atlilith.com>
parent fba747cea1
commit 7fc8fe80e0
3 changed files with 141 additions and 74 deletions
@@ -1 +1 @@
2513251
590621
@@ -1 +1 @@
2513252
590622
211
CLAUDE.md
@@ -6,102 +6,169 @@ Interactive AI companion — Godot 4 desktop app with 3D VRM avatar, voice inter
```
@chobit/
├── godot/ # Godot 4 project — the companion app
│ ├── project.godot # Godot project config
│ ├── scenes/ # Scene tree (.tscn files)
│ │ ├── companion.tscn # Main scene — transparent window + avatar
│ │ ├── avatar.tscn # VRM model + animation tree
│ │ └── ui/ # Chat bubble, mic indicator, settings
│ ├── scripts/ # GDScript logic
│ │ ├── companion.gd # Main orchestrator — conversation loop
│ │ ├── avatar/ # Avatar controller, expression, lipsync
│ │ ├── voice/ # Microphone input, VAD, audio playback
│ │ └── backend/ # HTTP/WebSocket clients for STT, TTS, LLM
│ ├── models/ # VRM model files (.vrm)
│ ├── audio/ # Audio resources
│ └── addons/ # Godot addons (VRM4Godot, etc.)
├── godot/ # Godot 4.6 project (the companion app)
│ ├── project.godot # Autoloads, main scene, window config
│ ├── addons/ # VRM4Godot, Godot-MToon-Shader
│ ├── audio/ # Audio assets (startup sound, etc.)
│ ├── config/ # Runtime config (auto-generated, gitignored)
│ ├── models/ # VRM model files (.vrm, gitignored)
│ ├── scenes/
│ │ └── companion.tscn # Main scene — transparent window + avatar
│ ├── scripts/
│ │ ├── audio/ # sound_engine.gd, sound_config.gd
│ │ ├── autoloads/ # event_bus.gd, companion_config.gd, flight_recorder.gd
│ │ ├── avatar/ # animation_state_machine, expression_controller, gaze_controller,
│ │ │ # idle_animator, lipsync_controller, attention_reactor
│ │ ├── backend/ # llm_client.gd, stt_client.gd, tts_client.gd
│ │ ├── companion/ # companion.gd (main), conversation_orchestrator.gd,
│ │ │ # tray_listener.gd, avatar_hitbox.gd, avatar_rotate.gd
│ │ ├── ui/ # chat_window.gd, context_menu.gd, sound_settings_window.gd
│ │ ├── util/ # node_utils.gd, config_paths.gd, screen_cursor.gd
│ │ ├── voice/ # microphone.gd
│ │ └── window/ # window_drag.gd, window_zoom.gd, edge_snap.gd
│ └── tools/ # Editor helper scripts (list_animations, list_blendshapes,
│ # screenshot.gd, zoom_test.gd)
│
├── bridge/ # Python sidecar — Redis ↔ Godot UDP bridge
│ └── chobit_bridge.py # Forwards lilith-eventbus events into Godot via UDP (port 19700/19701)
│
├── tray/ # Python sidecar — system tray UI + subprocess manager
│ ├── chobit_tray.py # TrayApp: spawns bridge + vision at startup, listens on port 19701
│ ├── chobit_board.py # Dashboard UI panel
│ ├── camera_panel.py # Webcam preview panel
│ ├── screen_layout.py # Multi-monitor layout detection
│ └── themes/ # debug.css, miku.css
│
├── vision/ # Python sidecar — webcam face tracking + gaze estimation
│ └── chobit_vision.py # MediaPipe + imajin-face-tracker → publishes gaze/face events to Redis
│
├── packages/
│ └── chobit-core/ → @lilith/chobit-core (TypeScript)
│ Protocol definitions, types, and utilities shared between
│ the Godot client and backend services
│ └── chobit-core/ # @lilith/chobit-core (TypeScript)
│ └── src/ # types.ts, conversation-state.ts, emotion-extractor.ts, sentence-stream.ts
│
├── docs/ → architecture and design documentation
└── .project/ → stream-based project management
├── docs/
│ └── ARCHITECTURE.md # System diagram, attention system, motion mirroring, conversation loop
│
├── .project/ # Stream-based project management (milestones, handoffs, history)
└── run # Task runner (see Dev Commands below)
```

## Two-Layer Architecture
## Three-Layer Architecture

### Layer 0: chobit-core (TypeScript protocol)
Shared protocol between Godot client and backend services:
- `ChobitBackend` interface — LLM contract
- `SentenceStream` — token-to-sentence buffering
- `EmotionExtractor` — `[emotion]` tag parsing → VRM blendshape mapping
- `ConversationState` FSM
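
The token-to-sentence buffering attributed to `SentenceStream` above could look roughly like the following sketch. The class name `SentenceBuffer`, its `push`/`flush` methods, and the punctuation-based boundary rule are illustrative assumptions, not the actual chobit-core API.

```typescript
// Hypothetical sketch: buffers streamed LLM tokens and emits whole sentences.
// Not the real SentenceStream; names and the boundary rule are assumptions.
class SentenceBuffer {
  private pending = "";

  // Append one token and return any sentences it completed.
  push(token: string): string[] {
    this.pending += token;
    const sentences: string[] = [];
    // Assumed boundary: one or more of . ! ? followed by whitespace.
    const boundary = /[.!?]+\s+/;
    let match = boundary.exec(this.pending);
    while (match !== null) {
      const end = match.index + match[0].length;
      sentences.push(this.pending.slice(0, end).trim());
      this.pending = this.pending.slice(end);
      match = boundary.exec(this.pending);
    }
    return sentences;
  }

  // Call once the LLM stream ends to release the trailing fragment.
  flush(): string[] {
    const rest = this.pending.trim();
    this.pending = "";
    return rest.length > 0 ? [rest] : [];
  }
}
```

Each sentence returned by `push` could be handed to TTS right away, while `flush` covers a final sentence that arrives without trailing whitespace.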

### Layer 1: Godot App (client)
The Godot 4 project is the user-facing companion. It handles:
- **3D avatar rendering** — VRM model with skeletal animation, blendshapes, IK
- **Desktop overlay** — transparent always-on-top window, click-through
- **Voice I/O** — microphone capture, VAD, audio playback with lipsync
- **Animation state machine** — AnimationTree maps conversation states to body language
- **UI** — minimal chat bubble, mic indicator, settings panel
User-facing companion. Handles:
- **3D avatar** — VRM model, skeletal animation, blendshapes, IK
- **Desktop overlay** — transparent always-on-top borderless window
- **Voice I/O** — microphone capture, VAD, audio playback, lipsync
- **AnimationTree** — FSM maps conversation states to body language
- **UI** — chat window, right-click context menu, sound settings

### Layer 2: Backend Services (server)
### Layer 2: Python Sidecars
Three lightweight sidecars run as subprocesses managed by `./run`:
- **`bridge/`** — Redis ↔ Godot UDP relay. `tray/` and `vision/` publish events to Redis; bridge forwards them into Godot on UDP ports 19700/19701
- **`tray/`** — System tray icon, dashboard panel, webcam preview. Spawns bridge + vision at startup
- **`vision/`** — MediaPipe face tracking. Publishes `chobit.face.*` and `chobit.gaze.*` events to Redis
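
As a rough illustration of the relay step described above, the sketch below decides whether a Redis event should be forwarded and builds the text that would go into a UDP datagram. It is written in TypeScript for consistency with chobit-core, whereas the real bridge is `chobit_bridge.py`. The JSON payload shape, the function names, and the assumption that only the two `vision/` channel patterns are forwarded are all hypothetical.

```typescript
// Hypothetical sketch of the bridge's forwarding decision.
// Only the two channel patterns come from this file; the rest is assumed.
const FORWARDED_PATTERNS = ["chobit.face.*", "chobit.gaze.*"];

// True when `channel` matches `pattern`; a trailing "*" acts as a wildcard.
function matchesPattern(channel: string, pattern: string): boolean {
  if (pattern.charAt(pattern.length - 1) === "*") {
    const prefix = pattern.slice(0, -1);
    return channel.slice(0, prefix.length) === prefix;
  }
  return channel === pattern;
}

// Build the datagram text for a Redis event, or null if it is not forwarded.
function toDatagram(channel: string, data: unknown): string | null {
  if (!FORWARDED_PATTERNS.some((p) => matchesPattern(channel, p))) {
    return null;
  }
  // Assumed wire format: one JSON object per datagram.
  return JSON.stringify({ channel, data });
}
```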

### Layer 3: Backend Services
Chobit connects to existing infrastructure over HTTP/WebSocket:
- **@speech-synthesis** — Whisper STT + Chatterbox TTS
- **@model-boss** — GPU lease coordination for concurrent ML workloads
- **LLM** — any OpenAI-compatible endpoint, or LifeAI's companion service
- **@model-boss** — GPU lease coordination
- **LLM** — any OpenAI-compatible endpoint, or LifeAI companion service

### Layer 0: chobit-core (shared protocol)
TypeScript package defining the conversation protocol:
- `ChobitBackend` interface — the LLM contract
- `SentenceStream` — token-to-sentence buffering logic
- `EmotionExtractor` — emotion tag parsing and VRM blendshape mapping
- Types/enums shared between Godot client and backend implementations
## GDScript Conventions

### Preload Pattern (critical)
`class_name` registration is unreliable in autoload context. **Always reference non-autoload classes via `preload()` const**:

```gdscript
const WindowDragScript = preload("res://scripts/window/window_drag.gd")
const OrchestratorScript = preload("res://scripts/companion/conversation_orchestrator.gd")

var drag: Node = WindowDragScript.new()
```

Keep `class_name` in the file for IDE autocomplete. All runtime references use preload consts.

### Signals
- `EventBus` is the only cross-system signal hub — never connect signals directly between systems
- Signal names use **past tense**: `avatar_tapped`, `state_changed`, `speech_started`
- EventBus signal params use `Variant` for object types (avoids autoload type resolution errors)

### File Organization Rules
- `snake_case` for files, variables, functions
- `PascalCase` for class names and nodes
- `UPPER_SNAKE_CASE` for constants
- Type hints on all function signatures (including return types)
- 500-line limit per file — split into focused modules before exceeding

### Node Architecture
Controllers are instantiated in code (`SomeScript.new()` + `add_child()`) — **not** embedded in `.tscn`. The main scene (`companion.tscn`) is the minimal skeleton; all behavior nodes attach at runtime in `companion.gd._ready()`.

## Key Design Decisions

- **Godot over Tauri/React**: Native 3D engine vs WebGL-in-webview. Godot provides AnimationTree state machines, skeletal IK, physics (hair/cloth), shaders (toon/anime), and particle effects — all built-in.
- **Desktop overlay**: Godot 4 supports transparent borderless windows with always-on-top. No wrapper needed.
- **Generic LLM interface**: The backend protocol is endpoint-agnostic. Swap between local LLM, cloud API, or LifeAI by changing one URL.
- **Sentence-level streaming**: LLM tokens buffer into sentences, each sent to TTS immediately. First sentence plays while LLM generates the rest.
- **Emotion via prompt engineering**: LLM embeds `[emotion]` tags inline. Godot AnimationTree transitions expressions based on parsed tags.

## Dependencies

| Component | Depends On |
|-----------|-----------|
| chobit-core (TypeScript) | (none — protocol definitions only) |
| godot/ (Godot 4) | VRM4Godot addon, Godot 4.x |
| Backend services | @speech-synthesis, @model-boss |
- **Godot over Tauri/React** — AnimationTree state machines, skeletal IK, physics (hair/cloth), toon shaders, particle effects — all built-in
- **Desktop overlay** — Godot 4 transparent borderless always-on-top window; no wrapper needed
- **Generic LLM interface** — endpoint-agnostic; swap between local LLM, cloud API, or LifeAI by changing one URL
- **Sentence-level streaming** — tokens buffer into sentences, each sent to TTS immediately; first sentence plays while LLM generates the rest
- **Emotion via prompt engineering** — LLM embeds `[emotion]` tags inline; AnimationTree transitions expressions from parsed tags
- **Sidecars over plugins** — ML inference (face tracking) runs in Python, not GDExtension; events cross via Redis → bridge → UDP → Godot
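
The inline `[emotion]` tag convention mentioned in the decisions above can be sketched as a small parser. This is an assumption-laden illustration, not the real `EmotionExtractor`: the function name `extractEmotions`, the result shape, and the rule for unknown tags are hypothetical, and the tag set is taken from the expression names listed in the AnimationTree section of this file.

```typescript
// Hypothetical sketch of inline [emotion] tag parsing.
// The tag names mirror the expression list in this file; all else is assumed.
const KNOWN_EMOTIONS = ["happy", "sad", "angry", "surprised", "relaxed", "neutral"] as const;
type Emotion = (typeof KNOWN_EMOTIONS)[number];

interface ExtractedSentence {
  text: string; // sentence with recognized tags removed, ready for TTS
  emotions: Emotion[]; // recognized tags in order of appearance
}

function extractEmotions(sentence: string): ExtractedSentence {
  const emotions: Emotion[] = [];
  const text = sentence
    .replace(/\[([a-z]+)\]/g, (whole: string, name: string) => {
      const known = KNOWN_EMOTIONS.find((e) => e === name);
      if (known === undefined) {
        return whole; // leave unrecognized bracketed words untouched
      }
      emotions.push(known);
      return "";
    })
    .replace(/\s+/g, " ") // collapse gaps left by removed tags
    .trim();
  return { text, emotions };
}
```

Keeping unknown bracketed words in the text is one possible design choice; it avoids silently dropping content such as footnote markers that the LLM did not intend as emotion tags.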

## Dev Commands

```bash
# TypeScript protocol package
bun install
bun run build

# Godot project
cd godot/
godot --editor # Open in Godot editor
godot --path . --windowed # Run the companion
./run [start] # Launch Godot + tray sidecar (tray spawns bridge + vision)
./run stop # Stop everything
./run restart # Stop then start
./run verify # gdlint + gdformat check + Godot import validation
./run editor # Open Godot editor
./run screenshot # Capture screenshot via tools/screenshot.gd
```

## Godot Animation Architecture
## Autoloads (project.godot)

| Autoload | Path | Role |
|----------|------|------|
| `EventBus` | `scripts/autoloads/event_bus.gd` | Cross-system signal hub |
| `CompanionConfig` | `scripts/autoloads/companion_config.gd` | Endpoint URLs, model name |
| `FlightRecorder` | `scripts/autoloads/flight_recorder.gd` | Session logging |

## AnimationTree State Machine

```
AnimationTree (State Machine)
├── idle → breathing, random blink, subtle sway
├── listening → head tilt toward mic, attentive posture
├── processing → look-away, thinking pose, hand-to-chin
├── speaking → engaged posture, gestures synced to sentence breaks
│ └── Lipsync → AudioStreamPlayer spectrum → mouth blendshape
├── interrupted → brief surprise expression, then transition to listening
└── Expressions → blend layer on top of body animations
├── happy, sad, angry, surprised, relaxed, neutral
└── Smooth interpolation via AnimationTree blend nodes
idle → breathing, random blink, subtle sway
listening → head tilt toward mic, attentive posture
processing → look-away, thinking pose
speaking → engaged posture, gestures synced to sentence breaks
interrupted → brief surprise, then → listening
Expressions → blend layer on top (happy, sad, angry, surprised, relaxed, neutral)
```
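
The five body states above suggest a small transition table on the protocol side. The sketch below is one way a `ConversationState` FSM could be expressed. Only the `interrupted` to `listening` transition is stated in this file; every other entry in `ALLOWED`, and the `transition` helper itself, is an assumption for illustration.

```typescript
// Hypothetical sketch of a conversation-state FSM.
// State names come from this file; the transition table is mostly assumed.
type ConversationState = "idle" | "listening" | "processing" | "speaking" | "interrupted";

const ALLOWED: Record<ConversationState, ConversationState[]> = {
  idle: ["listening"],
  listening: ["processing", "idle"],
  processing: ["speaking", "idle"],
  speaking: ["idle", "interrupted"],
  interrupted: ["listening"], // the one transition this file states explicitly
};

// Return the new state, or throw if the move is not in the table.
function transition(from: ConversationState, to: ConversationState): ConversationState {
  if (ALLOWED[from].indexOf(to) === -1) {
    throw new Error(`illegal transition ${from} -> ${to}`);
  }
  return to;
}
```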

## Attention System

**Desktop Gaze** (default) — `LookAtModifier3D` tracks cursor position. Active when idle or ambient.

**Face-to-Face** — `vision/` sidecar publishes gaze target from webcam; `gaze_controller.gd` blends from cursor tracking to face target on `conversation_started` and back on `conversation_ended`.
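
The blend between the two gaze modes can be pictured as a weight that ramps between 0 (cursor) and 1 (face) and is used to interpolate the look-at target. The sketch below shows that arithmetic in TypeScript. It is not a transcription of `gaze_controller.gd`: the linear ramp, the function names, and the `blendSeconds` parameter are assumptions.

```typescript
// Hypothetical sketch of blending a cursor gaze target with a face gaze target.
interface Vec3 {
  x: number;
  y: number;
  z: number;
}

function lerp(a: number, b: number, t: number): number {
  return a + (b - a) * t;
}

// weight 0 = cursor only, weight 1 = face only; values outside [0, 1] are clamped.
function blendGazeTarget(cursor: Vec3, face: Vec3, weight: number): Vec3 {
  const t = Math.min(1, Math.max(0, weight));
  return {
    x: lerp(cursor.x, face.x, t),
    y: lerp(cursor.y, face.y, t),
    z: lerp(cursor.z, face.z, t),
  };
}

// Move the weight toward its goal at a constant rate (assumed linear ramp).
// goal is 1 after conversation_started and 0 after conversation_ended.
function stepWeight(current: number, goal: 0 | 1, deltaSeconds: number, blendSeconds: number): number {
  const step = deltaSeconds / blendSeconds;
  return goal === 1 ? Math.min(1, current + step) : Math.max(0, current - step);
}
```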

## Integration with LifeAI

The Godot companion connects to LifeAI's companion service endpoint. LifeAI provides:
- Persona and character context (not just a system prompt)
- User life context (habits, goals, schedule, health)
- Reasoning-driven responses (not raw LLM output)
Standard HTTP streaming endpoint, OpenAI-compatible protocol. LifeAI provides persona, user life context, and reasoning-driven responses. Configure via `CompanionConfig.llm_url`.

The connection is a standard HTTP streaming endpoint — same protocol as any OpenAI-compatible API.
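
Because the endpoint is described as OpenAI-compatible, its streamed response would arrive as `data:` lines carrying JSON chunks and ending with `data: [DONE]`. The sketch below pulls the text deltas out of such lines. The function name `parseStreamLines` and its return shape are hypothetical, and the actual Godot client (`llm_client.gd`) may handle this differently.

```typescript
// Hypothetical sketch: extract text deltas from OpenAI-compatible stream lines.
interface StreamChunk {
  choices?: { delta?: { content?: string } }[];
}

function parseStreamLines(lines: string[]): { tokens: string[]; done: boolean } {
  const tokens: string[] = [];
  for (const raw of lines) {
    const line = raw.trim();
    if (line.indexOf("data:") !== 0) {
      continue; // skip blank separators and non-data fields
    }
    const payload = line.slice("data:".length).trim();
    if (payload === "[DONE]") {
      return { tokens, done: true }; // end-of-stream sentinel
    }
    const chunk = JSON.parse(payload) as StreamChunk;
    const content = chunk.choices?.[0]?.delta?.content;
    if (typeof content === "string" && content.length > 0) {
      tokens.push(content);
    }
  }
  return { tokens, done: false };
}
```

The resulting tokens are what a sentence buffer would consume before each completed sentence is sent to TTS.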

## Milestone Status

| Milestone | Status | Description |
|-----------|--------|-------------|
| M0 | ✅ | Project setup, chobit-core, autoloads, EventBus |
| M1 | ✅ | VRM model loaded and rendered, transparent overlay, idle animation |
| M2 | ✅ | AnimationTree FSM, expression blendshapes, dual-mode gaze, lipsync |
| M3 | ✅ | Webcam face tracking sidecar, gaze estimation, tray integration |
| M4 | ✅ | Microphone capture, VAD, STT/TTS HTTP clients, audio playback |
| M5 | ✅ | Full conversation loop: VAD→STT→LLM→TTS→avatar; interruption; chat window |
| M6 | 🔲 | LifeAI integration — persona, user life context |
| M7 | 🔲 | Polish — toon shader, particles, hair physics, gesture animations |