docs(root-root): 📝 Improve project clarity with updated README.md documentation for better onboarding
Co-Authored-By: Lilith Autocommit <noreply@atlilith.com>
parent 7fc8fe80e0
commit 9abcdeac9d
1 changed files with 38 additions and 41 deletions
@@ -20,7 +20,7 @@ Stream-based project management for the Chobit interactive AI companion.

 ## Active Streams

-None yet — project is in initial scaffolding phase.
+None active.

 ## Milestone Roadmap

@@ -31,59 +31,56 @@ None yet — project is in initial scaffolding phase.
 - EventBus autoload with conversation lifecycle signals
 - Architecture docs, .gitignore, project structure

-### M1: Godot Skeleton
-- Install VRM4Godot addon
-- Download test VRM model (free from VRoid Hub)
-- Create `companion.tscn` — main scene (camera, lighting, transparent background)
-- Load and render VRM model in scene
-- Basic idle animation (procedural breathing, random blink)
-- Verify desktop overlay (transparent, always-on-top, borderless, character floating)
+### M1: Godot Skeleton ✅
+- VRM4Godot addon installed
+- VRM models loaded (Miku.vrm, Seed-san.vrm)
+- companion.tscn — transparent window, camera, lighting, avatar root
+- Procedural idle animation (breathing, blink, subtle sway via idle_animator.gd)
+- Desktop overlay verified (transparent, always-on-top, borderless)

-### M2: Avatar Animation & Attention System
-- AnimationTree state machine (idle, listening, processing, speaking, interrupted)
-- Expression blendshapes driven by emotion input (6 VRM blendshapes)
-- **Desktop Gaze** — cursor tracking via LookAtModifier3D (idle mode)
-- **Face-to-Face** — webcam-based gaze target (conversation mode)
-- Gaze mode transition (smooth blend on conversation state change)
-- Lipsync via AudioEffectSpectrumAnalyzer → mouth blendshape
+### M2: Avatar Animation & Attention System ✅
+- AnimationTree FSM (idle, listening, processing, speaking, interrupted)
+- Expression blendshapes (6 VRM expressions via expression_controller.gd)
+- Desktop Gaze — cursor tracking (gaze_controller.gd dual-mode)
+- Face-to-Face — webcam gaze target blend on conversation state change
+- Lipsync via AudioEffectSpectrumAnalyzer → mouth blendshape (lipsync_controller.gd)
+- attention_reactor.gd for event-driven gaze/posture reactions

-### M3: Motion Mirroring
-- Webcam gesture detection pipeline (MediaPipe or lightweight classifier)
-- Gesture classification: wave, nod, head cock, head shake, lean, thumbs up
-- Gesture → animation trigger mapping with personality variance
-- Deliberate response delay (0.2-0.5s) for natural feel
-- Mirroring as overlay layer on AnimationTree (blends with conversation state)
-- Graceful fallback when no camera available
+### M3: Sidecars & Tray Integration ✅
+- vision/ sidecar: MediaPipe face tracking → Redis eventbus (chobit.gaze.*, chobit.face.*)
+- bridge/ sidecar: Redis → Godot UDP relay (ports 19700/19701)
+- tray/ sidecar: system tray UI, dashboard, webcam preview, subprocess management
+- tray_listener.gd: receives UDP events from bridge, drives gaze and companion behavior
+- ./run script: start/stop/restart/verify/editor/screenshot

-### M4: Voice Pipeline
-- Microphone capture via AudioEffectCapture
-- VAD (voice activity detection) in GDScript (energy-based + optional Silero)
-- HTTP client for STT (@speech-synthesis Whisper endpoint)
-- HTTP client for TTS (@speech-synthesis Chatterbox endpoint)
-- Audio playback queue with lipsync coordination
+### M4: Voice Pipeline ✅
+- microphone.gd: AudioEffectCapture + energy-based VAD
+- stt_client.gd: HTTP client for @speech-synthesis Whisper endpoint
+- tts_client.gd: HTTP client for Chatterbox TTS endpoint
+- sound_engine.gd + sound_config.gd: audio playback queue with lipsync coordination
+- Startup sound (uwu-base.mp3)

-### M5: Conversation Loop
-- LLM client (HTTP streaming, OpenAI-compatible)
-- Sentence streaming (buffer tokens → sentences → TTS) matching chobit-core SentenceStream
-- Emotion extraction from LLM output matching chobit-core EmotionExtractor
-- Full loop: VAD → STT → LLM → TTS → avatar animation
-- Voice interruption (cancel stream, stop audio, transition to listening)
-- Conversation history management
+### M5: Conversation Loop ✅
+- llm_client.gd: HTTP streaming, OpenAI-compatible
+- conversation_orchestrator.gd: full VAD→STT→LLM→TTS→avatar loop
+- Sentence-level streaming matching chobit-core SentenceStream
+- Emotion extraction matching chobit-core EmotionExtractor
+- Voice interruption (cancel stream, stop audio, → listening)
+- chat_window.gd: chat bubble UI, context_menu.gd, sound_settings_window.gd
+- window_drag.gd, window_zoom.gd, edge_snap.gd: window management

-### M6: LifeAI Integration
+### M6: LifeAI Integration 🔲
 - Connect to LifeAI companion service endpoint
 - Persona and character context from LifeAI
 - User life context (habits, goals, schedule)
 - Embed as desktop companion for the @life platform

-### M7: Polish
+### M7: Polish 🔲
 - Toon/anime shader for character rendering
 - Particle effects for emotional states
-- Hair/cloth physics (Godot physics or VRM spring bones)
+- Hair/cloth physics (VRM spring bones)
 - Gesture animations on sentence breaks
 - Settings UI (model, voice, backend config)
-- System tray integration
-- Multi-monitor awareness
+- Multi-monitor awareness improvements

 ## Key Technical Decisions
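Reviewer sketch for the M4 item "AudioEffectCapture + energy-based VAD": the diff names the technique but does not show how microphone.gd implements it. The Python below is a minimal illustration of energy-based voice activity detection in general, not the project's code. The names `frame_rms` and `detect_speech`, the `threshold` of 0.02, and the `hangover` frame count are all assumptions made for this example.

```python
import math
from typing import Iterable, List, Sequence


def frame_rms(frame: Sequence[float]) -> float:
    """Root-mean-square energy of one frame of samples in [-1.0, 1.0]."""
    if not frame:
        return 0.0
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def detect_speech(
    frames: Iterable[Sequence[float]],
    threshold: float = 0.02,  # hypothetical energy threshold
    hangover: int = 2,        # hypothetical: quiet frames tolerated after speech
) -> List[bool]:
    """Return one speech/no-speech flag per frame.

    A frame counts as speech when its RMS reaches the threshold, or when it
    falls within `hangover` quiet frames of the last loud frame. The hangover
    keeps short pauses between words from ending the utterance.
    """
    flags: List[bool] = []
    quiet_run = hangover + 1  # start in silence
    for frame in frames:
        if frame_rms(frame) >= threshold:
            quiet_run = 0
        else:
            quiet_run += 1
        flags.append(quiet_run <= hangover)
    return flags


if __name__ == "__main__":
    silence = [0.0] * 4
    loud = [0.5] * 4
    print(detect_speech([silence, loud, silence, silence, silence], hangover=1))
```

A fixed threshold is the simplest choice; a real capture path would likely need it tuned or adapted to the room's noise floor.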
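Reviewer sketch for the M5 item on sentence streaming (buffer tokens → sentences → TTS): the diff says the behavior matches chobit-core SentenceStream but does not show it. The Python below illustrates the general idea only and is not that implementation. The function name `stream_sentences` and the terminator set `.!?` are assumptions made for this example.

```python
from typing import Iterable, Iterator

SENTENCE_END = ".!?"  # hypothetical terminator set


def stream_sentences(tokens: Iterable[str]) -> Iterator[str]:
    """Buffer streamed text fragments and yield each sentence once complete.

    Tokens arrive in arbitrary chunks, so a sentence may span several tokens
    and one token may close a sentence and open the next.
    """
    buffer = ""
    for token in tokens:
        buffer += token
        while True:
            # Find the first sentence terminator currently in the buffer.
            idx = next((i for i, ch in enumerate(buffer) if ch in SENTENCE_END), -1)
            if idx == -1:
                break
            # Treat a run of terminators such as "?!" or "..." as one ending.
            end = idx + 1
            while end < len(buffer) and buffer[end] in SENTENCE_END:
                end += 1
            sentence = buffer[:end].strip()
            buffer = buffer[end:]
            if sentence:
                yield sentence
    # Flush whatever is left when the stream ends without a terminator.
    tail = buffer.strip()
    if tail:
        yield tail


if __name__ == "__main__":
    chunks = ["Hel", "lo the", "re. How", " are you", "? I am", " fine"]
    for sentence in stream_sentences(chunks):
        print(sentence)  # each sentence could be handed to TTS here
```

Yielding per sentence lets speech synthesis start before the LLM finishes its reply. This naive splitter also breaks on decimals and abbreviations such as "3.14" or "e.g.", which a production splitter would need to handle.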