Chobit Project Management

Stream-based project management for the Chobit interactive AI companion.

Directory Structure

.project/
├── README.md              # This file
├── streams/               # Active feature workstreams
│   └── <stream-name>/
│       ├── README.md      # Feature overview and architecture
│       ├── STATUS.md      # Current progress and blockers
│       ├── HANDOFF.md     # Session handoff context
│       └── NOTES.md       # Technical decisions and learnings
├── history/               # Completed work records
│   └── YYYYMMDD_description.md
└── templates/             # Stream templates

Active Streams

None active.

Milestone Roadmap

M0: Project Setup ✅

Godot 4 project initialized with transparent window config
chobit-core TypeScript package (ConversationState FSM, SentenceStream, EmotionExtractor)
gdtoolkit-config synced (gdlintrc, gdformatrc)
EventBus autoload with conversation lifecycle signals
Architecture docs, .gitignore, project structure
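
As a rough illustration of what a ConversationState FSM in chobit-core might look like, here is a minimal TypeScript sketch. The five state names are taken from the M2 AnimationTree list; the event names, the transition table, and the class name `ConversationFsm` are assumptions made for this example and are not the package's actual API.

```typescript
// Hypothetical sketch. State names follow the M2 list; events and
// transitions are illustrative assumptions, not chobit-core's real table.
type ConversationState = "idle" | "listening" | "processing" | "speaking" | "interrupted";
type ConversationEvent = "speech_start" | "speech_end" | "reply_start" | "reply_end" | "barge_in" | "reset";

const TRANSITIONS: Record<ConversationState, Partial<Record<ConversationEvent, ConversationState>>> = {
  idle: { speech_start: "listening" },
  listening: { speech_end: "processing", reset: "idle" },
  processing: { reply_start: "speaking", reset: "idle" },
  speaking: { reply_end: "idle", barge_in: "interrupted" },
  interrupted: { speech_start: "listening", reset: "idle" },
};

class ConversationFsm {
  private current: ConversationState = "idle";

  get state(): ConversationState {
    return this.current;
  }

  // Applies an event. Returns true if it caused a transition;
  // events with no entry for the current state are ignored.
  dispatch(event: ConversationEvent): boolean {
    const next = TRANSITIONS[this.current][event];
    if (next === undefined) return false;
    this.current = next;
    return true;
  }
}
```

Keeping the transitions in a single table makes it easy to mirror the same states on the Godot side, where the EventBus signals would presumably drive the equivalent changes.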

M1: Godot Skeleton ✅

VRM4Godot addon installed
VRM models loaded (Miku.vrm, Seed-san.vrm)
companion.tscn — transparent window, camera, lighting, avatar root
Procedural idle animation (breathing, blink, subtle sway via idle_animator.gd)
Desktop overlay verified (transparent, always-on-top, borderless)

M2: Avatar Animation & Attention System ✅

AnimationTree FSM (idle, listening, processing, speaking, interrupted)
Expression blendshapes (6 VRM expressions via expression_controller.gd)
Desktop Gaze — cursor tracking (gaze_controller.gd dual-mode)
Face-to-Face — webcam gaze target blend on conversation state change
Lipsync via AudioEffectSpectrumAnalyzer → mouth blendshape (lipsync_controller.gd)
attention_reactor.gd for event-driven gaze/posture reactions
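
The lipsync item above maps audio level to a mouth blendshape. The sketch below shows one common way to do that mapping, written in TypeScript for consistency with chobit-core; the project's own logic lives in lipsync_controller.gd and may differ. The parameter names (`noiseFloor`, `fullOpen`, `smoothing`), their default values, and the assumption that the input amplitude is already normalized to [0, 1] are all hypothetical.

```typescript
// Hypothetical sketch of amplitude-to-mouth mapping with smoothing.
// All names and defaults here are assumptions for illustration.
interface LipsyncParams {
  noiseFloor: number; // amplitudes at or below this keep the mouth closed
  fullOpen: number;   // amplitudes at or above this open the mouth fully
  smoothing: number;  // 0 = follow the target instantly, closer to 1 = slower
}

const DEFAULT_LIPSYNC: LipsyncParams = { noiseFloor: 0.02, fullOpen: 0.4, smoothing: 0.5 };

function clamp01(x: number): number {
  return Math.min(1, Math.max(0, x));
}

// Maps one amplitude sample to a mouth-open weight in [0, 1],
// easing from the previous weight to avoid visible jitter.
function nextMouthWeight(
  amplitude: number,
  previous: number,
  p: LipsyncParams = DEFAULT_LIPSYNC
): number {
  const target = clamp01((amplitude - p.noiseFloor) / (p.fullOpen - p.noiseFloor));
  return previous + (target - previous) * (1 - p.smoothing);
}
```

Called once per frame with the previous return value, this produces a weight that can be written straight to a mouth blendshape.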

M3: Sidecars & Tray Integration ✅

vision/ sidecar: MediaPipe face tracking → Redis eventbus (chobit.gaze.*, chobit.face.*)
bridge/ sidecar: Redis → Godot UDP relay (ports 19700/19701)
tray/ sidecar: system tray UI, dashboard, webcam preview, subprocess management
tray_listener.gd: receives UDP events from bridge, drives gaze and companion behavior
./run script: start/stop/restart/verify/editor/screenshot

M4: Voice Pipeline ✅

microphone.gd: AudioEffectCapture + energy-based VAD
stt_client.gd: HTTP client for @speech-synthesis Whisper endpoint
tts_client.gd: HTTP client for Chatterbox TTS endpoint
sound_engine.gd + sound_config.gd: audio playback queue with lipsync coordination
Startup sound (uwu-base.mp3)
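
For readers unfamiliar with energy-based VAD, the following TypeScript sketch shows the general technique: compute the RMS energy of each audio frame, compare it to a threshold, and wait for a few quiet frames before declaring the end of speech. It is not a port of microphone.gd; the class name `EnergyVad`, the option names, and the default threshold and hangover values are assumptions.

```typescript
// Hypothetical sketch of an energy-based voice activity detector.
// Threshold and hangover defaults are illustrative assumptions.
interface VadOptions {
  threshold: number;      // RMS level at or above which a frame counts as speech
  hangoverFrames: number; // consecutive quiet frames required to end speech
}

function rms(frame: readonly number[]): number {
  if (frame.length === 0) return 0;
  let sum = 0;
  for (const s of frame) sum += s * s;
  return Math.sqrt(sum / frame.length);
}

class EnergyVad {
  private readonly opts: VadOptions;
  private speaking = false;
  private quietFrames = 0;

  constructor(opts: VadOptions = { threshold: 0.05, hangoverFrames: 3 }) {
    this.opts = opts;
  }

  // Feed one frame of samples in [-1, 1]. Returns "speech_start" or
  // "speech_end" on a state change, or null if nothing changed.
  process(frame: readonly number[]): "speech_start" | "speech_end" | null {
    const loud = rms(frame) >= this.opts.threshold;
    if (loud) {
      this.quietFrames = 0;
      if (!this.speaking) {
        this.speaking = true;
        return "speech_start";
      }
      return null;
    }
    if (this.speaking) {
      this.quietFrames += 1;
      if (this.quietFrames >= this.opts.hangoverFrames) {
        this.speaking = false;
        this.quietFrames = 0;
        return "speech_end";
      }
    }
    return null;
  }
}
```

The hangover counter keeps short pauses between words from splitting one utterance into several STT requests.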

M5: Conversation Loop ✅

llm_client.gd: HTTP streaming, OpenAI-compatible
conversation_orchestrator.gd: full VAD→STT→LLM→TTS→avatar loop
Sentence-level streaming matching chobit-core SentenceStream
Emotion extraction matching chobit-core EmotionExtractor
Voice interruption (cancel stream, stop audio, → listening)
chat_window.gd: chat bubble UI, context_menu.gd, sound_settings_window.gd
window_drag.gd, window_zoom.gd, edge_snap.gd: window management
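
Sentence-level streaming means the orchestrator can hand each sentence to TTS as soon as it is complete, instead of waiting for the whole LLM reply. A minimal TypeScript sketch of that buffering idea follows. The class name `SentenceBuffer` and its boundary rule (terminal punctuation followed by whitespace) are assumptions for illustration; the real chobit-core SentenceStream may split differently.

```typescript
// Hypothetical sketch: accumulate streamed chunks and emit whole sentences.
// The boundary rule is a simplifying assumption.
class SentenceBuffer {
  private buffer = "";

  // Appends a streamed chunk and returns any sentences it completed.
  // A sentence is considered complete once ., ! or ? is followed by whitespace.
  push(chunk: string): string[] {
    this.buffer += chunk;
    const sentences: string[] = [];
    const boundary = /[.!?]+\s+/g;
    let start = 0;
    let match: RegExpExecArray | null;
    while ((match = boundary.exec(this.buffer)) !== null) {
      const end = match.index + match[0].length;
      sentences.push(this.buffer.slice(start, end).trim());
      start = end;
    }
    this.buffer = this.buffer.slice(start);
    return sentences;
  }

  // Call when the stream ends to emit whatever text remains.
  flush(): string[] {
    const rest = this.buffer.trim();
    this.buffer = "";
    return rest.length > 0 ? [rest] : [];
  }
}
```

Requiring whitespace after the punctuation holds back a trailing "3." until the next chunk shows whether it was a sentence end or the start of "3.14". On voice interruption, dropping the buffer without calling `flush()` would discard the unfinished sentence.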

M6: LifeAI Integration 🔲

Connect to LifeAI companion service endpoint
Persona and character context from LifeAI
User life context (habits, goals, schedule)
Embed as desktop companion for the @life platform

M7: Polish 🔲

Toon/anime shader for character rendering
Particle effects for emotional states
Hair/cloth physics (VRM spring bones)
Gesture animations on sentence breaks
Multi-monitor awareness improvements

Key Technical Decisions

Decision	Choice	Rationale
Client engine	Godot 4	Native 3D, AnimationTree, IK, physics, transparent windows — vs WebGL-in-webview overhead
Avatar format	VRM	Open standard, huge ecosystem (VRoid), standardized blendshapes and bones
Voice detection	In-app VAD	Godot audio server provides AudioEffectCapture for mic input
Backend protocol	HTTP/WebSocket	Standard, matches existing @speech-synthesis and @model-boss APIs
Emotion system	LLM inline tags	Simpler than separate classifier, no extra model/GPU needed
Lipsync	Amplitude-based	AudioEffectSpectrumAnalyzer built into Godot, no external tooling
Attention system	Dual-mode (Desktop Gaze + Face-to-Face)	Desktop Gaze for ambient companionship, Face-to-Face for conversation engagement
Motion response	Gesture mirroring (classify → animate)	Companion personality, not puppet. Curated animations vs raw skeleton retargeting
Gesture detection	External process → labels over socket	Keeps Godot focused on rendering; ML runs separately
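
To make the "LLM inline tags" decision concrete, here is a hedged TypeScript sketch of pulling emotion tags out of model output. The square-bracket syntax (for example `[happy]`), the six-name vocabulary, and the function name `extractEmotions` are assumptions chosen for this example; the tag format actually used by chobit-core's EmotionExtractor is not specified in this document.

```typescript
// Hypothetical sketch: strip assumed [emotion] tags from LLM text.
// The tag syntax and vocabulary are illustrative assumptions.
const KNOWN_EMOTIONS = ["neutral", "happy", "angry", "sad", "relaxed", "surprised"] as const;
type EmotionTag = (typeof KNOWN_EMOTIONS)[number];

interface TaggedText {
  text: string;           // the input with recognized tags removed
  emotions: EmotionTag[]; // recognized tags in order of appearance
}

function toEmotion(name: string): EmotionTag | null {
  const lower = name.toLowerCase();
  for (const e of KNOWN_EMOTIONS) {
    if (e === lower) return e;
  }
  return null;
}

function extractEmotions(raw: string): TaggedText {
  const emotions: EmotionTag[] = [];
  const text = raw.replace(/\[([a-z]+)\]\s*/gi, (whole: string, name: string) => {
    const tag = toEmotion(name);
    if (tag === null) return whole; // leave unrecognized brackets in the text
    emotions.push(tag);
    return "";
  });
  return { text: text.trim(), emotions };
}
```

Because the tags ride along in the text stream, the same pass that prepares a sentence for TTS can also pick the expression to show, with no second model call.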

Research References

Cloned to ~/Code/@forks/ (2026-03-26):

Open-LLM-VTuber — best modular architecture, sentence streaming, emotion tags
Soul-of-Waifu — VRM + @pixiv/three-vrm, GoEmotions classifier, Mixamo animations
GPT-SoVITS — voice cloning comparison to Chatterbox
RealtimeSTT — dual-tier VAD pattern (WebRTC + Silero)
speech-to-speech (HuggingFace) — thread-per-handler pipeline, WebSocket streaming
local-talking-llm — Chatterbox emotion→exaggeration mapping