Chobit Project Management

Stream-based project management for the Chobit interactive AI companion.

Directory Structure

.project/
├── README.md              # This file
├── streams/               # Active feature workstreams
│   └── <stream-name>/
│       ├── README.md      # Feature overview and architecture
│       ├── STATUS.md      # Current progress and blockers
│       ├── HANDOFF.md     # Session handoff context
│       └── NOTES.md       # Technical decisions and learnings
├── history/               # Completed work records
│   └── YYYYMMDD_description.md
└── templates/             # Stream templates

Active Streams

None active.

Milestone Roadmap

M0: Project Setup

  • Godot 4 project initialized with transparent window config
  • chobit-core TypeScript package (ConversationState FSM, SentenceStream, EmotionExtractor)
  • gdtoolkit-config synced (gdlintrc, gdformatrc)
  • EventBus autoload with conversation lifecycle signals
  • Architecture docs, .gitignore, project structure
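A conversation-state FSM of the kind ConversationState provides could look roughly like the sketch below. The state names are the ones listed under M2; the event names, the transition table, and the `ConversationFsm` class are illustrative assumptions, not the actual chobit-core API.

```typescript
// Hypothetical sketch: state names come from this README; the events and
// the transition table are assumptions, not the chobit-core implementation.
type ConvState = "idle" | "listening" | "processing" | "speaking" | "interrupted";
type ConvEvent =
  | "speech_start"
  | "speech_end"
  | "reply_start"
  | "reply_end"
  | "barge_in"
  | "recovered";

const TRANSITIONS: Record<ConvState, Partial<Record<ConvEvent, ConvState>>> = {
  idle: { speech_start: "listening" },
  listening: { speech_end: "processing" },
  processing: { reply_start: "speaking", barge_in: "interrupted" },
  speaking: { reply_end: "idle", barge_in: "interrupted" },
  interrupted: { recovered: "listening" },
};

class ConversationFsm {
  private current: ConvState = "idle";
  private listeners: Array<(from: ConvState, to: ConvState) => void> = [];

  get state(): ConvState {
    return this.current;
  }

  // Register a callback that fires after every successful transition.
  onChange(fn: (from: ConvState, to: ConvState) => void): void {
    this.listeners.push(fn);
  }

  // Returns true if the event caused a transition; events that are not
  // valid in the current state are ignored and return false.
  dispatch(event: ConvEvent): boolean {
    const next = TRANSITIONS[this.current][event];
    if (next === undefined) return false;
    const prev = this.current;
    this.current = next;
    for (const fn of this.listeners) fn(prev, next);
    return true;
  }
}
```

Keeping the transitions in a single table makes invalid moves (for example `reply_end` while idle) a no-op rather than an error, which suits a loop driven by noisy audio events.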

M1: Godot Skeleton

  • VRM4Godot addon installed
  • VRM models loaded (Miku.vrm, Seed-san.vrm)
  • companion.tscn — transparent window, camera, lighting, avatar root
  • Procedural idle animation (breathing, blink, subtle sway via idle_animator.gd)
  • Desktop overlay verified (transparent, always-on-top, borderless)

M2: Avatar Animation & Attention System

  • AnimationTree FSM (idle, listening, processing, speaking, interrupted)
  • Expression blendshapes (6 VRM expressions via expression_controller.gd)
  • Desktop Gaze — cursor tracking (gaze_controller.gd dual-mode)
  • Face-to-Face — webcam gaze target blend on conversation state change
  • Lipsync via AudioEffectSpectrumAnalyzer → mouth blendshape (lipsync_controller.gd)
  • attention_reactor.gd for event-driven gaze/posture reactions
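The amplitude-based lipsync above comes down to mapping a loudness value onto a mouth-open weight. The sketch below shows one way to do that mapping, written in TypeScript for illustration rather than GDScript. It assumes a linear magnitude as input; the dB range and the smoothing rates are made-up tuning values, not what lipsync_controller.gd uses.

```typescript
// Hypothetical sketch of amplitude-based lipsync. All constants are
// assumptions chosen for illustration.
const MIN_DB = -60; // at or below this level the mouth stays closed
const MAX_DB = -10; // at or above this level the mouth is fully open

// Convert a linear magnitude into a 0..1 mouth-open target.
function mouthTarget(magnitude: number): number {
  if (magnitude <= 0) return 0;
  const db = 20 * Math.log10(magnitude);
  const t = (db - MIN_DB) / (MAX_DB - MIN_DB);
  return Math.min(1, Math.max(0, t));
}

// Move the current blendshape weight toward the target with exponential
// smoothing, opening faster than closing so speech onsets stay crisp.
function smoothMouth(current: number, target: number, dt: number): number {
  const ratePerSecond = target > current ? 20 : 10;
  const alpha = 1 - Math.exp(-ratePerSecond * dt);
  return current + (target - current) * alpha;
}
```

Each frame, the controller would compute `mouthTarget` from the analyzer magnitude and feed it through `smoothMouth` with the frame delta before writing the blendshape value.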

M3: Sidecars & Tray Integration

  • vision/ sidecar: MediaPipe face tracking → Redis eventbus (chobit.gaze.*, chobit.face.*)
  • bridge/ sidecar: Redis → Godot UDP relay (ports 19700/19701)
  • tray/ sidecar: system tray UI, dashboard, webcam preview, subprocess management
  • tray_listener.gd: receives UDP events from bridge, drives gaze and companion behavior
  • ./run script: start/stop/restart/verify/editor/screenshot
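The bridge relays events from Redis to Godot over UDP. This README does not document the wire format, so the sketch below simply assumes one JSON object per datagram. The `BridgeEvent` shape, its field names, and both helper functions are hypothetical and only illustrate the encode/validate step such a relay needs.

```typescript
// Hypothetical wire format: one JSON object per UDP datagram.
// Field names are assumptions, not the actual bridge protocol.
interface BridgeEvent {
  channel: string; // source channel name on the event bus
  payload: unknown; // event body, passed through untouched
  ts: number; // sender timestamp in milliseconds
}

// Serialize an event into the text that would be sent as a datagram.
function encodeEvent(ev: BridgeEvent): string {
  return JSON.stringify(ev);
}

// Parse a received datagram, returning null for anything malformed so the
// receiver can drop bad packets instead of crashing.
function decodeEvent(datagram: string): BridgeEvent | null {
  try {
    const obj = JSON.parse(datagram);
    if (typeof obj?.channel !== "string" || typeof obj?.ts !== "number") {
      return null;
    }
    return { channel: obj.channel, payload: obj.payload, ts: obj.ts };
  } catch {
    return null;
  }
}
```

Because UDP gives no delivery or ordering guarantees, a receiver built this way would typically keep only the newest event per channel and discard anything with an older `ts`.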

M4: Voice Pipeline

  • microphone.gd: AudioEffectCapture + energy-based VAD
  • stt_client.gd: HTTP client for @speech-synthesis Whisper endpoint
  • tts_client.gd: HTTP client for Chatterbox TTS endpoint
  • sound_engine.gd + sound_config.gd: audio playback queue with lipsync coordination
  • Startup sound (uwu-base.mp3)
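Energy-based VAD, as used by microphone.gd, can be sketched as an RMS threshold with a short debounce on both edges. The version below is a TypeScript illustration, not a port of the GDScript; the `EnergyVad` class, the threshold, and the frame counts are all assumptions.

```typescript
// Hypothetical energy-based VAD. Threshold and frame counts are
// illustrative defaults, not values taken from microphone.gd.
function rms(frame: Float32Array): number {
  if (frame.length === 0) return 0;
  let sum = 0;
  for (let i = 0; i < frame.length; i++) sum += frame[i] * frame[i];
  return Math.sqrt(sum / frame.length);
}

class EnergyVad {
  private speaking = false;
  private loudRun = 0; // consecutive frames at or above the threshold
  private quietRun = 0; // consecutive frames below the threshold
  private threshold: number;
  private startFrames: number;
  private hangoverFrames: number;

  constructor(threshold = 0.02, startFrames = 3, hangoverFrames = 15) {
    this.threshold = threshold;
    this.startFrames = startFrames;
    this.hangoverFrames = hangoverFrames;
  }

  // Feed one audio frame. Returns "start" or "end" when the speech state
  // flips, and null otherwise.
  push(frame: Float32Array): "start" | "end" | null {
    if (rms(frame) >= this.threshold) {
      this.loudRun++;
      this.quietRun = 0;
    } else {
      this.quietRun++;
      this.loudRun = 0;
    }
    if (!this.speaking && this.loudRun >= this.startFrames) {
      this.speaking = true;
      return "start";
    }
    if (this.speaking && this.quietRun >= this.hangoverFrames) {
      this.speaking = false;
      return "end";
    }
    return null;
  }
}
```

The start debounce filters out clicks and key presses, while the longer hangover keeps short pauses between words from ending the utterance early.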

M5: Conversation Loop

  • llm_client.gd: HTTP streaming, OpenAI-compatible
  • conversation_orchestrator.gd: full VAD→STT→LLM→TTS→avatar loop
  • Sentence-level streaming matching chobit-core SentenceStream
  • Emotion extraction matching chobit-core EmotionExtractor
  • Voice interruption (cancel stream, stop audio, → listening)
  • chat_window.gd: chat bubble UI, context_menu.gd, sound_settings_window.gd
  • window_drag.gd, window_zoom.gd, edge_snap.gd: window management
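Sentence-level streaming means the TTS step can start as soon as the first full sentence has arrived from the LLM, instead of waiting for the whole reply. A minimal sketch of that buffering follows. `SentenceBuffer` is a hypothetical name, and the splitting rule (terminal punctuation followed by whitespace) is an assumption, not the behavior of chobit-core's SentenceStream.

```typescript
// Hypothetical sentence splitter for streamed LLM output. A sentence is
// considered complete once ., ! or ? is followed by whitespace.
class SentenceBuffer {
  private buf = "";

  // Append a streamed chunk and return any sentences it completed.
  push(chunk: string): string[] {
    this.buf += chunk;
    const out: string[] = [];
    let m: RegExpExecArray | null;
    while ((m = /^([\s\S]*?[.!?]+)\s+/.exec(this.buf)) !== null) {
      out.push(m[1].trim());
      this.buf = this.buf.slice(m[0].length);
    }
    return out;
  }

  // Call when the stream ends to emit whatever text is left over.
  flush(): string | null {
    const rest = this.buf.trim();
    this.buf = "";
    return rest.length > 0 ? rest : null;
  }
}
```

Requiring whitespace after the punctuation keeps decimals such as "3.14" in one piece, at the cost of holding the last sentence until the next chunk or `flush()` arrives. On voice interruption the buffer would simply be discarded.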

M6: LifeAI Integration 🔲

  • Connect to LifeAI companion service endpoint
  • Persona and character context from LifeAI
  • User life context (habits, goals, schedule)
  • Embed as desktop companion for the @life platform

M7: Polish 🔲

  • Toon/anime shader for character rendering
  • Particle effects for emotional states
  • Hair/cloth physics (VRM spring bones)
  • Gesture animations on sentence breaks
  • Multi-monitor awareness improvements

Key Technical Decisions

| Decision | Choice | Rationale |
| --- | --- | --- |
| Client engine | Godot 4 | Native 3D, AnimationTree, IK, physics, transparent windows (vs WebGL-in-webview overhead) |
| Avatar format | VRM | Open standard, huge ecosystem (VRoid), standardized blendshapes and bones |
| Voice detection | In-app VAD | Godot audio server provides AudioEffectCapture for mic input |
| Backend protocol | HTTP/WebSocket | Standard, matches existing @speech-synthesis and @model-boss APIs |
| Emotion system | LLM inline tags | Simpler than separate classifier, no extra model/GPU needed |
| Lipsync | Amplitude-based | AudioEffectSpectrumAnalyzer built into Godot, no external tooling |
| Attention system | Dual-mode (Desktop Gaze + Face-to-Face) | Desktop Gaze for ambient companionship, Face-to-Face for conversation engagement |
| Motion response | Gesture mirroring (classify → animate) | Companion personality, not puppet. Curated animations vs raw skeleton retargeting |
| Gesture detection | External process → labels over socket | Keeps Godot focused on rendering; ML runs separately |
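To illustrate the "LLM inline tags" decision: the model is prompted to embed emotion labels in its reply, and the client strips them out before TTS while forwarding them to the expression controller. The tag syntax is not specified in this README, so the sketch below assumes square-bracketed lowercase labels such as `[happy]`. The label set and the `extractEmotions` function are likewise hypothetical, not the chobit-core EmotionExtractor API.

```typescript
// Hypothetical tag syntax: "[label]" with a lowercase label. The label set
// is an assumption chosen to resemble common VRM expression presets.
const KNOWN_EMOTIONS = new Set(["neutral", "happy", "angry", "sad", "relaxed", "surprised"]);

interface ExtractedEmotions {
  text: string; // sentence with recognized tags removed, ready for TTS
  emotions: string[]; // recognized labels in order of appearance
}

function extractEmotions(sentence: string): ExtractedEmotions {
  const emotions: string[] = [];
  const text = sentence
    .replace(/\[([a-z]+)\]/g, (whole: string, label: string) => {
      if (!KNOWN_EMOTIONS.has(label)) return whole; // keep unrelated brackets
      emotions.push(label);
      return "";
    })
    .replace(/\s{2,}/g, " ") // collapse gaps left behind by removed tags
    .trim();
  return { text, emotions };
}
```

Restricting removal to a known label set means ordinary bracketed text in a reply is spoken as written rather than silently dropped.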

Research References

Cloned to ~/Code/@forks/ (2026-03-26):

  • Open-LLM-VTuber — best modular architecture, sentence streaming, emotion tags
  • Soul-of-Waifu — VRM + @pixiv/three-vrm, GoEmotions classifier, Mixamo animations
  • GPT-SoVITS — voice cloning comparison to Chatterbox
  • RealtimeSTT — dual-tier VAD pattern (WebRTC + Silero)
  • speech-to-speech (HuggingFace) — thread-per-handler pipeline, WebSocket streaming
  • local-talking-llm — Chatterbox emotion→exaggeration mapping