ai/docs/ARCHITECTURE.md
autocommit 670aae2729 docs(docs): 📝 Improve API reference clarity with updated examples and error-handling documentation
Co-Authored-By: Lilith Autocommit <noreply@atlilith.com>
2026-04-12 19:06:54 -07:00


@ai — Architecture

System Diagram

┌─────────────────────────────────────────────────────────────────────┐
│                         @ai / ai-core                               │
│                      NestJS  :3790                                  │
│                                                                     │
│  ┌──────────────┐  ┌──────────────┐  ┌────────────────────────┐   │
│  │   identity/  │  │   memory/    │  │      personality/      │   │
│  │              │  │              │  │                        │   │
│  │ PersonaEntity│  │ MemoryEntry  │  │  JSON template loader  │   │
│  │ UserIdentity │  │ Redis cache  │  │  Prompt composer       │   │
│  │              │  │ PG fallback  │  │  Context-aware assembly│   │
│  └──────────────┘  └──────────────┘  └────────────────────────┘   │
│                                                                     │
│  ┌──────────────┐  ┌──────────────────────────────────────────┐   │
│  │    tasks/    │  │                context/                  │   │
│  │              │  │                                          │   │
│  │ TaskList     │  │  POST /context/compose                   │   │
│  │ Task         │  │  ┌─────────────────────────────────┐    │   │
│  │ Redis events │  │  │ identity → personality → memory │    │   │
│  │              │  │  │ → tasks → composed system prompt│    │   │
│  └──────────────┘  │  └─────────────────────────────────┘    │   │
│                    └──────────────────────────────────────────┘   │
└─────────────────────────────────────────────────────────────────────┘
           │                    │                    │
           ▼                    ▼                    ▼
    ┌────────────┐      ┌──────────────┐     ┌──────────────┐
    │  @chobit   │      │    @life     │     │   @kthulu    │
    │            │      │              │     │              │
    │ Godot 4    │      │ NestJS       │     │ CLI + NestJS │
    │ VRM avatar │      │ Life platform│     │ Coding agent │
    └────────────┘      └──────────────┘     └──────────────┘

Memory Architecture

Two-tier storage inherited from @life and @ml/knowledge-platform:

Request                Redis (short-term)         PostgreSQL (long-term)
   │                   TTL: 1 hour                 Permanent
   │
   ├─── GET /memory ──► cache hit? ──────────────────────────────────►
   │                         │ miss                                   │
   │                         └──────────────────────────────────────► │
   │                                                                   │
   ├─── POST /memory ──────────────────────────────────────────────► write both
   │
   └─── DELETE /memory ──► invalidate Redis key → soft-delete in PG
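The flow in the diagram can be sketched as a small read-through wrapper. This is a minimal sketch, not ai-core's implementation: `ShortTermCache`, `LongTermStore`, and `createTwoTierMemory` are hypothetical names standing in for the Redis and PostgreSQL clients, and repopulating Redis after a miss is an assumption the diagram does not state.

```typescript
// Hypothetical stand-ins for the Redis and PostgreSQL clients (not ai-core APIs).
interface ShortTermCache {
  get(key: string): Promise<string | null>;
  set(key: string, value: string, ttlSeconds: number): Promise<void>;
  del(key: string): Promise<void>;
}

interface LongTermStore {
  find(key: string): Promise<string | null>; // expected to skip soft-deleted rows
  upsert(key: string, content: string): Promise<void>;
  softDelete(key: string): Promise<void>;
}

const CACHE_TTL_SECONDS = 60 * 60; // 1 hour, matching the diagram

function createTwoTierMemory(cache: ShortTermCache, store: LongTermStore) {
  return {
    // GET /memory: serve from Redis on a hit, fall back to PostgreSQL on a miss.
    async read(key: string): Promise<string | null> {
      const cached = await cache.get(key);
      if (cached !== null) return cached;
      const stored = await store.find(key);
      // Assumption: repopulate the cache after a miss so the next read is a hit.
      if (stored !== null) await cache.set(key, stored, CACHE_TTL_SECONDS);
      return stored;
    },
    // POST /memory: write both tiers.
    async write(key: string, content: string): Promise<void> {
      await store.upsert(key, content);
      await cache.set(key, content, CACHE_TTL_SECONDS);
    },
    // DELETE /memory: invalidate the Redis key, then soft-delete in PostgreSQL.
    async remove(key: string): Promise<void> {
      await cache.del(key);
      await store.softDelete(key);
    },
  };
}
```

Writing PostgreSQL before Redis in `write` means a failed database write never leaves a cache entry that has no durable copy; the reverse order would be a different trade-off.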

MemoryEntry schema (from @life/platform-ai):

import { Entity, PrimaryGeneratedColumn, Column, CreateDateColumn, UpdateDateColumn } from 'typeorm';

@Entity('ai_memory_entries')
class MemoryEntryEntity {
  @PrimaryGeneratedColumn('uuid') id: string;
  @Column({ unique: true }) key: string;
  @Column('text') content: string;
  @Column({ nullable: true }) category: string;
  @Column('simple-array') tags: string[];
  @Column('jsonb', { default: {} }) metadata: Record<string, unknown>;
  @Column({ default: false }) deleted: boolean;   // soft-delete flag set by DELETE /memory
  @CreateDateColumn() created_at: Date;
  @UpdateDateColumn() updated_at: Date;
}

Personality System

Inherits the composable template format from @chobit/godot-desktop/config/personalities/miku.json.

Composition order (per-request, not static):

identity.base
  + identity.voice_constraint
  + active_traits[].positive
  + active_negatives[]
  + emotion_tags.instruction
  + depth_tier[inferred_tier].instruction
  + context_modifiers[time_of_day]
  + context_modifiers[conversation_depth]
  + context_modifiers[user_mood_signals[detected_mood]]
  + situation_overrides[detected_situations[]]
  + active_traits[].negative

Input context payload:

interface PersonalityContext {
  time_of_day: 'morning' | 'afternoon' | 'evening' | 'late_night';
  conversation_depth: 'shallow' | 'mid' | 'deep';
  user_mood: 'frustrated' | 'casual' | 'task_focused' | 'vulnerable';
  situations: string[];   // detected from recent message content
  tts_active: boolean;
  message_count: number;  // for depth_tier inference
  last_user_message: string;
}

Context Provider Pattern

@ai does not own task or appointment data. It defines the protocol.

Domain services implement ContextProvider interfaces. @ai aggregates their output into the context assembly pipeline. The data always lives in the service that owns it.

// @ai defines the interfaces
interface TaskContextProvider {
  getActiveTasks(identity_id: string, options: TaskQueryOptions): Promise<TaskSummary[]>;
}

interface AppointmentContextProvider {
  getUpcomingAppointments(identity_id: string, window_hours: number): Promise<AppointmentSummary[]>;
}

interface SessionStateProvider {
  getSessionState(identity_id: string): Promise<SessionState>;
}

interface SessionState {
  current_status: string;         // "coffee brewing", "shoot in progress", etc.
  completed_today: string[];      // items confirmed done this session
  last_updated: Date;
}

// Domain services implement them:
// @life → AppointmentContextProvider (calendar, scheduling, events)
// .quinn/todos.md → FileTaskContextProvider (file-backed ordered task list)
// .quinn/context.md → FileSessionStateProvider (file-backed live session state)
// @kthulu → CodeTaskContextProvider (coding session tasks)

Context providers register with ai-core on startup. The context/ module queries all registered providers and assembles their output into the system prompt.

ai.context.providers.register → { provider_id, type, endpoint }
ai.context.providers.query    → fanout to all registered providers
ai.context.assembled          → composed result emitted on Redis
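The register-then-fanout behavior might look like the following in-process sketch. It is an assumption-heavy illustration: the source registers an `endpoint`, which is replaced here by a `query` callback so the example stays self-contained, and `createProviderRegistry`, `ProviderResult`, and the tolerate-failure policy are hypothetical.

```typescript
// Provider kinds named in this section; the union itself is an assumption.
type ProviderType = "task" | "appointment" | "session_state";

interface RegisteredProvider {
  provider_id: string;
  type: ProviderType;
  // The source registers an `endpoint`; this sketch substitutes an in-process callback.
  query(identity_id: string): Promise<string[]>;
}

interface ProviderResult {
  provider_id: string;
  type: ProviderType;
  lines: string[];
  ok: boolean;
}

function createProviderRegistry() {
  const providers = new Map<string, RegisteredProvider>();
  return {
    // ai.context.providers.register: re-registering the same id replaces the old entry.
    register(provider: RegisteredProvider): void {
      providers.set(provider.provider_id, provider);
    },
    // ai.context.providers.query: fan out to every registered provider in parallel.
    // Assumption: a failing provider yields an empty result instead of failing the whole query.
    async queryAll(identity_id: string): Promise<ProviderResult[]> {
      const pending = Array.from(providers.values()).map(
        async (p): Promise<ProviderResult> => {
          try {
            const lines = await p.query(identity_id);
            return { provider_id: p.provider_id, type: p.type, lines, ok: true };
          } catch {
            return { provider_id: p.provider_id, type: p.type, lines: [], ok: false };
          }
        }
      );
      return Promise.all(pending);
    },
  };
}
```

Isolating each provider's failure keeps one unavailable domain service from blocking prompt assembly; whether ai-core should instead surface the failure is a design choice the source leaves open.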

Two-File Nag Pattern

The nag loop consumes two distinct sources:

| Source | Interface | Contains | Example |
|---|---|---|---|
| todos.md | FileTaskContextProvider | Ordered pending tasks | "Headshots → Casual → Glamour → Platforms" |
| context.md | FileSessionStateProvider | Live session state | "Coffee brewing, shoot not started" |

Nag loop behavior:

  1. Find the earliest incomplete task in the ordered list
  2. Check session state — has this step actually started or completed?
  3. If state is ambiguous → ask a check-in question ("How's the coffee coming?") rather than commanding
  4. Never advance past a step that isn't confirmed done in context
  5. Consumer updates context.md when the user mentions status in chat — this is the write path
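The decision part of the loop (steps 1 to 4) can be sketched as a pure function. Two things here are assumptions rather than documented behavior: tasks are matched against `completed_today` by exact title, and "ambiguous" is approximated as "session state has not been updated recently", controlled by a hypothetical `staleAfterMs` parameter.

```typescript
interface SessionStateSnapshot {
  current_status: string;
  completed_today: string[];
  last_updated: Date;
}

type NagAction =
  | { kind: "none" }
  | { kind: "check_in"; task: string } // ask a question, e.g. "How's the coffee coming?"
  | { kind: "nudge"; task: string };   // state is fresh, so a direct reminder is acceptable

function nextNag(
  orderedTasks: string[],
  state: SessionStateSnapshot,
  now: Date,
  staleAfterMs: number
): NagAction {
  const confirmedDone = new Set(state.completed_today);
  // Steps 1 and 4: pick the earliest task not confirmed done; later tasks are never considered.
  const next = orderedTasks.find((task) => !confirmedDone.has(task));
  if (next === undefined) return { kind: "none" };
  // Steps 2 and 3: stale session state is treated as ambiguous (assumption), so check in.
  const ageMs = now.getTime() - state.last_updated.getTime();
  if (ageMs > staleAfterMs) return { kind: "check_in", task: next };
  return { kind: "nudge", task: next };
}
```

Because the function only reads its inputs, the same logic works whether the state comes from context.md or from a Redis-backed provider.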

This pattern works file-backed today and upgrades transparently: FileSessionStateProvider becomes a RedisSessionStateProvider when ai-core is live. Consumers (Claude Code nag loop, @chobit) don't change.

Task System

@ai owns the protocol and the nag loop engine. It does not store tasks for domain services — those live in their respective backends.

What @ai does own directly:

  • AI-internal task lists — things @ai itself tracks (e.g. miku/nag queue, conversation follow-ups)
  • Aggregated task summary — snapshot assembled from all registered TaskContextProviders for LLM context injection
Nag Queue (ai-owned, Redis-backed):
  ├── identity_id: "quinn"
  ├── source: "quinn/todos.md" (FileTaskContextProvider)
  ├── context_source: "quinn/context.md" (FileSessionStateProvider)
  ├── next_item: "Photo shoot headshots look"
  └── last_nagged_at: timestamp (prevents repeating same item)

Redis Events:
  ai.task.completed  → { identity_id, source, task_id } — from any provider
  ai.nag.fired       → { identity_id, message, personality_id }
  ai.nag.snoozed     → { identity_id, until }

Named context sources per identity (examples):

  • quinn/daily → reads from .quinn/todos.md (FileTaskContextProvider)
  • quinn/appointments → reads from @life (AppointmentContextProvider)
  • miku/nag → ai-core owned, drives speech-synthesis loop

Context Assembly Pipeline

POST /context/compose is the primary integration endpoint:

1. Load identity (PersonaEntity + UserIdentityEntity)
      │
2. Compose personality system prompt
   POST /personality/:id/compose { context }
      │
3. Retrieve relevant memories
   Semantic search over MemoryEntryEntity
   ranked by: recency + relevance to recent_messages[]
      │
4. Get active tasks summary
   GET /tasks?identity_id=&status=pending&limit=5
   → "Active: [1] Photo shoot tonight [2] Call name change office Monday"
      │
5. Assemble response
   {
     system_prompt: "<<composed personality>>",
     memory_injections: ["<<relevant memory snippets>>"],
     task_summary: "<<pending tasks>>"
   }
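Steps 4 and 5 can be illustrated with a small formatter that produces the summary string shown above and assembles the response object. `formatTaskSummary` and `assembleContext` are hypothetical names, and the "Active: none" text for an empty list is an assumption.

```typescript
interface ComposeResponse {
  system_prompt: string;
  memory_injections: string[];
  task_summary: string;
}

// Step 4: turn pending task titles into the one-line summary format from the pipeline.
function formatTaskSummary(titles: string[], limit: number = 5): string {
  const shown = titles.slice(0, limit);
  if (shown.length === 0) return "Active: none"; // assumed wording for an empty list
  return "Active: " + shown.map((title, i) => `[${i + 1}] ${title}`).join(" ");
}

// Step 5: assemble the response from the outputs of steps 2 to 4.
function assembleContext(
  systemPrompt: string,
  memories: string[],
  pendingTitles: string[]
): ComposeResponse {
  return {
    system_prompt: systemPrompt,
    memory_injections: memories,
    task_summary: formatTaskSummary(pendingTitles),
  };
}
```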

Integration Contracts

@chobit → @ai

Before (current):

# llm_client.gd sends directly to model-boss
var body = { "model": "qwen3-4b", "messages": _history, ... }
_http.request(llm_url + "/v1/chat/completions", ...)

After:

# llm_client.gd requests enriched context from @ai first
var context_body = {
  "identity_id": CompanionConfig.identity_id,
  "personality_id": CompanionConfig.personality_id,
  "recent_messages": _history.slice(-20),   # last 20 (increased from 10)
  "context": _composer.build_context_payload()
}
var enriched = await _ai_client.compose(context_body)
# then forwards to model-boss with enriched system_prompt

@chobit → @ai (memory sync)

# conversation_store.gd — async memory sync after each turn
func _after_save_message(role: String, content: String) -> void:
  if role == "user":
    _ai_memory_client.upsert({
      "key": "chobit:last_user_message",
      "content": content,
      "category": "conversation",
      "tags": ["recent", "chobit"]
    })

Infrastructure

# docker-compose.yaml
services:
  ai-postgres:
    image: postgres:16
    ports: ["26395:5432"]

  ai-redis:
    image: redis/redis-stack:latest
    ports: ["26394:6379"]    # Redis Stack (RediSearch for optional vector search)

  ai-core:
    build: services/ai-core
    ports: ["3790:3790"]
    depends_on: [ai-postgres, ai-redis]
    environment:
      DATABASE_URL: postgres://...@ai-postgres:5432/ai
      REDIS_URL: redis://ai-redis:6379

Dynamic Personality System

The Problem with Static Templates

miku.json is a good start but it has one fundamental flaw: it's a fixed document. Every conversation starts from the same base. Miku doesn't know you better after 100 conversations than after 1. She can't be warmer because you went through something hard together last week. She doesn't ease up on the teasing because you're clearly exhausted today.

Real personalities are three-dimensional:

| Layer | Timescale | Storage | Example |
|---|---|---|---|
| Core traits | Permanent | Static JSON | "Miku is enthusiastic and playful" |
| Relationship arc | Months | PostgreSQL | "Quinn and Miku are 'close' — 87 conversations" |
| Shared history | Weeks | PostgreSQL (memory) | "Quinn got a $1400 client on Mar 29 — big win" |
| Session mood | Hours | Redis | "Quinn is tired, been up since 4am" |
| Situational state | Minutes | Redis | "Quinn is in task-focused mode right now" |

Relationship Arc

Companions move through relationship stages. Each stage gates different behaviors:

new ──────► familiar ──────► close ──────► intimate
  (0-5)      (6-30)         (31-100)     (100+)
conversations

| Stage | Personality Expression |
|---|---|
| new | Warm but reserved. Helpful, not personal. Doesn't tease. Doesn't assume. |
| familiar | References shared events. Light teasing when mood is right. Remembers patterns. |
| close | Direct. Calls you out when you're procrastinating. Genuine care, not therapy. |
| intimate | Shorthand. In-jokes. Reads between the lines. Minimal preamble. |
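A minimal sketch of the stage lookup, assuming the boundaries are 0-5, 6-30, 31-100, and above 100 conversations. The source lists both "31-100" and "100+", so placing exactly 100 in `close` is an assumption, as is the `depthForCount` name.

```typescript
type RelationshipDepth = "new" | "familiar" | "close" | "intimate";

// Maps a conversation count to a relationship stage.
// Assumption: exactly 100 conversations still counts as "close".
function depthForCount(interactionCount: number): RelationshipDepth {
  if (interactionCount <= 5) return "new";
  if (interactionCount <= 30) return "familiar";
  if (interactionCount <= 100) return "close";
  return "intimate";
}
```

With these boundaries, the 87-conversation example from the layer table resolves to `close`.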

RelationshipEntity (new module — relationship/):

import { Entity, PrimaryGeneratedColumn, Column } from 'typeorm';

@Entity('ai_relationships')
class RelationshipEntity {
  // Assumed surrogate key: TypeORM requires every entity to declare a primary column.
  @PrimaryGeneratedColumn('uuid') id: string;
  @Column() identity_id: string;
  @Column() persona_id: string;
  @Column() depth: 'new' | 'familiar' | 'close' | 'intimate';
  @Column() interaction_count: number;
  @Column('simple-array') significant_event_keys: string[];  // → memory entries
  @Column('jsonb') tone_notes: string[];  // learned: "prefers directness", "sensitive about X"
  @Column() first_interaction_at: Date;
  @Column() last_interaction_at: Date;
}

Dynamic Trait Intensity

Traits aren't binary (on/off) — they have intensity that responds to context:

interface TraitModifiers {
  base_intensity: number;          // 0.0-1.0
  modifiers: {
    'user_mood.frustrated'?: number;   // e.g. -0.3 (dial down enthusiasm)
    'user_mood.vulnerable'?: number;   // e.g. -0.4 (go gentler)
    'relationship.new'?: number;       // e.g. +0.1 (extra warmth for new)
    'relationship.close'?: number;     // e.g. -0.1 (less performance, more real)
    'time_of_day.late_night'?: number; // e.g. -0.2 (calmer energy)
    'task_focused'?: number;           // e.g. -0.3 (less playful, more practical)
  }
}

The personality composer resolves trait intensity at request time and injects the appropriate positive/negative language for that intensity level rather than always using the full trait text.
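One plausible way to resolve intensity at request time is to add every modifier whose signal is active and clamp the result to the 0.0-1.0 range. The additive rule, the clamping, and the `resolveIntensity` name are assumptions; the source only says that intensity "responds to context".

```typescript
interface TraitModifiers {
  base_intensity: number; // 0.0-1.0
  modifiers: Partial<Record<string, number>>; // keyed by signal, e.g. 'user_mood.frustrated'
}

// Sums the modifiers for the signals active in this request, then clamps to [0, 1].
function resolveIntensity(trait: TraitModifiers, activeSignals: string[]): number {
  let value = trait.base_intensity;
  for (const signal of activeSignals) {
    value += trait.modifiers[signal] ?? 0; // signals without a modifier leave the value unchanged
  }
  return Math.min(1, Math.max(0, value));
}
```

For example, a trait with base intensity 0.8 and modifiers of -0.3 for `user_mood.frustrated` and -0.2 for `time_of_day.late_night` resolves to roughly 0.3 when both signals are active.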

Shared History Injection

The personality module queries memory for significant shared events when composing:

// relationship context appended to system prompt
const sharedContext = await memory.search({
  identity_id,
  tags: ['significant_event'],
  limit: 3,
  ranked_by: 'recency + relevance'
});

// injected as:
// "Context you share with this user:
//  - They had a big career win on Mar 29 ($1400 client — their first escort work payout)
//  - They've been procrastinating on a photo shoot for 2 weeks
//  - They're mid-transition (name change paperwork pending)"

This is the mechanism that makes the companion feel like it remembers rather than resetting every conversation. It uses the memory system (M2) but feeds into personality composition (M3).

Significant Event Tagging

When saving conversation turns to memory, the system tags events that matter:

// Heuristics for significance tagging
const SIGNIFICANT_SIGNALS = [
  'money earned / financial win',
  'goal completed / milestone hit',
  'emotional disclosure (vulnerability)',
  'major decision made',
  'plan committed to',
  'recurring pattern (mentioned 3+ times)',
];

Significant events get tagged in MemoryEntryEntity with tags: ['significant_event'] and higher retention weight.

Personality State Machine (Future — M9)

For deeper personality dynamics, traits can evolve over the relationship arc:

Miku @ 'familiar' depth:
  enthusiastic.intensity → 0.6 (not always full energy)
  playful.teasing_allowed → true (earned through familiarity)
  attentive.callback_references → true (can reference prior conversations)

Miku @ 'close' depth:
  anti_therapy.override → enabled (no soft-pedaling, call it out)
  directness → high (shorthand language ok)
  task_coaching → enabled (can push back on procrastination)

This is the endgame: a companion that becomes more itself — not more generic — as the relationship deepens.


Response Format Layer

The Dual-Response Pattern

AI responses have two distinct audiences:

  • Text response — for display: full, can be long, markdown ok, detailed
  • TTS response — for speech: short, plain spoken sentences, 1-3 sentences max

This split is not just cosmetic — the models and parameters may differ:

POST /context/compose
→ returns ResponseFormat config alongside system_prompt

{
  "system_prompt": "...",
  "memory_injections": [...],
  "task_summary": "...",
  "response_format": {
    "mode": "dual",              // "text_only" | "tts_only" | "dual"
    "text": {
      "model": "qwen3-32b",      // richer model for display
      "max_tokens": 500,
      "stream": true
    },
    "tts": {
      "model": "qwen3-4b",       // fast model for voice
      "max_tokens": 60,          // ~3 short sentences
      "stream": false,           // wait for full response before TTS
      "voice_id": "emov-bea-amused",
      "personality_id": "miku"
    }
  }
}

How it works in practice:

  1. Consumer calls /context/compose — gets back response_format config
  2. Consumer sends text config → model-boss → streams full text response to display
  3. Consumer sends tts config → model-boss → gets short spoken response → speech-synthesis

The TTS response is independently generated, not a truncation of the text response. The system prompt for TTS has an additional constraint injected: "Respond in 1-3 short spoken sentences. No lists, no markdown." The text response has no such constraint.
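On the consumer side, turning a `response_format` into the two independent model-boss requests could look like this sketch. The request shape mirrors the `/v1/chat/completions` body used elsewhere in this document; `buildRequests`, `ChannelConfig`, and the decision to forward only `model`, `max_tokens`, and `stream` are assumptions.

```typescript
type ResponseMode = "text_only" | "tts_only" | "dual";

interface ChannelConfig {
  model: string;
  max_tokens: number;
  stream: boolean;
}

interface ResponseFormat {
  mode: ResponseMode;
  text?: ChannelConfig;
  tts?: ChannelConfig;
}

interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

interface ChatRequest extends ChannelConfig {
  messages: ChatMessage[];
}

// Constraint text from this section, appended to the TTS system prompt only.
const TTS_CONSTRAINT = "Respond in 1-3 short spoken sentences. No lists, no markdown.";

function buildRequests(
  systemPrompt: string,
  history: ChatMessage[],
  format: ResponseFormat
): { text?: ChatRequest; tts?: ChatRequest } {
  const requests: { text?: ChatRequest; tts?: ChatRequest } = {};
  if (format.mode !== "tts_only" && format.text) {
    requests.text = {
      model: format.text.model,
      max_tokens: format.text.max_tokens,
      stream: format.text.stream,
      messages: [{ role: "system", content: systemPrompt }, ...history],
    };
  }
  if (format.mode !== "text_only" && format.tts) {
    // The spoken reply is generated separately, with the extra constraint injected.
    requests.tts = {
      model: format.tts.model,
      max_tokens: format.tts.max_tokens,
      stream: format.tts.stream,
      messages: [{ role: "system", content: systemPrompt + "\n" + TTS_CONSTRAINT }, ...history],
    };
  }
  return requests;
}
```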

Model Selection Logic

Decided by context/ module based on personality + request characteristics:

| Signal | Model Choice |
|---|---|
| Companion conversation (miku) | qwen3-4b — fast, conversational |
| Complex reasoning / coding | qwen3-32b or qwen3-coder |
| TTS responses (always) | qwen3-4b — speed over depth |
| Long memory context (>20 injections) | Larger context window model |
| Persona specifies model | Persona's model_preference overrides |

Model selection lives in the response/ module, not hardcoded per consumer. Consumers get the right model from /context/compose; they don't choose it themselves.

Personality Depth Tier → TTS Length

The miku.json depth tier system maps naturally to TTS max_tokens:

| Depth Tier | Display | TTS max_tokens |
|---|---|---|
| 1 (quick) | 1 sentence | 25 |
| 2 (standard) | 1-2 sentences | 40 |
| 3 (engaged) | 2-3 sentences | 60 |
| 4 (detailed) | 3-5 sentences | 80 |

The personality module infers the depth tier per request and sets response_format.tts.max_tokens to the matching value automatically.
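The tier-to-budget mapping is small enough to express as a lookup table. A minimal sketch, with `DepthTier` and `ttsMaxTokens` as hypothetical names and the token values taken from the depth tier table:

```typescript
type DepthTier = 1 | 2 | 3 | 4;

// TTS token budget per depth tier: quick, standard, engaged, detailed.
const TTS_MAX_TOKENS_BY_TIER: Record<DepthTier, number> = {
  1: 25,
  2: 40,
  3: 60,
  4: 80,
};

function ttsMaxTokens(tier: DepthTier): number {
  return TTS_MAX_TOKENS_BY_TIER[tier];
}
```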

When to Use Speech Synthesis

Not every response should be spoken. The response_format.mode is decided by:

| Context | Mode |
|---|---|
| @chobit (TTS always enabled) | dual |
| Claude Code nag loop | tts_only |
| API consumer, no audio | text_only |
| Notification / alert | tts_only |
| User asked a complex question | dual |
| Background task completion | tts_only (short confirmation) |
| Error / blocker surfaced | tts_only (urgent personality) |

Consumers declare their tts_capability when registering with @ai. The context module uses this to set the default mode, which consumers can override per-request.
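The default-plus-override rule could be sketched as below. Several details are assumptions: `tts_capability` is treated as a boolean, the default for a capable consumer is `dual`, and a consumer without audio is always given `text_only` regardless of what it requests.

```typescript
type ResponseMode = "text_only" | "tts_only" | "dual";

interface ConsumerRegistration {
  consumer_id: string;
  tts_capability: boolean; // assumed boolean; the source does not specify the type
}

function resolveMode(consumer: ConsumerRegistration, requested?: ResponseMode): ResponseMode {
  // Assumption: a consumer without audio output never receives a spoken or dual response.
  if (!consumer.tts_capability) return "text_only";
  // Capable consumers default to dual and may override per request.
  return requested ?? "dual";
}
```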


What @ai Is NOT

| Capability | Owned By |
|---|---|
| LLM inference routing | @model-boss |
| TTS / STT | @audio/@speech-synthesis |
| RAG / vector search | @ml/rag-retrieval |
| Model training | @ml/assistant-trainer |
| Face tracking | @chobit/services/vision |
| Platform knowledge validation | @ml/knowledge-platform |
| Avatar rendering | @chobit (Godot) |