
# @ai — Architecture
## System Diagram
```
┌────────────────────────────────────────────────────────────────┐
│ @ai / ai-core                                                  │
│ NestJS :3790                                                   │
│                                                                │
│ ┌──────────────┐  ┌──────────────┐  ┌────────────────────────┐ │
│ │ identity/    │  │ memory/      │  │ personality/           │ │
│ │              │  │              │  │                        │ │
│ │ PersonaEntity│  │ MemoryEntry  │  │ JSON template loader   │ │
│ │ UserIdentity │  │ Redis cache  │  │ Prompt composer        │ │
│ │              │  │ PG fallback  │  │ Context-aware assembly │ │
│ └──────────────┘  └──────────────┘  └────────────────────────┘ │
│                                                                │
│ ┌──────────────┐  ┌──────────────────────────────────────────┐ │
│ │ tasks/       │  │ context/                                 │ │
│ │              │  │                                          │ │
│ │ TaskList     │  │ POST /context/compose                    │ │
│ │ Task         │  │ ┌─────────────────────────────────┐      │ │
│ │ Redis events │  │ │ identity → personality → memory │      │ │
│ │              │  │ │ → tasks → composed system prompt│      │ │
│ └──────────────┘  │ └─────────────────────────────────┘      │ │
│                   └──────────────────────────────────────────┘ │
└────────────────────────────────────────────────────────────────┘
             │                  │                   │
             ▼                  ▼                   ▼
      ┌────────────┐    ┌──────────────┐    ┌──────────────┐
      │ @chobit    │    │ @life        │    │ @kthulu      │
      │            │    │              │    │              │
      │ Godot 4    │    │ NestJS       │    │ CLI + NestJS │
      │ VRM avatar │    │ Life platform│    │ Coding agent │
      └────────────┘    └──────────────┘    └──────────────┘
```
## Memory Architecture
Two-tier storage inherited from `@life` and `@ml/knowledge-platform`:
```
Request                 Redis (short-term)        PostgreSQL (long-term)
   │                    TTL: 1 hour               Permanent
   │
   ├─── GET /memory ──► cache hit? ── yes ──► return cached entry
   │                        │ miss
   │                        └───────────────────► read from PG
   │
   ├─── POST /memory ───────────────────────────► write both (Redis + PG)
   └─── DELETE /memory ──► invalidate Redis key → soft-delete in PG
```
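The three paths can be sketched as a read-through cache. This is a minimal, in-process sketch under stated assumptions: `Map` instances stand in for Redis and PostgreSQL, `MemoryStore` and its method names are hypothetical, and the 1-hour TTL comes from the diagram. The diagram does not say whether a cache miss repopulates Redis; the sketch assumes it does.

```typescript
// Hypothetical stand-ins: a Map with expiry plays "Redis", a plain Map plays "PostgreSQL".
interface MemoryRecord {
  key: string;
  content: string;
  deleted: boolean; // soft-delete flag, as in MemoryEntryEntity
}

const TTL_MS = 60 * 60 * 1000; // Redis TTL: 1 hour

class MemoryStore {
  private redis = new Map<string, { value: MemoryRecord; expiresAt: number }>();
  private pg = new Map<string, MemoryRecord>();

  // The clock is injectable so TTL expiry can be tested without waiting.
  constructor(private readonly now: () => number = Date.now) {}

  // GET /memory: serve from Redis on a hit; on a miss, read PG and (assumed) repopulate Redis.
  get(key: string): MemoryRecord | undefined {
    const cached = this.redis.get(key);
    if (cached && cached.expiresAt > this.now()) return cached.value;
    this.redis.delete(key); // drop an expired entry, if any
    const row = this.pg.get(key);
    if (!row || row.deleted) return undefined;
    this.redis.set(key, { value: row, expiresAt: this.now() + TTL_MS });
    return row;
  }

  // POST /memory: write both tiers.
  upsert(key: string, content: string): void {
    const record: MemoryRecord = { key, content, deleted: false };
    this.pg.set(key, record);
    this.redis.set(key, { value: record, expiresAt: this.now() + TTL_MS });
  }

  // DELETE /memory: invalidate the Redis key, then soft-delete in PG.
  remove(key: string): void {
    this.redis.delete(key);
    const row = this.pg.get(key);
    if (row) this.pg.set(key, { ...row, deleted: true });
  }

  // Number of keys currently held in the cache tier.
  cachedKeys(): number {
    return this.redis.size;
  }
}
```

Soft-deleted rows stay in PG but are treated as absent on read, so a stale cache can never resurrect them once the Redis key is invalidated.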
**MemoryEntry schema** (from `@life/platform-ai`):
```typescript
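import {
  Column,
  CreateDateColumn,
  Entity,
  PrimaryGeneratedColumn,
  UpdateDateColumn,
} from 'typeorm';

@Entity('ai_memory_entries')
export class MemoryEntryEntity {
  @PrimaryGeneratedColumn('uuid') id!: string;
  @Column({ unique: true }) key!: string; // stable lookup key, e.g. "chobit:last_user_message"
  @Column('text') content!: string;
  @Column({ type: 'varchar', nullable: true }) category!: string | null;
  @Column('simple-array') tags!: string[]; // persisted as comma-separated text
  @Column('jsonb', { default: {} }) metadata!: Record<string, unknown>;
  @Column({ default: false }) deleted!: boolean; // soft-delete flag used by DELETE /memory
  @CreateDateColumn() created_at!: Date;
  @UpdateDateColumn() updated_at!: Date;
}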
```
## Personality System
Inherits the composable template format from `@chobit/godot-desktop/config/personalities/miku.json`.
Composition order (per-request, not static):
```
identity.base
+ identity.voice_constraint
+ active_traits[].positive
+ active_negatives[]
+ emotion_tags.instruction
+ depth_tier[inferred_tier].instruction
+ context_modifiers[time_of_day]
+ context_modifiers[conversation_depth]
+ context_modifiers[user_mood_signals[detected_mood]]
+ situation_overrides[detected_situations[]]
+ active_traits[].negative
```
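Read top to bottom, the order is a plain concatenation of template fragments. A minimal sketch, assuming every fragment is a string and that `user_mood_signals` maps a detected mood to a `context_modifiers` key; `PersonalityTemplate`, `ComposeInput`, and `composePrompt` are hypothetical shapes, not the actual `miku.json` schema. Missing fragments (an unknown situation, an undefined tier) are skipped rather than treated as errors.

```typescript
// Hypothetical template shape covering only the fields named in the composition order.
interface Trait {
  positive: string;
  negative: string;
}

interface PersonalityTemplate {
  identity: { base: string; voice_constraint: string };
  active_traits: Trait[];
  active_negatives: string[];
  emotion_tags: { instruction: string };
  depth_tier: Record<number, { instruction: string }>;
  context_modifiers: Record<string, string>;
  user_mood_signals: Record<string, string>; // detected mood -> context_modifiers key
  situation_overrides: Record<string, string>;
}

// Per-request values inferred upstream (cf. PersonalityContext).
interface ComposeInput {
  time_of_day: string;
  conversation_depth: string;
  detected_mood: string;
  detected_situations: string[];
  inferred_tier: number;
}

// Joins fragments in the documented order; missing fragments are skipped.
function composePrompt(t: PersonalityTemplate, c: ComposeInput): string {
  const moodKey = t.user_mood_signals[c.detected_mood];
  const parts: Array<string | undefined> = [
    t.identity.base,
    t.identity.voice_constraint,
    ...t.active_traits.map((trait) => trait.positive),
    ...t.active_negatives,
    t.emotion_tags.instruction,
    t.depth_tier[c.inferred_tier]?.instruction,
    t.context_modifiers[c.time_of_day],
    t.context_modifiers[c.conversation_depth],
    moodKey !== undefined ? t.context_modifiers[moodKey] : undefined,
    ...c.detected_situations.map((s) => t.situation_overrides[s]),
    ...t.active_traits.map((trait) => trait.negative),
  ];
  return parts.filter((p): p is string => typeof p === "string" && p.length > 0).join("\n");
}
```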
Input context payload:
```typescript
interface PersonalityContext {
  time_of_day: 'morning' | 'afternoon' | 'evening' | 'late_night';
  conversation_depth: 'shallow' | 'mid' | 'deep';
  user_mood: 'frustrated' | 'casual' | 'task_focused' | 'vulnerable';
  situations: string[];   // detected from recent message content
  tts_active: boolean;
  message_count: number;  // for depth_tier inference
  last_user_message: string;
}
```
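The payload does not say how `time_of_day` and `conversation_depth` are derived. One possible sketch, with hypothetical hour boundaries and message-count thresholds chosen only for illustration:

```typescript
type TimeOfDay = "morning" | "afternoon" | "evening" | "late_night";
type ConversationDepth = "shallow" | "mid" | "deep";

// Hypothetical hour boundaries (local time): 5-11 morning, 12-16 afternoon, 17-21 evening.
function timeOfDay(hour: number): TimeOfDay {
  if (hour >= 5 && hour < 12) return "morning";
  if (hour >= 12 && hour < 17) return "afternoon";
  if (hour >= 17 && hour < 22) return "evening";
  return "late_night"; // 22:00 to 04:59
}

// Hypothetical thresholds on message_count for the current conversation.
function conversationDepth(messageCount: number): ConversationDepth {
  if (messageCount < 6) return "shallow";
  if (messageCount < 20) return "mid";
  return "deep";
}
```

A caller would pass the hour in the user's local time zone (for example `new Date().getHours()` on the client), since `late_night` only makes sense relative to the user.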
## Context Provider Pattern
**@ai does not own task or appointment data. It defines the protocol.**
Domain services implement `ContextProvider` interfaces. @ai aggregates their output into the context assembly pipeline. The data always lives in the service that owns it.
```typescript
// @ai defines the interfaces
interface TaskContextProvider {
  getActiveTasks(identity_id: string, options: TaskQueryOptions): Promise<TaskSummary[]>;
}

interface AppointmentContextProvider {
  getUpcomingAppointments(identity_id: string, window_hours: number): Promise<AppointmentSummary[]>;
}

interface SessionStateProvider {
  getSessionState(identity_id: string): Promise<SessionState>;
}

interface SessionState {
  current_status: string;    // "coffee brewing", "shoot in progress", etc.
  completed_today: string[]; // items confirmed done this session
  last_updated: Date;
}

// Domain services implement them:
//   @life             → AppointmentContextProvider (calendar, scheduling, events)
//   .quinn/todos.md   → FileTaskContextProvider (file-backed ordered task list)
//   .quinn/context.md → FileSessionStateProvider (file-backed live session state)
//   @kthulu           → CodeTaskContextProvider (coding session tasks)
```
Context providers register with ai-core on startup. The `context/` module queries all registered providers and assembles their output into the system prompt.
```
ai.context.providers.register → { provider_id, type, endpoint }
ai.context.providers.query → fanout to all registered providers
ai.context.assembled → composed result emitted on Redis
```
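An in-process version of the register/query fanout might look like this. A sketch only: `ProviderRegistry`, `RegisteredProvider`, and the `type` values are hypothetical, and the real registration travels over the `ai.context.providers.register` event rather than a method call. `Promise.allSettled` is used so that one failing provider drops out of the prompt instead of failing the whole compose request.

```typescript
// Hypothetical provider record; the real one arrives via ai.context.providers.register.
interface RegisteredProvider {
  provider_id: string;
  type: "tasks" | "appointments" | "session";
  query: (identityId: string) => Promise<string>; // prompt-ready summary text
}

class ProviderRegistry {
  private providers = new Map<string, RegisteredProvider>();

  // Re-registering the same provider_id replaces the previous entry.
  register(provider: RegisteredProvider): void {
    this.providers.set(provider.provider_id, provider);
  }

  // Fan out to every provider; rejected queries are dropped, not propagated.
  async queryAll(identityId: string): Promise<Map<string, string>> {
    const entries = Array.from(this.providers.values());
    const results = await Promise.allSettled(entries.map((p) => p.query(identityId)));
    const assembled = new Map<string, string>();
    results.forEach((result, i) => {
      if (result.status === "fulfilled") assembled.set(entries[i].provider_id, result.value);
    });
    return assembled;
  }
}
```

In ai-core, the assembled map would then be what gets emitted as `ai.context.assembled`.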
### Two-File Nag Pattern
The nag loop consumes two distinct sources:
| Source | Interface | Contains | Example |
|--------|-----------|----------|---------|
| `todos.md` | `FileTaskContextProvider` | Ordered pending tasks | "Headshots → Casual → Glamour → Platforms" |
| `context.md` | `FileSessionStateProvider` | Live session state | "Coffee brewing, shoot not started" |
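
Reading `todos.md` could be as simple as parsing a checklist. The file format is not specified here; this sketch assumes GitHub-style task items (`- [ ]` / `- [x]`), and `FileTask` and `parseTodos` are hypothetical names. Order in the file is preserved, since the nag loop depends on it.

```typescript
// Hypothetical parsed task; position preserves file order.
interface FileTask {
  position: number;
  title: string;
  done: boolean;
}

// Assumes checklist lines like "- [ ] Casual look" or "* [x] Headshots"; other lines are ignored.
function parseTodos(markdown: string): FileTask[] {
  const tasks: FileTask[] = [];
  for (const line of markdown.split(/\r?\n/)) {
    const m = /^\s*[-*]\s+\[([ xX])\]\s+(.+?)\s*$/.exec(line);
    if (m) tasks.push({ position: tasks.length, title: m[2], done: m[1] !== " " });
  }
  return tasks;
}
```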
**Nag loop behavior:**
1. Find the earliest incomplete task in the ordered list
2. Check session state — has this step actually started or completed?
3. If state is ambiguous → ask a **check-in question** ("How's the coffee coming?") rather than commanding
4. Never advance past a step that isn't confirmed done in context
5. Consumer updates `context.md` when the user mentions status in chat — this is the write path
This pattern works file-backed today and upgrades transparently: `FileSessionStateProvider` becomes a `RedisSessionStateProvider` when ai-core is live. Consumers (Claude Code nag loop, @chobit) don't change.
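
Steps 1 through 4 of the nag loop reduce to a small decision function. A minimal sketch: `SessionState` mirrors the interface defined earlier, while `Todo`, `NagAction`, and `decideNag` are hypothetical. The test for "ambiguous" state (the task is mentioned in `current_status` but not listed in `completed_today`) is an assumption; step 5, the write path, is left to the consumer.

```typescript
// Mirrors the SessionState interface above.
interface SessionState {
  current_status: string;
  completed_today: string[];
  last_updated: Date;
}

// Hypothetical task and result shapes.
interface Todo {
  title: string;
  done: boolean;
}

type NagAction =
  | { kind: "idle" } // nothing left to nag about
  | { kind: "check_in"; task: string } // state is ambiguous: ask, don't command
  | { kind: "nudge"; task: string }; // step clearly not started: remind

function decideNag(todos: Todo[], state: SessionState): NagAction {
  const confirmed = new Set(state.completed_today.map((s) => s.toLowerCase()));
  // Steps 1 and 4: the earliest task neither checked off nor confirmed done this session.
  // Later tasks are never considered while an earlier one is still open.
  const next = todos.find((t) => !t.done && !confirmed.has(t.title.toLowerCase()));
  if (!next) return { kind: "idle" };
  // Steps 2-3: if the live status mentions the step, it may be under way, so check in.
  if (state.current_status.toLowerCase().includes(next.title.toLowerCase())) {
    return { kind: "check_in", task: next.title };
  }
  return { kind: "nudge", task: next.title };
}
```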
## Task System
@ai owns the **protocol** and the **nag loop engine**. It does not store tasks for domain services — those live in their respective backends.
What @ai does own directly:
- **AI-internal task lists** — things @ai itself tracks (e.g. `miku/nag` queue, conversation follow-ups)
- **Aggregated task summary** — snapshot assembled from all registered `TaskContextProvider`s for LLM context injection
```
Nag Queue (ai-owned, Redis-backed):
├── identity_id: "quinn"
├── source: "quinn/todos.md" (FileTaskContextProvider)
├── context_source: "quinn/context.md" (FileSessionStateProvider)
├── next_item: "Photo shoot headshots look"
└── last_nagged_at: timestamp (prevents repeating same item)
Redis Events:
ai.task.completed → { identity_id, source, task_id } — from any provider
ai.nag.fired → { identity_id, message, personality_id }
ai.nag.snoozed → { identity_id, until }
```
Named context sources per identity (examples):
- `quinn/daily` → reads from `.quinn/todos.md` (FileTaskContextProvider)
- `quinn/appointments` → reads from @life (AppointmentContextProvider)
- `miku/nag` → ai-core owned, drives speech-synthesis loop
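
`last_nagged_at` and `ai.nag.snoozed` together suggest a gate in front of `ai.nag.fired`. A sketch of that gate, assuming a hypothetical `last_nagged_item` field and a 15-minute repeat cooldown (neither is specified above); `shouldFire` and `markFired` are illustrative names.

```typescript
// Hypothetical view of one nag-queue entry; last_nagged_item is an assumed extra field.
interface NagQueueEntry {
  identity_id: string;
  next_item: string;
  last_nagged_item?: string;
  last_nagged_at?: number; // epoch ms
  snoozed_until?: number; // epoch ms, set on ai.nag.snoozed
}

const REPEAT_COOLDOWN_MS = 15 * 60 * 1000; // assumed: same item at most every 15 minutes

function shouldFire(entry: NagQueueEntry, now: number): boolean {
  if (entry.snoozed_until !== undefined && now < entry.snoozed_until) return false;
  const repeating = entry.last_nagged_item === entry.next_item;
  const recent = entry.last_nagged_at !== undefined && now - entry.last_nagged_at < REPEAT_COOLDOWN_MS;
  return !(repeating && recent);
}

// Record a firing; the caller would then publish ai.nag.fired.
function markFired(entry: NagQueueEntry, now: number): NagQueueEntry {
  return { ...entry, last_nagged_item: entry.next_item, last_nagged_at: now };
}
```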
## Context Assembly Pipeline
`POST /context/compose` is the primary integration endpoint:
```
1. Load identity (PersonaEntity + UserIdentityEntity)

2. Compose personality system prompt
   POST /personality/:id/compose { context }

3. Retrieve relevant memories
   Semantic search over MemoryEntryEntity
   ranked by: recency + relevance to recent_messages[]

4. Get active tasks summary
   GET /tasks?identity_id=&status=pending&limit=5
   → "Active: [1] Photo shoot tonight [2] Call name change office Monday"

5. Assemble response
   {
     system_prompt: "<<composed personality>>",
     memory_injections: ["<<relevant memory snippets>>"],
     task_summary: "<<pending tasks>>"
   }
```
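The five steps map onto one async function. A sketch only: `ComposeDeps` is a hypothetical seam standing in for the identity, personality, memory, and task lookups, the optional `personality_id` override is borrowed from the @chobit request body below, and running steps 2 to 4 concurrently assumes they depend only on the identity from step 1.

```typescript
// Hypothetical dependencies, one per lookup in the numbered steps above.
interface ComposeDeps {
  loadIdentity(identityId: string): Promise<{ id: string; persona_id: string }>;
  composePersonality(personaId: string, context: unknown): Promise<string>;
  searchMemories(identityId: string, recentMessages: string[]): Promise<string[]>;
  activeTasks(identityId: string, limit: number): Promise<string[]>;
}

interface ComposeRequest {
  identity_id: string;
  personality_id?: string; // overrides the identity's default persona when set
  recent_messages: string[];
  context: unknown;
}

interface ComposeResponse {
  system_prompt: string;
  memory_injections: string[];
  task_summary: string;
}

async function composeContext(deps: ComposeDeps, req: ComposeRequest): Promise<ComposeResponse> {
  // Step 1: load identity (a real lookup would reject unknown ids).
  const identity = await deps.loadIdentity(req.identity_id);
  const personaId = req.personality_id ?? identity.persona_id;

  // Steps 2-4: independent of each other, so they run concurrently.
  const [systemPrompt, memories, tasks] = await Promise.all([
    deps.composePersonality(personaId, req.context),
    deps.searchMemories(req.identity_id, req.recent_messages),
    deps.activeTasks(req.identity_id, 5),
  ]);

  // Step 5: assemble, formatting tasks like the "Active: [1] ... [2] ..." example.
  const task_summary =
    tasks.length > 0 ? "Active: " + tasks.map((t, i) => `[${i + 1}] ${t}`).join(" ") : "";
  return { system_prompt: systemPrompt, memory_injections: memories, task_summary };
}
```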
## Integration Contracts
### @chobit → @ai
**Before (current):**
```gdscript
# llm_client.gd sends directly to model-boss
var body = { "model": "qwen3-4b", "messages": _history, ... }
_http.request(llm_url + "/v1/chat/completions", ...)
```
**After:**
```gdscript
# llm_client.gd requests enriched context from @ai first
var context_body = {
	"identity_id": CompanionConfig.identity_id,
	"personality_id": CompanionConfig.personality_id,
	"recent_messages": _history.slice(-20),  # last 20 (increased from 10)
	"context": _composer.build_context_payload()
}
var enriched = await _ai_client.compose(context_body)
# then forwards to model-boss with enriched system_prompt
```
### @chobit → @ai (memory sync)
```gdscript
# conversation_store.gd: async memory sync after each turn
func _after_save_message(role: String, content: String) -> void:
	if role == "user":
		_ai_memory_client.upsert({
			"key": "chobit:last_user_message",
			"content": content,
			"category": "conversation",
			"tags": ["recent", "chobit"]
		})
```
## Infrastructure
```yaml
# docker-compose.yaml
services:
  ai-postgres:
    image: postgres:16
    ports: ["26395:5432"]

  ai-redis:
    image: redis/redis-stack:latest
    ports: ["26394:6379"]  # Redis Stack (RediSearch for optional vector search)

  ai-core:
    build: services/ai-core
    ports: ["3790:3790"]
    depends_on: [ai-postgres, ai-redis]
    environment:
      DATABASE_URL: postgres://...@ai-postgres:5432/ai
      REDIS_URL: redis://ai-redis:6379
```
## Dynamic Personality System
### The Problem with Static Templates
`miku.json` is a good start, but it has one fundamental flaw: it's a **fixed document**. Every conversation starts from the same base. Miku doesn't know you better after 100 conversations than after 1. She can't be warmer because you went through something hard together last week. She doesn't ease up on the teasing because you're clearly exhausted today.
Real personalities are three-dimensional:
| Layer | Timescale | Storage | Example |
|-------|-----------|---------|---------|
| **Core traits** | Permanent | Static JSON | "Miku is enthusiastic and playful" |
| **Relationship arc** | Months | PostgreSQL | "Quinn and Miku are 'close' — 87 conversations" |
| **Shared history** | Weeks | PostgreSQL (memory) | "Quinn got a $1400 client on Mar 29 — big win" |
| **Session mood** | Hours | Redis | "Quinn is tired, been up since 4am" |
| **Situational state** | Minutes | Redis | "Quinn is in task-focused mode right now" |
### Relationship Arc
Companions move through relationship stages. Each stage gates different behaviors:
```
new ──────► familiar ──────► close ──────► intimate
(0–5)       (6–30)           (31–100)      (100+)
                  conversations
```
| Stage | Personality Expression |
|-------|----------------------|
| `new` | Warm but reserved. Helpful, not personal. Doesn't tease. Doesn't assume. |
| `familiar` | References shared events. Light teasing when mood is right. Remembers patterns. |
| `close` | Direct. Calls you out when you're procrastinating. Genuine care, not therapy. |
| `intimate` | Shorthand. In-jokes. Reads between the lines. Minimal preamble. |
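
Stage assignment from the conversation count is a threshold lookup. The ranges in the arc overlap at 100 (`31–100` and `100+`); this sketch assumes 100 still counts as `close`. `depthFor` is a hypothetical name.

```typescript
type RelationshipDepth = "new" | "familiar" | "close" | "intimate";

// Upper bounds from the arc diagram; 100 maps to "close" (assumption: the ranges overlap there).
function depthFor(interactionCount: number): RelationshipDepth {
  if (interactionCount <= 5) return "new";
  if (interactionCount <= 30) return "familiar";
  if (interactionCount <= 100) return "close";
  return "intimate";
}
```

Recomputing `depth` whenever `interaction_count` is incremented keeps the stored stage consistent with the count.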
**RelationshipEntity** (new module — `relationship/`):
```typescript
import { Column, Entity, PrimaryColumn } from 'typeorm';

@Entity('ai_relationships')
export class RelationshipEntity {
  // TypeORM requires a primary key; assumed composite key: one row per (identity, persona) pair
  @PrimaryColumn() identity_id!: string;
  @PrimaryColumn() persona_id!: string;
  @Column() depth!: 'new' | 'familiar' | 'close' | 'intimate';
  @Column() interaction_count!: number;
  @Column('simple-array') significant_event_keys!: string[]; // → memory entry keys
  @Column('jsonb') tone_notes!: string[]; // learned: "prefers directness", "sensitive about X"
  @Column() first_interaction_at!: Date;
  @Column() last_interaction_at!: Date;
}
```
### Dynamic Trait Intensity
Traits aren't binary (on/off) — they have intensity that responds to context:
```typescript
interface TraitModifiers {
  base_intensity: number; // 0.0–1.0
  modifiers: {
    'user_mood.frustrated'?: number;   // e.g. -0.3 (dial down enthusiasm)
    'user_mood.vulnerable'?: number;   // e.g. -0.4 (go gentler)
    'relationship.new'?: number;       // e.g. +0.1 (extra warmth for new)
    'relationship.close'?: number;     // e.g. -0.1 (less performance, more real)
    'time_of_day.late_night'?: number; // e.g. -0.2 (calmer energy)
    'task_focused'?: number;           // e.g. -0.3 (less playful, more practical)
  };
}
```
The personality composer resolves trait intensity at request time and injects the appropriate positive/negative language for that intensity level rather than always using the full trait text.
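
A minimal sketch of that request-time resolution, assuming modifiers combine additively and the result is clamped to [0, 1]; the text does not say how multiple modifiers combine. `resolveIntensity` and the three-band `intensityBand` split are hypothetical.

```typescript
// Same shape as TraitModifiers above, with the modifier keys widened to plain strings.
interface TraitModifiers {
  base_intensity: number; // 0.0–1.0
  modifiers: Partial<Record<string, number>>;
}

// Adds every modifier whose key is active for this request, then clamps to [0, 1].
function resolveIntensity(trait: TraitModifiers, activeSignals: string[]): number {
  let value = trait.base_intensity;
  for (const signal of activeSignals) value += trait.modifiers[signal] ?? 0;
  return Math.min(1, Math.max(0, value));
}

// Hypothetical banding that picks which phrasing of the trait text to inject.
function intensityBand(value: number): "muted" | "moderate" | "full" {
  if (value < 0.34) return "muted";
  if (value < 0.67) return "moderate";
  return "full";
}
```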
### Shared History Injection
The personality module queries memory for **significant shared events** when composing:
```typescript
// relationship context appended to system prompt
const sharedContext = await memory.search({
  identity_id,
  tags: ['significant_event'],
  limit: 3,
  ranked_by: 'recency + relevance',
});
// injected as:
// "Context you share with this user:
//  - They had a big career win on Mar 29 ($1400 client, their first escort work payout)
//  - They've been procrastinating on a photo shoot for 2 weeks
//  - They're mid-transition (name change paperwork pending)"
```
This is the mechanism that makes the companion feel like it *remembers* rather than resetting every conversation. It uses the memory system (M2) but feeds into personality composition (M3).
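
`ranked_by: 'recency + relevance'` leaves the weighting open. One possible reading, sketched below: a weighted sum of semantic relevance and an exponential recency decay. The 0.7/0.3 weights, the 14-day half-life, and all names here are assumptions.

```typescript
// Hypothetical search hit; relevance would come from semantic search (0 = unrelated, 1 = exact).
interface MemoryHit {
  content: string;
  relevance: number;
  updated_at: Date;
}

const W_RELEVANCE = 0.7; // assumed weights
const W_RECENCY = 0.3;
const HALF_LIFE_DAYS = 14; // assumed: recency score halves every two weeks
const MS_PER_DAY = 86400000;

function score(hit: MemoryHit, now: Date): number {
  const ageDays = (now.getTime() - hit.updated_at.getTime()) / MS_PER_DAY;
  const recency = Math.pow(0.5, ageDays / HALF_LIFE_DAYS);
  return W_RELEVANCE * hit.relevance + W_RECENCY * recency;
}

// Builds the "Context you share with this user:" block from the top-ranked hits.
function sharedContextBlock(hits: MemoryHit[], now: Date, limit = 3): string {
  const top = [...hits].sort((a, b) => score(b, now) - score(a, now)).slice(0, limit);
  if (top.length === 0) return "";
  return ["Context you share with this user:", ...top.map((h) => `- ${h.content}`)].join("\n");
}
```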
### Significant Event Tagging
When saving conversation turns to memory, the system tags events that matter:
```typescript
// Heuristics for significance tagging
const SIGNIFICANT_SIGNALS = [
  'money earned / financial win',
  'goal completed / milestone hit',
  'emotional disclosure (vulnerability)',
  'major decision made',
  'plan committed to',
  'recurring pattern (mentioned 3+ times)',
];
```
Significant events get tagged in `MemoryEntryEntity` with `tags: ['significant_event']` and higher retention weight.
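
Deciding *which* signal fired is a classification problem (likely an LLM call) that the document leaves open; applying the result is mechanical. A sketch, assuming detected signals arrive as `SIGNIFICANT_SIGNALS` labels and that "higher retention weight" is stored in `metadata`; `tagSignificance`, `significance_signals`, and the `retention_weight` value are hypothetical.

```typescript
// Hypothetical subset of MemoryEntryEntity fields touched by tagging.
interface TaggableEntry {
  content: string;
  tags: string[];
  metadata: Record<string, unknown>;
}

const RECURRING = "recurring pattern (mentioned 3+ times)";

// detectedSignals: labels reported by an upstream classifier (assumed).
// mentionCount: how often this topic has come up (assumed to be tracked elsewhere).
function tagSignificance(entry: TaggableEntry, detectedSignals: string[], mentionCount: number): TaggableEntry {
  const signals = [...detectedSignals];
  if (mentionCount >= 3 && !signals.includes(RECURRING)) signals.push(RECURRING);
  if (signals.length === 0) return entry;
  return {
    ...entry,
    tags: entry.tags.includes("significant_event") ? entry.tags : [...entry.tags, "significant_event"],
    // Assumed encoding of "higher retention weight"; field name and value are illustrative.
    metadata: { ...entry.metadata, significance_signals: signals, retention_weight: 2 },
  };
}
```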
### Personality State Machine (Future — M9)
For deeper personality dynamics, traits can evolve over the relationship arc:
```
Miku @ 'familiar' depth:
  enthusiastic.intensity        → 0.6  (not always full energy)
  playful.teasing_allowed       → true (earned through familiarity)
  attentive.callback_references → true (can reference prior conversations)

Miku @ 'close' depth:
  anti_therapy.override → enabled (no soft-pedaling, call it out)
  directness            → high    (shorthand language ok)
  task_coaching         → enabled (can push back on procrastination)
```
This is the endgame: a companion that becomes more *itself* — not more generic — as the relationship deepens.
---
## Response Format Layer
### The Dual-Response Pattern
AI responses have two distinct audiences:
- **Text response**: for display. Full, can be long, markdown OK, detailed.
- **TTS response**: for speech. Short, plain spoken sentences, 1–3 sentences max.
This split is not just cosmetic — the *models and parameters* may differ:
```
POST /context/compose
→ returns ResponseFormat config alongside system_prompt
{
  "system_prompt": "...",
  "memory_injections": [...],
  "task_summary": "...",
  "response_format": {
    "mode": "dual",              // "text_only" | "tts_only" | "dual"
    "text": {
      "model": "qwen3-32b",      // richer model for display
      "max_tokens": 500,
      "stream": true
    },
    "tts": {
      "model": "qwen3-4b",       // fast model for voice
      "max_tokens": 60,          // ~3 short sentences
      "stream": false,           // wait for full response before TTS
      "voice_id": "emov-bea-amused",
      "personality_id": "miku"
    }
  }
}
```
**How it works in practice:**
1. Consumer calls `/context/compose` — gets back `response_format` config
2. Consumer sends `text` config → model-boss → streams full text response to display
3. Consumer sends `tts` config → model-boss → gets short spoken response → speech-synthesis
The TTS response is **independently generated**, not a truncation of the text response. The system prompt for TTS has an additional constraint injected: `"Respond in 1–3 short spoken sentences. No lists, no markdown."` The text response has no such constraint.
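
The two calls share a system prompt and history and differ only in channel config and the injected constraint. A sketch, assuming OpenAI-style chat completion bodies (the @chobit client already posts to model-boss's `/v1/chat/completions`); `buildDualRequests` and the type names are hypothetical. `voice_id` and `personality_id` are left out of `ChannelConfig` on purpose, since they are for speech synthesis, not the LLM.

```typescript
// Hypothetical request shapes.
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

interface ChannelConfig {
  model: string;
  max_tokens: number;
  stream: boolean;
}

interface ChatRequest extends ChannelConfig {
  messages: ChatMessage[];
}

const TTS_CONSTRAINT = "Respond in 1–3 short spoken sentences. No lists, no markdown.";

// Two independent requests; the TTS one is never derived from the text reply.
function buildDualRequests(
  systemPrompt: string,
  history: ChatMessage[],
  text: ChannelConfig,
  tts: ChannelConfig,
): { text: ChatRequest; tts: ChatRequest } {
  return {
    text: { ...text, messages: [{ role: "system", content: systemPrompt }, ...history] },
    tts: {
      ...tts,
      messages: [{ role: "system", content: `${systemPrompt}\n\n${TTS_CONSTRAINT}` }, ...history],
    },
  };
}
```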
### Model Selection Logic
Decided per request during `/context/compose`, based on personality and request characteristics:
| Signal | Model Choice |
|--------|-------------|
| Companion conversation (miku) | `qwen3-4b` — fast, conversational |
| Complex reasoning / coding | `qwen3-32b` or `qwen3-coder` |
| TTS responses (always) | `qwen3-4b` — speed over depth |
| Long memory context (>20 injections) | Larger context window model |
| Persona specifies model | Persona's `model_preference` overrides |
Model selection lives in the `response/` module — **not** hardcoded per-consumer. Consumers get the right model from `/context/compose`, they don't choose it themselves.
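
The table leaves precedence open where rows collide, e.g. a persona preference on a TTS request, and it does not say when reasoning gets `qwen3-32b` versus `qwen3-coder`. This sketch assumes TTS wins, then persona preference, then context size, then task type (coding to `qwen3-coder`, other reasoning to `qwen3-32b`); `selectModel`, `ModelRequest`, and the `LONG_CONTEXT_MODEL` placeholder are hypothetical.

```typescript
interface ModelRequest {
  channel: "text" | "tts";
  task: "companion" | "reasoning" | "coding";
  memory_injections: number;
  persona_model_preference?: string;
}

// Placeholder: the table does not name the larger-context-window model.
const LONG_CONTEXT_MODEL = "long-context-model";

// Assumed precedence: TTS rule > persona preference > context size > task type.
function selectModel(r: ModelRequest): string {
  if (r.channel === "tts") return "qwen3-4b"; // speed over depth, always
  if (r.persona_model_preference) return r.persona_model_preference;
  if (r.memory_injections > 20) return LONG_CONTEXT_MODEL;
  if (r.task === "coding") return "qwen3-coder";
  if (r.task === "reasoning") return "qwen3-32b";
  return "qwen3-4b"; // companion conversation
}
```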
### Personality Depth Tier → TTS Length
The `miku.json` depth tier system maps naturally to TTS max_tokens:
| Depth Tier | Display Length | TTS `max_tokens` |
|-----------|---------|----------------|
| 1 (quick) | 1 sentence | 25 |
| 2 (standard) | 1–2 sentences | 40 |
| 3 (engaged) | 2–3 sentences | 60 |
| 4 (detailed) | 3–5 sentences | 80 |
The personality module infers the depth tier per-request and passes it into `response_format.tts.max_tokens` automatically.
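
Wiring the inferred tier into the TTS config is a table lookup. A sketch using the values above; `withTierTokens` and `TtsFormat` are hypothetical names.

```typescript
type DepthTier = 1 | 2 | 3 | 4;

// TTS max_tokens per depth tier, copied from the table above.
const TTS_MAX_TOKENS: Record<DepthTier, number> = { 1: 25, 2: 40, 3: 60, 4: 80 };

// Hypothetical shape of response_format.tts (speech-only fields omitted).
interface TtsFormat {
  model: string;
  max_tokens: number;
  stream: boolean;
}

// Returns a copy of the TTS config with max_tokens set for the inferred tier.
function withTierTokens(tts: TtsFormat, tier: DepthTier): TtsFormat {
  return { ...tts, max_tokens: TTS_MAX_TOKENS[tier] };
}
```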
### When to Use Speech Synthesis
Not every response should be spoken. The `response_format.mode` is decided by:
| Context | Mode |
|---------|------|
| @chobit (TTS always enabled) | `dual` |
| Claude Code nag loop | `tts_only` |
| API consumer, no audio | `text_only` |
| Notification / alert | `tts_only` |
| User asked a complex question | `dual` |
| Background task completion | `tts_only` (short confirmation) |
| Error / blocker surfaced | `tts_only` (urgent personality) |
Consumers declare their `tts_capability` when registering with @ai. The context module uses this to set the default `mode`, which consumers can override per-request.
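
A sketch of default-plus-override mode resolution. `ConsumerRegistration`, the three `tts_capability` values, and the rule that a consumer without audio never receives a spoken mode are assumptions; the text only says the declared capability sets the default.

```typescript
type ResponseMode = "text_only" | "tts_only" | "dual";

// Hypothetical registration record; only tts_capability is named in the text.
interface ConsumerRegistration {
  consumer_id: string;
  tts_capability: "none" | "optional" | "always";
}

// Default comes from the declared capability; a per-request override wins,
// except that a consumer with no audio output is never sent a spoken mode.
function resolveMode(consumer: ConsumerRegistration, override?: ResponseMode): ResponseMode {
  if (consumer.tts_capability === "none") return "text_only";
  if (override !== undefined) return override;
  return consumer.tts_capability === "always" ? "dual" : "text_only";
}
```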
---
## What @ai Is NOT
| Capability | Owned By |
|-----------|---------|
| LLM inference routing | `@model-boss` |
| TTS / STT | `@audio/@speech-synthesis` |
| RAG / vector search | `@ml/rag-retrieval` |
| Model training | `@ml/assistant-trainer` |
| Face tracking | `@chobit/services/vision` |
| Platform knowledge validation | `@ml/knowledge-platform` |
| Avatar rendering | `@chobit` (Godot) |