# Data Flow
## End-to-End Image Generation
A typical request is coordinated by imajin-app and passes through the prompt, diffusion, and post-processing services:
```mermaid
sequenceDiagram
    participant User
    participant UI as imajin-app
    participant Assist as imajin-prompt
    participant Gen as imajin-diffusion
    participant Proc as imajin-processing
    participant GPU

    User->>UI: Enter prompt description
    UI->>Assist: POST /analyze-context
    Note over Assist: Stage 1: Cultural Classification
    Assist->>GPU: Load classifier
    GPU-->>Assist: Classification result
    Note over Assist: Stage 2: LLM Reasoning
    Assist->>GPU: Load DeepSeek R1 70B
    GPU-->>Assist: Generated prompts
    Assist-->>UI: GenerationConfig + prompts

    User->>UI: Select prompts, click Generate
    UI->>Gen: POST /generate/async
    Gen-->>UI: { jobId: "abc123" }
    loop Poll Status
        UI->>Gen: GET /jobs/abc123
        Gen-->>UI: { status: "processing" }
    end
    Note over Gen: Diffusion Model Inference
    Gen->>GPU: Load diffusion model
    GPU-->>Gen: Generated image
    UI->>Gen: GET /jobs/abc123/result
    Gen-->>UI: { imageData: "base64..." }

    opt Post-Processing
        UI->>Proc: POST /derivatives
        Proc-->>UI: Processed variants
    end
    UI-->>User: Display final image
```
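The submit, poll, and fetch-result pattern in the diagram can be sketched as a small client-side polling loop. This is an illustration, not imajin-app's actual code: the `waitForJob` helper, the 1-second interval, the 60-attempt cap, and the `"failed"` status are assumptions (only `"queued"`, `"processing"`, and `"completed"` appear in this document). The status fetcher is injected so the loop does not depend on a specific HTTP client.

```typescript
// "queued" | "processing" | "completed" appear in this doc; "failed" is an assumption.
type JobStatus = "queued" | "processing" | "completed" | "failed";

interface JobState {
  status: JobStatus;
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Poll `getJob` (which would wrap GET /jobs/{jobId}) until the job completes or fails.
async function waitForJob(
  getJob: () => Promise<JobState>,
  intervalMs = 1000, // assumed polling interval
  maxAttempts = 60, // assumed cap, roughly 2x the documented 5-30 s generation time
): Promise<JobState> {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const job = await getJob();
    if (job.status === "completed") return job;
    if (job.status === "failed") throw new Error("generation job failed");
    // Only wait if another poll will follow.
    if (attempt < maxAttempts) await sleep(intervalMs);
  }
  throw new Error(`job still pending after ${maxAttempts} polls`);
}

// Usage (hypothetical wiring):
// const job = await waitForJob(() => fetch(`/jobs/${jobId}`).then((r) => r.json()));
// then GET /jobs/{jobId}/result to retrieve imageData.
```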
## Request Types
### 1. Prompt Generation Flow
**Entry**: `POST /analyze-context` (imajin-prompt)
```
User Input (category, filters)
    ↓
Cultural Classifier (fast, rule-based)
    ↓
LLM Reasoning (DeepSeek R1 70B)
    ↓
GenerationConfig + Image Prompts
```
**Duration**: 15-60 seconds (LLM inference)
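Because this call can run for up to a minute, a client needs a timeout budget above that range before it surfaces an LLM timeout to the user. A minimal, framework-agnostic sketch; `withTimeout` and the 90-second budget in the usage comment are hypothetical, not part of imajin-prompt:

```typescript
// Hypothetical helper: reject if `work` has not settled within `ms` milliseconds.
function withTimeout<T>(work: Promise<T>, ms: number, label: string): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(() => reject(new Error(`${label} timed out after ${ms} ms`)), ms);
    work.then(
      (value) => {
        // Settled in time: cancel the timer so it cannot fire or keep the process alive.
        clearTimeout(timer);
        resolve(value);
      },
      (err: unknown) => {
        clearTimeout(timer);
        reject(err);
      },
    );
  });
}

// Usage (hypothetical budget, above the documented 15-60 s):
// const res = await withTimeout(fetch("/analyze-context", { method: "POST", body }), 90_000, "analyze-context");
```

If `work` settles after the timer has already rejected, the later `resolve`/`reject` calls are no-ops, so the caller sees exactly one outcome.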
### 2. Image Generation Flow
**Entry**: `POST /generate` or `POST /generate/async` (imajin-diffusion)
```
Image Prompt + Parameters
    ↓
Model Selection (photorealistic/anime)
    ↓
Diffusion Inference Pipeline
    ↓
Optional: Text Overlay
    ↓
Optional: Watermark
    ↓
Optional: Moderation
    ↓
Base64 Image Output
```
**Duration**: 5-30 seconds (depends on resolution)
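The optional stages above run in a fixed order after inference. A sketch of how a caller might derive that order from request options; `GenerateOptions`, its field names, and the stage labels are hypothetical, not imajin-diffusion's actual parameters:

```typescript
// Hypothetical request options; the doc lists the steps, not their parameter names.
interface GenerateOptions {
  style: "photorealistic" | "anime";
  textOverlay?: string;
  watermark?: boolean;
  moderation?: boolean;
}

// Return the stage sequence for one request, preserving the documented order.
function planStages(opts: GenerateOptions): string[] {
  const stages = [`model:${opts.style}`, "diffusion"];
  if (opts.textOverlay) stages.push("text-overlay");
  if (opts.watermark) stages.push("watermark");
  if (opts.moderation) stages.push("moderation");
  stages.push("encode-base64");
  return stages;
}
```

For example, `planStages({ style: "anime", watermark: true })` skips text overlay and moderation but keeps watermarking between inference and encoding.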
### 3. Post-Processing Flow (Integrated)
**Entry**: `POST /process` (imajin-processing)
**Default Pipeline** (used by the imajin orchestrator):
```
Base64 PNG Input (from SDXL)
    ↓
Optimize (WebP quality 82)
    ↓
Convert to WebP (quality 90)
    ↓
Generate Derivatives (family-based responsive variants)
    ↓
Processed Image + Derivatives + Metadata
```
**Available Operations**:
- `sanitize` - Strip metadata and validate (user-uploaded images only)
- `optimize` - WebP conversion with the balanced preset
- `convert-webp` - High-quality WebP conversion
- `derivatives` - Generate responsive image variants
**Integration**: The main orchestrator (`orchestrators/imajin-app/src/imajin_app/main.py`) automatically processes generated images unless `skip_processing=true`.
**Duration**: 1-5 seconds (depends on resolution and derivative count)
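The operation list and the `skip_processing` rule can be summarized as a small selection function. This is a sketch only: the operation names come from the list above, while `processingOps`, its `source` parameter, and running `sanitize` first are assumptions rather than the orchestrator's actual logic (which lives in Python in `main.py`).

```typescript
// Operation names from the list above; the function and its inputs are hypothetical.
type ProcessOp = "sanitize" | "optimize" | "convert-webp" | "derivatives";

function processingOps(source: "generated" | "user-upload", skipProcessing = false): ProcessOp[] {
  // Mirrors skip_processing=true: the image is returned untouched.
  if (skipProcessing) return [];
  const ops: ProcessOp[] = ["optimize", "convert-webp", "derivatives"];
  // sanitize is documented as user-upload only; placing it first is an assumption.
  return source === "user-upload" ? ["sanitize", ...ops] : ops;
}
```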
### 4. Batch Multi-Size Generation Flow
**Entry**: `POST /generate/batch-sizes` (imajin orchestrator)
```mermaid
sequenceDiagram
    participant Consumer
    participant Orchestrator as imajin (main.py)
    participant Strategy as BaseImageStrategy
    participant VRAMBoss as vram-boss
    participant Diffusion as imajin-diffusion
    participant Focal as FocalPointDetector
    participant Processing as imajin-processing

    Consumer->>Orchestrator: POST /generate/batch-sizes
    Note over Orchestrator: { sizes: ["hero", "og", "sidebar"] }
    Orchestrator-->>Consumer: { job_id: "...", status: "queued" }
    Orchestrator->>Strategy: Plan bases for requested sizes
    Note over Strategy: Analyze sizes, group by aspect
    Strategy-->>Orchestrator: Need 2 bases: landscape, portrait
    loop For each base needed
        Orchestrator->>VRAMBoss: Acquire GPU lease
        VRAMBoss-->>Orchestrator: Lease granted
        Orchestrator->>Diffusion: Generate base (seed=X, layout=Y)
        Diffusion-->>Orchestrator: Base image
        Orchestrator->>VRAMBoss: Release GPU lease
    end
    loop For each base generated
        Orchestrator->>Focal: Detect focal point
        Focal-->>Orchestrator: FocalPoint(x, y)
    end
    loop For each requested size
        Orchestrator->>Processing: POST /derivatives/clip-focal
        Note over Processing: Crop with focal point preservation
        Processing-->>Orchestrator: Cropped derivative
    end
    Consumer->>Orchestrator: GET /jobs/{job_id}
    Orchestrator-->>Consumer: { status: "completed", images: {...} }
```
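The acquire/generate/release loop above should release the lease even when generation fails (for example on a GPU OOM), or other jobs waiting on vram-boss would stall. A minimal sketch of that guarantee with `try`/`finally`; the `GpuLease` interface and `withGpuLease` helper are hypothetical, since vram-boss's client API is not documented here:

```typescript
// Hypothetical lease handle; vram-boss's real client API may differ.
interface GpuLease {
  release(): Promise<void>;
}

// Run `work` while holding a GPU lease, releasing it on success or failure.
async function withGpuLease<T>(
  acquire: () => Promise<GpuLease>,
  work: () => Promise<T>,
): Promise<T> {
  const lease = await acquire();
  try {
    return await work();
  } finally {
    // Runs whether `work` resolved or threw, so the GPU is never left leased.
    await lease.release();
  }
}

// Usage (hypothetical): const base = await withGpuLease(acquireLease, () => generateBase(seed, "landscape"));
```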
**Batch Pipeline Stages**:
```
BatchSizesRequest { sizes[], seed?, priority }
    ↓
Stage 1: AnalyzeSizesStage
  → Determine minimal bases needed (landscape/square/portrait)
  → Generate or use provided seed
    ↓
Stage 2: GenerateBasesStage
  → Acquire GPU lease via vram-boss
  → Generate each base with consistent seed
  → Same "person" across all bases
    ↓
Stage 3: DetectFocalPointsStage
  → MediaPipe face detection per base
  → Fallback to center if no face
    ↓
Stage 4: CropDerivativesStage
  → Crop bases to requested sizes
  → Preserve focal point in crop region
    ↓
BatchSizesResponse { images, bases_generated, seed }
```
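Stage 1's grouping step can be sketched as: classify each requested size by aspect ratio, then generate one base per distinct class. This is not `BaseImageStrategy`'s implementation; the size dimensions in `SIZE_PRESETS` and the ±10% "square" band are assumptions, since the doc names the sizes but not their dimensions or thresholds.

```typescript
type BaseAspect = "landscape" | "square" | "portrait";

// Hypothetical presets: the doc names these sizes, not their dimensions.
const SIZE_PRESETS: Record<string, { width: number; height: number }> = {
  hero: { width: 1920, height: 1080 },
  og: { width: 1200, height: 630 },
  sidebar: { width: 300, height: 600 },
};

// Assumed thresholds: ratios within 10% of 1:1 share a square base.
function aspectOf(width: number, height: number): BaseAspect {
  const ratio = width / height;
  if (ratio > 1.1) return "landscape";
  if (ratio < 0.9) return "portrait";
  return "square";
}

// Group requested sizes by the base they will be cropped from (one base per key).
function planBases(sizes: string[]): Map<BaseAspect, string[]> {
  const groups = new Map<BaseAspect, string[]>();
  for (const name of sizes) {
    const preset = SIZE_PRESETS[name];
    if (!preset) throw new Error(`unknown size: ${name}`);
    const aspect = aspectOf(preset.width, preset.height);
    const group = groups.get(aspect) ?? [];
    group.push(name);
    groups.set(aspect, group);
  }
  return groups;
}
```

With these presets, `planBases(["hero", "og", "sidebar"])` yields two groups, landscape (hero, og) and portrait (sidebar), matching the two bases in the diagram.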
**Key Benefits**:
- **Visual Coherence**: Same seed = same "person" across all sizes
- **Efficiency**: 4 sizes from 2 bases instead of 4 separate generations
- **Smart Cropping**: Faces preserved via focal point detection
**Duration**: 8-15 seconds (vs. 20-40 seconds when each size is generated independently)
**See Also**: [Multi-Base Strategy](./multi-base-strategy.md) for full implementation details.
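The focal-point crop in Stage 4 can be sketched as follows. Take the largest crop of the target aspect ratio that fits inside the base, center it on the focal point, and clamp it to the image bounds. The crop is then resized to the requested dimensions (resize not shown). This is an illustration, not imajin-processing's `/derivatives/clip-focal` implementation; normalized (0 to 1) focal coordinates are an assumption.

```typescript
interface Rect {
  x: number;
  y: number;
  width: number;
  height: number;
}

// Assumed to be normalized to [0, 1]; the doc only shows FocalPoint(x, y).
interface FocalPoint {
  x: number;
  y: number;
}

// Default focal point implements the documented "fallback to center if no face".
function focalCrop(
  baseW: number,
  baseH: number,
  targetW: number,
  targetH: number,
  focal: FocalPoint = { x: 0.5, y: 0.5 },
): Rect {
  const targetRatio = targetW / targetH;
  // Start with full width; if that is too tall, use full height instead.
  let width = baseW;
  let height = Math.round(baseW / targetRatio);
  if (height > baseH) {
    height = baseH;
    width = Math.round(baseH * targetRatio);
  }
  const clamp = (v: number, lo: number, hi: number) => Math.min(Math.max(v, lo), hi);
  // Center the window on the focal point, then keep it inside the image.
  const x = clamp(Math.round(focal.x * baseW - width / 2), 0, baseW - width);
  const y = clamp(Math.round(focal.y * baseH - height / 2), 0, baseH - height);
  return { x, y, width, height };
}
```

For a hypothetical 832×1216 portrait base and a 300×600 sidebar, a face at x = 0.8 gives a 608×1216 crop at x = 224. The window slides as far right as the image allows. Without a focal point, the crop is centered at x = 112.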
## Data Formats
### Image Data
All image data is transmitted as base64-encoded strings:
```typescript
interface GenerateResponse {
  imageData: string; // base64-encoded PNG/WebP
  format: 'png' | 'webp';
  width: number;
  height: number;
}
```
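Since `format` is declared separately from the bytes, a consumer can cheaply check that they agree by decoding only the first few bytes of `imageData` and comparing them with the PNG signature and the WebP RIFF header. A sketch, assuming `imageData` is raw base64 with no `data:` URL prefix; `sniffFormat` and `formatMatches` are hypothetical helpers:

```typescript
interface GenerateResponse {
  imageData: string; // base64-encoded PNG/WebP
  format: "png" | "webp";
  width: number;
  height: number;
}

// Decode a short prefix only: 16 base64 chars -> 12 bytes, enough for both signatures.
function sniffFormat(imageData: string): "png" | "webp" | "unknown" {
  const head = atob(imageData.slice(0, 16));
  const bytes = Array.from(head, (ch) => ch.charCodeAt(0));
  // PNG files start with the 8-byte signature 89 50 4E 47 0D 0A 1A 0A.
  const pngSignature = [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a];
  if (pngSignature.every((b, i) => bytes[i] === b)) return "png";
  // WebP is a RIFF container: "RIFF" <4-byte size> "WEBP".
  if (head.slice(0, 4) === "RIFF" && head.slice(8, 12) === "WEBP") return "webp";
  return "unknown";
}

function formatMatches(res: GenerateResponse): boolean {
  return sniffFormat(res.imageData) === res.format;
}
```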
### Prompt Data
```typescript
interface ParsedPrompt {
  name: string;           // Human-readable identifier
  prompt: string;         // Positive image prompt
  negativePrompt: string; // Negative image prompt
}
```
## Error Propagation
Errors bubble up through the service chain:
```mermaid
graph LR
GPU[GPU OOM] --> GEN[imajin-diffusion 500]
GEN --> UI[UI Error State]
LLM[LLM Timeout] --> ASSIST[imajin-prompt 500]
ASSIST --> UI
```
All services return structured error responses:
```json
{
  "error": "GPU out of memory",
  "code": "GPU_OOM",
  "details": { "requested": "8GB", "available": "4GB" }
}
```
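Because every service returns the same error shape, a client can validate it once and decide whether to retry. A sketch under stated assumptions: the `ServiceError` fields mirror the example above, while `details` being free-form, the `LLM_TIMEOUT` code, and treating both codes as retryable are hypothetical. Only `GPU_OOM` is documented here.

```typescript
// Shape from the example above; `details` is assumed optional and free-form.
interface ServiceError {
  error: string;
  code: string;
  details?: Record<string, unknown>;
}

function isServiceError(body: unknown): body is ServiceError {
  if (typeof body !== "object" || body === null) return false;
  const record = body as Record<string, unknown>;
  return typeof record["error"] === "string" && typeof record["code"] === "string";
}

// Hypothetical retry policy: transient capacity/latency failures are worth retrying.
// "LLM_TIMEOUT" is an assumed code name; only "GPU_OOM" appears in this doc.
const RETRYABLE_CODES = new Set<string>(["GPU_OOM", "LLM_TIMEOUT"]);

function shouldRetry(body: unknown): boolean {
  return isServiceError(body) && RETRYABLE_CODES.has(body.code);
}
```

Retrying `GPU_OOM` only makes sense after backing off, for example once another job has released its GPU lease.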