# Data Flow

## End-to-End Image Generation

The typical request flows through multiple services:

```mermaid
sequenceDiagram
    participant User
    participant UI as imajin-app
    participant Assist as imajin-prompt
    participant Gen as imajin-diffusion
    participant Proc as imajin-processing
    participant GPU

    User->>UI: Enter prompt description
    UI->>Assist: POST /analyze-context

    Note over Assist: Stage 1: Cultural Classification
    Assist->>GPU: Load classifier
    GPU-->>Assist: Classification result

    Note over Assist: Stage 2: LLM Reasoning
    Assist->>GPU: Load DeepSeek R1 70B
    GPU-->>Assist: Generated prompts
    Assist-->>UI: GenerationConfig + prompts

    User->>UI: Select prompts, click Generate
    UI->>Gen: POST /generate/async
    Gen-->>UI: { jobId: "abc123" }

    loop Poll Status
        UI->>Gen: GET /jobs/abc123
        Gen-->>UI: { status: "processing" }
    end

    Note over Gen: Diffusion Model Inference
    Gen->>GPU: Load diffusion model
    GPU-->>Gen: Generated image

    UI->>Gen: GET /jobs/abc123/result
    Gen-->>UI: { imageData: "base64..." }

    opt Post-Processing
        UI->>Proc: POST /derivatives
        Proc-->>UI: Processed variants
    end

    UI-->>User: Display final image
```
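The asynchronous part of this sequence works in three steps: submit the job, poll `/jobs/{jobId}`, then fetch `/jobs/{jobId}/result`. On the client side this can be sketched as a small polling loop. This is a minimal sketch, not the imajin-app implementation. Three things are assumptions, made so the loop can run without a live imajin-diffusion instance: the `"failed"` terminal state, the `JobStatus` shape, and the injected `fetchStatus` function.

```typescript
// "queued", "processing" and "completed" appear in this document;
// "failed" is an assumed terminal error state.
type JobState = "queued" | "processing" | "completed" | "failed";

interface JobStatus {
  status: JobState;
}

// Hypothetical transport: a real client would wrap fetch(`/jobs/${jobId}`)
// against imajin-diffusion and parse the JSON body.
type FetchStatus = (jobId: string) => Promise<JobStatus>;

const sleep = (ms: number) =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

/**
 * Polls a job until it completes, fails, or exhausts its attempt budget.
 * Resolves with the final status; rejects on "failed" or on timeout.
 */
async function pollJob(
  jobId: string,
  fetchStatus: FetchStatus,
  intervalMs = 1000,
  maxAttempts = 60,
): Promise<JobStatus> {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const current = await fetchStatus(jobId);
    if (current.status === "completed") return current;
    if (current.status === "failed") throw new Error(`Job ${jobId} failed`);
    // Still queued or processing: wait before the next poll (but not after the last one).
    if (attempt < maxAttempts) await sleep(intervalMs);
  }
  throw new Error(`Job ${jobId} did not finish after ${maxAttempts} polls`);
}
```

With the defaults (`intervalMs = 1000`, `maxAttempts = 60`), the loop gives up after roughly a minute. That is above the 5-30 second range quoted for image generation. Once `pollJob` resolves, the client would issue `GET /jobs/{jobId}/result` to retrieve `imageData`.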
## Request Types

### 1. Prompt Generation Flow

**Entry**: `POST /analyze-context` (imajin-prompt)

```
User Input (category, filters)
↓
Cultural Classifier (fast, rule-based)
↓
LLM Reasoning (DeepSeek R1 70B)
↓
GenerationConfig + Image Prompts
```
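The LLM reasoning stage dominates this flow's latency. A caller of `/analyze-context` therefore needs a client-side timeout sized to LLM inference rather than a short HTTP default. The following is a minimal sketch under that assumption. `withTimeout`, `TimeoutError`, and the 90-second budget are hypothetical choices, not part of imajin-prompt.

```typescript
class TimeoutError extends Error {
  constructor(ms: number) {
    super(`Timed out after ${ms} ms`);
    this.name = "TimeoutError";
  }
}

/**
 * Rejects with TimeoutError if `work` has not settled within `ms`.
 * The timer is cleared as soon as `work` settles, so it cannot keep the process alive.
 * This stops waiting; it does not cancel the underlying request
 * (a real fetch call would also pass an AbortSignal).
 */
function withTimeout<T>(work: Promise<T>, ms: number): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(() => reject(new TimeoutError(ms)), ms);
    work.then(
      (value) => { clearTimeout(timer); resolve(value); },
      (err) => { clearTimeout(timer); reject(err); },
    );
  });
}

// Hypothetical client budget: the 60 s upper end of LLM inference plus headroom.
const ANALYZE_CONTEXT_TIMEOUT_MS = 90_000;
```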
**Duration**: 15-60 seconds (LLM inference)

### 2. Image Generation Flow

**Entry**: `POST /generate` or `POST /generate/async` (imajin-diffusion)

```
Image Prompt + Parameters
↓
Model Selection (photorealistic/anime)
↓
Diffusion Inference Pipeline
↓
Optional: Text Overlay
↓
Optional: Watermark
↓
Optional: Moderation
↓
Base64 Image Output
```
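This pipeline runs fixed stages first and then optional ones, which maps naturally onto an ordered list of stages, each gated by a request flag. The sketch below rests on several assumptions. The flag names (`textOverlay`, `watermark`, `moderation`), the `ImageBuffer` stand-in, and the stage bodies are all hypothetical. Only the stage order and the photorealistic/anime model choice come from the flow above.

```typescript
type ModelFamily = "photorealistic" | "anime";

// Stand-in for decoded image bytes plus a log of which stages touched them.
interface ImageBuffer {
  model: ModelFamily;
  appliedStages: string[];
}

// Hypothetical request flags for the optional stages.
interface GenerateOptions {
  model: ModelFamily;
  textOverlay?: boolean;
  watermark?: boolean;
  moderation?: boolean;
}

type Stage = (img: ImageBuffer) => ImageBuffer;

// Each stage body only records its name; real stages would transform pixels.
const mark = (name: string): Stage => (img) => ({
  ...img,
  appliedStages: [...img.appliedStages, name],
});

function runGeneration(opts: GenerateOptions): ImageBuffer {
  // Model selection comes first, then the mandatory inference step.
  let img: ImageBuffer = { model: opts.model, appliedStages: [] };
  img = mark("diffusion")(img);

  // Optional stages run in the documented order: overlay, watermark, moderation.
  const optional: Array<[boolean | undefined, Stage]> = [
    [opts.textOverlay, mark("text-overlay")],
    [opts.watermark, mark("watermark")],
    [opts.moderation, mark("moderation")],
  ];
  for (const [enabled, stage] of optional) {
    if (enabled) img = stage(img);
  }
  return img; // a real service would base64-encode the image bytes here
}
```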
**Duration**: 5-30 seconds (depends on resolution)

### 3. Post-Processing Flow (Integrated)

**Entry**: `POST /process` (imajin-processing)

**Default Pipeline** (used by the imajin orchestrator):

```
Base64 PNG Input (from SDXL)
↓
Optimize (WebP quality 82)
↓
Convert to WebP (quality 90)
↓
Generate Derivatives (family-based responsive variants)
↓
Processed Image + Derivatives + Metadata
```
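The default pipeline is an ordered list drawn from the documented operations (`sanitize`, `optimize`, `convert-webp`, `derivatives`). It omits `sanitize`, which is meant for user-uploaded images only. One way to express this as data is sketched below. The `ProcessStep` shape and its `quality` field are assumptions about the `/process` request body, not its documented schema. Prepending `sanitize` for uploads is likewise an assumption.

```typescript
// The four operation names documented for imajin-processing.
type Operation = "sanitize" | "optimize" | "convert-webp" | "derivatives";

// Hypothetical step shape; the real /process body may differ.
interface ProcessStep {
  op: Operation;
  quality?: number; // WebP quality, 0-100
}

// Default pipeline for generated images (quality values from the flow above).
const DEFAULT_PIPELINE: readonly ProcessStep[] = [
  { op: "optimize", quality: 82 },
  { op: "convert-webp", quality: 90 },
  { op: "derivatives" },
];

/**
 * Builds the step list for an image. `sanitize` is documented as being for
 * user-uploaded images only, so this sketch prepends it just for uploads.
 */
function buildPipeline(source: "generated" | "upload"): ProcessStep[] {
  const steps = DEFAULT_PIPELINE.map((s) => ({ ...s })); // copy so callers cannot mutate the default
  return source === "upload" ? [{ op: "sanitize" }, ...steps] : steps;
}
```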
**Available Operations**:

- `sanitize` - Strip metadata, validate (for user-uploaded images only)
- `optimize` - WebP conversion with balanced preset
- `convert-webp` - High-quality WebP conversion
- `derivatives` - Generate responsive image variants

**Integration**: The main orchestrator (`orchestrators/imajin-app/src/imajin_app/main.py`) automatically processes generated images unless `skip_processing=true`.

**Duration**: 1-5 seconds (depends on resolution and derivative count)

### 4. Batch Multi-Size Generation Flow

**Entry**: `POST /generate/batch-sizes` (imajin orchestrator)

```mermaid
sequenceDiagram
    participant Consumer
    participant Orchestrator as imajin (main.py)
    participant Strategy as BaseImageStrategy
    participant VRAMBoss as vram-boss
    participant Diffusion as imajin-diffusion
    participant Focal as FocalPointDetector
    participant Processing as imajin-processing

    Consumer->>Orchestrator: POST /generate/batch-sizes
    Note over Orchestrator: { sizes: ["hero", "og", "sidebar"] }
    Orchestrator-->>Consumer: { job_id: "...", status: "queued" }

    Note over Strategy: Analyze sizes, group by aspect
    Strategy-->>Orchestrator: Need 2 bases: landscape, portrait

    loop For each base needed
        Orchestrator->>VRAMBoss: Acquire GPU lease
        VRAMBoss-->>Orchestrator: Lease granted
        Orchestrator->>Diffusion: Generate base (seed=X, layout=Y)
        Diffusion-->>Orchestrator: Base image
        VRAMBoss-->>Orchestrator: Lease released
    end

    loop For each base generated
        Orchestrator->>Focal: Detect focal point
        Focal-->>Orchestrator: FocalPoint(x, y)
    end

    loop For each requested size
        Orchestrator->>Processing: POST /derivatives/clip-focal
        Note over Processing: Crop with focal point preservation
        Processing-->>Orchestrator: Cropped derivative
    end

    Consumer->>Orchestrator: GET /jobs/{job_id}
    Orchestrator-->>Consumer: { status: "completed", images: {...} }
```
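In the base-generation loop, each iteration holds a vram-boss lease for exactly one diffusion call. The sketch below shows that acquire/use/release pattern. It uses `try`/`finally` so the lease is returned even when generation fails, for example on GPU OOM. The `LeaseClient` interface, its method names, and the numeric `priority` are hypothetical and do not describe vram-boss's actual API.

```typescript
// Hypothetical lease API; vram-boss's real interface is not documented here.
interface Lease {
  id: string;
}
interface LeaseClient {
  acquire(priority: number): Promise<Lease>;
  release(lease: Lease): Promise<void>;
}

/** Runs `work` while holding a GPU lease and always releases it afterwards. */
async function withGpuLease<T>(
  client: LeaseClient,
  priority: number,
  work: (lease: Lease) => Promise<T>,
): Promise<T> {
  const lease = await client.acquire(priority);
  try {
    return await work(lease);
  } finally {
    await client.release(lease); // runs on success and on error
  }
}

/** Generates bases sequentially, one lease per base, all with the same seed. */
async function generateBases<B>(
  client: LeaseClient,
  layouts: string[],
  seed: number,
  generate: (layout: string, seed: number) => Promise<B>,
  priority = 0,
): Promise<B[]> {
  const results: B[] = [];
  for (const layout of layouts) {
    results.push(await withGpuLease(client, priority, () => generate(layout, seed)));
  }
  return results;
}
```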
**Batch Pipeline Stages**:

```
BatchSizesRequest { sizes[], seed?, priority }
↓
Stage 1: AnalyzeSizesStage
  → Determine minimal bases needed (landscape/square/portrait)
  → Generate or use provided seed
↓
Stage 2: GenerateBasesStage
  → Acquire GPU lease via vram-boss
  → Generate each base with consistent seed
  → Same "person" across all bases
↓
Stage 3: DetectFocalPointsStage
  → MediaPipe face detection per base
  → Fallback to center if no face
↓
Stage 4: CropDerivativesStage
  → Crop bases to requested sizes
  → Preserve focal point in crop region
↓
BatchSizesResponse { images, bases_generated, seed }
```
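Stages 1 and 4 are the two pieces of pure geometry in this pipeline. Stage 1 buckets the requested sizes by aspect ratio, and Stage 4 chooses a crop window that keeps the focal point in frame. A minimal sketch of both follows. It assumes sizes are given in pixels and focal points are normalized to 0-1, and it defaults to the center when no face was found. The bucket thresholds (ratio above 1.1 is landscape, below 0.9 is portrait) and the function names are illustrative choices, not the values used by `BaseImageStrategy`.

```typescript
type Bucket = "landscape" | "square" | "portrait";

interface Size {
  name: string;
  width: number;
  height: number;
}

// Illustrative thresholds; the real strategy may bucket differently.
function bucketFor(size: Size): Bucket {
  const ratio = size.width / size.height;
  if (ratio > 1.1) return "landscape";
  if (ratio < 0.9) return "portrait";
  return "square";
}

/** Stage 1: group requested sizes so each bucket needs only one base image. */
function groupByBucket(sizes: Size[]): Map<Bucket, Size[]> {
  const groups = new Map<Bucket, Size[]>();
  for (const size of sizes) {
    const bucket = bucketFor(size);
    groups.set(bucket, [...(groups.get(bucket) ?? []), size]);
  }
  return groups;
}

interface Crop {
  x: number;
  y: number;
  width: number;
  height: number;
}

/**
 * Stage 4: the largest crop with the target's aspect ratio that fits inside the
 * base, centered on the focal point (fx, fy in 0..1) and clamped to the edges.
 * The crop would then be resized to target.width x target.height.
 */
function focalCrop(baseW: number, baseH: number, target: Size, fx = 0.5, fy = 0.5): Crop {
  const targetRatio = target.width / target.height;
  // Fit by width first; if that is too tall, fit by height instead.
  let width = baseW;
  let height = baseW / targetRatio;
  if (height > baseH) {
    height = baseH;
    width = baseH * targetRatio;
  }
  const clamp = (v: number, max: number) => Math.min(Math.max(v, 0), max);
  const x = clamp(fx * baseW - width / 2, baseW - width);
  const y = clamp(fy * baseH - height / 2, baseH - height);
  return { x: Math.round(x), y: Math.round(y), width: Math.round(width), height: Math.round(height) };
}
```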
**Key Benefits**:

- **Visual Coherence**: Same seed = same "person" across all sizes
- **Efficiency**: For example, 4 sizes from 2 bases instead of 4 separate generations
- **Smart Cropping**: Faces preserved via focal point detection

**Duration**: 8-15 seconds (vs. 20-40 seconds when generating each size independently)

**See Also**: [Multi-Base Strategy](./multi-base-strategy.md) for full implementation details.

## Data Formats

### Image Data

All image data is transmitted as base64-encoded strings:

```typescript
interface GenerateResponse {
  imageData: string; // base64-encoded PNG/WebP
  format: 'png' | 'webp';
  width: number;
  height: number;
}
```
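Because `format` is declared separately from the payload, a consumer can cheaply check that the decoded bytes match it before passing them to an image decoder. The minimal sketch below uses the standard 8-byte PNG signature and the RIFF/`WEBP` container header. `decodeImageData` and `sniffFormat` are illustrative names. `atob` is assumed to be available; it is a global in browsers and in Node.js 16+.

```typescript
// Repeated from above so this block is self-contained.
interface GenerateResponse {
  imageData: string; // base64-encoded PNG/WebP
  format: 'png' | 'webp';
  width: number;
  height: number;
}

const PNG_SIGNATURE = [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a];

/** Identifies PNG or WebP from the leading bytes, or returns null. */
function sniffFormat(bytes: Uint8Array): 'png' | 'webp' | null {
  const matchesAscii = (offset: number, text: string) =>
    text.split('').every((ch, i) => bytes[offset + i] === ch.charCodeAt(0));
  if (PNG_SIGNATURE.every((b, i) => bytes[i] === b)) return 'png';
  // WebP files are RIFF containers: "RIFF", a 4-byte size, then "WEBP".
  if (matchesAscii(0, 'RIFF') && matchesAscii(8, 'WEBP')) return 'webp';
  return null;
}

/** Decodes imageData and throws if the bytes do not match the declared format. */
function decodeImageData(res: GenerateResponse): Uint8Array {
  const binary = atob(res.imageData); // one char per byte, codes 0-255
  const bytes = new Uint8Array(binary.length);
  for (let i = 0; i < binary.length; i++) bytes[i] = binary.charCodeAt(i);
  const actual = sniffFormat(bytes);
  if (actual !== res.format) {
    throw new Error(`Declared ${res.format}, but payload looks like ${actual ?? 'unknown'}`);
  }
  return bytes;
}
```

The check costs a dozen byte comparisons. In return, a mislabeled payload becomes an explicit error rather than a decoder failure further downstream.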
### Prompt Data

```typescript
interface ParsedPrompt {
  name: string;           // Human-readable identifier
  prompt: string;         // Positive image prompt
  negativePrompt: string; // Negative image prompt
}
```
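`ParsedPrompt` values ultimately come from LLM output relayed as JSON. A runtime type guard is therefore a cheap way to reject malformed entries before they reach a generate call. The following is a minimal sketch. `isParsedPrompt` and `parsePrompts` are illustrative helpers, and treating a blank `prompt` as invalid is an assumption.

```typescript
interface ParsedPrompt {
  name: string;           // Human-readable identifier
  prompt: string;         // Positive image prompt
  negativePrompt: string; // Negative image prompt
}

/** Narrows an unknown JSON value to ParsedPrompt, requiring a non-blank prompt. */
function isParsedPrompt(value: unknown): value is ParsedPrompt {
  if (typeof value !== "object" || value === null) return false;
  const { name, prompt, negativePrompt } = value as Record<string, unknown>;
  return (
    typeof name === "string" &&
    typeof prompt === "string" &&
    prompt.trim().length > 0 &&
    typeof negativePrompt === "string"
  );
}

/** Keeps only well-formed prompts from an untyped value (e.g. a parsed response body). */
function parsePrompts(raw: unknown): ParsedPrompt[] {
  return Array.isArray(raw) ? raw.filter(isParsedPrompt) : [];
}
```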
## Error Propagation

Errors bubble up through the service chain:

```mermaid
graph LR
    GPU[GPU OOM] --> GEN[imajin-diffusion 500]
    GEN --> UI[UI Error State]

    LLM[LLM Timeout] --> ASSIST[imajin-prompt 500]
    ASSIST --> UI
```

All services return structured error responses:

```json
{
  "error": "GPU out of memory",
  "code": "GPU_OOM",
  "details": { "requested": "8GB", "available": "4GB" }
}
```
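A caller can recognize this shape at runtime and pass it upstream unchanged. That is what lets a `GPU_OOM` from imajin-diffusion reach the UI error state intact. The following is a minimal sketch. `ServiceErrorBody`, `ServiceError`, and `toServiceError` are illustrative names. `details` is typed loosely because its contents vary by error. The `UPSTREAM_ERROR` fallback code for responses that do not follow the structured format is an assumption.

```typescript
interface ServiceErrorBody {
  error: string;                     // Human-readable message
  code: string;                      // Machine-readable code, e.g. "GPU_OOM"
  details?: Record<string, unknown>; // Error-specific context
}

function isServiceErrorBody(value: unknown): value is ServiceErrorBody {
  if (typeof value !== "object" || value === null) return false;
  const { error, code, details } = value as Record<string, unknown>;
  return (
    typeof error === "string" &&
    typeof code === "string" &&
    (details === undefined || (typeof details === "object" && details !== null))
  );
}

/** Error carrying the upstream HTTP status and the structured body. */
class ServiceError extends Error {
  readonly status: number;
  readonly body: ServiceErrorBody;
  constructor(status: number, body: ServiceErrorBody) {
    super(body.error);
    this.name = "ServiceError";
    this.status = status;
    this.body = body;
  }
}

/**
 * Converts a failed upstream response into a ServiceError, preserving the
 * original code when the body is structured and falling back otherwise.
 */
function toServiceError(status: number, body: unknown): ServiceError {
  if (isServiceErrorBody(body)) return new ServiceError(status, body);
  return new ServiceError(status, {
    error: typeof body === "string" ? body : `Upstream returned HTTP ${status}`,
    code: "UPSTREAM_ERROR", // assumed fallback code
  });
}
```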