Data Flow
End-to-End Image Generation
A typical request flows through multiple services:
sequenceDiagram
participant User
participant UI as imajin-app
participant Assist as imajin-prompt
participant Gen as imajin-diffusion
participant Proc as imajin-processing
participant GPU
User->>UI: Enter prompt description
UI->>Assist: POST /analyze-context
Note over Assist: Stage 1: Cultural Classification
Assist->>GPU: Load classifier
GPU-->>Assist: Classification result
Note over Assist: Stage 2: LLM Reasoning
Assist->>GPU: Load DeepSeek R1 70B
GPU-->>Assist: Generated prompts
Assist-->>UI: GenerationConfig + prompts
User->>UI: Select prompts, click Generate
UI->>Gen: POST /generate/async
Gen-->>UI: { jobId: "abc123" }
loop Poll Status
UI->>Gen: GET /jobs/abc123
Gen-->>UI: { status: "processing" }
end
Note over Gen: Diffusion Model Inference
Gen->>GPU: Load diffusion model
GPU-->>Gen: Generated image
UI->>Gen: GET /jobs/abc123/result
Gen-->>UI: { imageData: "base64..." }
opt Post-Processing
UI->>Proc: POST /derivatives
Proc-->>UI: Processed variants
end
UI-->>User: Display final image
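The submit-then-poll portion of the diagram could be driven by a client along the following lines. This is a minimal sketch, not the imajin-app implementation. Only the three endpoint paths and the `jobId`, `status`, and `imageData` fields come from the diagram. The `fetchJson` helper type, the request body shape, the `"completed"` and `"failed"` status values, and the polling interval and limit are all assumptions made for illustration.

```typescript
// Hypothetical JSON transport, injected so the sketch does not depend on a real network.
type FetchJson = (url: string, init?: { method: string; body?: string }) => Promise<any>;

type PollAction = "wait" | "fetch-result" | "fail";

// Decide what to do for a given job status.
// "completed" and "failed" are assumed terminal states; the diagram only shows "processing".
function nextAction(status: string): PollAction {
  if (status === "completed") return "fetch-result";
  if (status === "failed") return "fail";
  return "wait"; // e.g. "queued" or "processing"
}

async function generateImage(
  fetchJson: FetchJson,
  baseUrl: string,
  prompt: string,
  pollMs = 1000, // assumed interval
  maxPolls = 60, // assumed upper bound so the loop cannot run forever
): Promise<string> {
  // Submit the job; the body shape here is a guess.
  const submitted = await fetchJson(`${baseUrl}/generate/async`, {
    method: "POST",
    body: JSON.stringify({ prompt }),
  });
  const jobId: string = submitted.jobId;

  for (let attempt = 0; attempt < maxPolls; attempt++) {
    const job = await fetchJson(`${baseUrl}/jobs/${jobId}`);
    const action = nextAction(job.status);
    if (action === "fail") throw new Error(`Job ${jobId} failed`);
    if (action === "fetch-result") {
      const result = await fetchJson(`${baseUrl}/jobs/${jobId}/result`);
      return result.imageData; // base64 string
    }
    await new Promise<void>((resolve) => setTimeout(resolve, pollMs));
  }
  throw new Error(`Job ${jobId} did not finish after ${maxPolls} polls`);
}
```

Bounding the number of polls keeps a stuck job from hanging the UI indefinitely. A real client would probably also want backoff and cancellation.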
Request Types
1. Prompt Generation Flow
Entry: POST /analyze-context (imajin-prompt)
User Input (category, filters)
↓
Cultural Classifier (fast, rule-based)
↓
LLM Reasoning (DeepSeek R1 70B)
↓
GenerationConfig + Image Prompts
Duration: 15-60 seconds (LLM inference)
2. Image Generation Flow
Entry: POST /generate or POST /generate/async (imajin-diffusion)
Image Prompt + Parameters
↓
Model Selection (photorealistic/anime)
↓
Diffusion Inference Pipeline
↓
Optional: Text Overlay
↓
Optional: Watermark
↓
Optional: Moderation
↓
Base64 Image Output
Duration: 5-30 seconds (depends on resolution)
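The ordering of the fixed and optional stages above can be sketched as a small planning function. The option names, stage labels, and the `planStages` function are hypothetical and are not taken from imajin-diffusion. The sketch only illustrates that the optional stages, when enabled, run in the listed order between inference and output.

```typescript
// Hypothetical flags for the three optional stages.
interface GenerateOptions {
  textOverlay?: boolean;
  watermark?: boolean;
  moderation?: boolean;
}

// Return the stages that would run, in pipeline order.
function planStages(options: GenerateOptions): string[] {
  const stages = ["model-selection", "diffusion-inference"];
  if (options.textOverlay) stages.push("text-overlay");
  if (options.watermark) stages.push("watermark");
  if (options.moderation) stages.push("moderation");
  stages.push("base64-output");
  return stages;
}
```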
3. Post-Processing Flow (Integrated)
Entry: POST /process (imajin-processing)
Default Pipeline (used by imajin orchestrator):
Base64 PNG Input (from SDXL)
↓
Optimize (WebP quality 82)
↓
Convert to WebP (quality 90)
↓
Generate Derivatives (family-based responsive variants)
↓
Processed Image + Derivatives + Metadata
Available Operations:
- sanitize - Strip metadata, validate (for user-uploaded images only)
- optimize - WebP conversion with balanced preset
- convert-webp - High-quality WebP conversion
- derivatives - Generate responsive image variants
Integration: The main orchestrator (orchestrators/imajin-app/src/imajin_app/main.py) automatically processes generated images unless skip_processing=true.
Duration: 1-5 seconds (depends on resolution and derivative count)
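As a rough illustration of how the operations combine, the sketch below picks an operation list for an image. The operation names and the idea of skipping processing come from this section. The `planOperations` function and its two boolean parameters are hypothetical, and running `sanitize` first for uploads is an assumption.

```typescript
type Operation = "sanitize" | "optimize" | "convert-webp" | "derivatives";

// Choose which operations to request for one image.
function planOperations(userUploaded: boolean, skipProcessing = false): Operation[] {
  if (skipProcessing) return []; // mirrors skip_processing=true
  // sanitize applies to user-uploaded images only; generated images skip it.
  const operations: Operation[] = userUploaded ? ["sanitize"] : [];
  operations.push("optimize", "convert-webp", "derivatives");
  return operations;
}
```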
4. Batch Multi-Size Generation Flow
Entry: POST /generate/batch-sizes (imajin orchestrator)
sequenceDiagram
participant Consumer
participant Orchestrator as imajin (main.py)
participant Strategy as BaseImageStrategy
participant VRAMBoss as vram-boss
participant Diffusion as imajin-diffusion
participant Focal as FocalPointDetector
participant Processing as imajin-processing
Consumer->>Orchestrator: POST /generate/batch-sizes
Note over Orchestrator: { sizes: ["hero", "og", "sidebar"] }
Orchestrator-->>Consumer: { job_id: "...", status: "queued" }
Note over Strategy: Analyze sizes, group by aspect
Strategy-->>Orchestrator: Need 2 bases: landscape, portrait
loop For each base needed
Orchestrator->>VRAMBoss: Acquire GPU lease
VRAMBoss-->>Orchestrator: Lease granted
Orchestrator->>Diffusion: Generate base (seed=X, layout=Y)
Diffusion-->>Orchestrator: Base image
VRAMBoss-->>Orchestrator: Lease released
end
loop For each base generated
Orchestrator->>Focal: Detect focal point
Focal-->>Orchestrator: FocalPoint(x, y)
end
loop For each requested size
Orchestrator->>Processing: POST /derivatives/clip-focal
Note over Processing: Crop with focal point preservation
Processing-->>Orchestrator: Cropped derivative
end
Consumer->>Orchestrator: GET /jobs/{job_id}
Orchestrator-->>Consumer: { status: "completed", images: {...} }
Batch Pipeline Stages:
BatchSizesRequest { sizes[], seed?, priority }
↓
Stage 1: AnalyzeSizesStage
→ Determine minimal bases needed (landscape/square/portrait)
→ Generate or use provided seed
↓
Stage 2: GenerateBasesStage
→ Acquire GPU lease via vram-boss
→ Generate each base with consistent seed
→ Same "person" across all bases
↓
Stage 3: DetectFocalPointsStage
→ MediaPipe face detection per base
→ Fallback to center if no face
↓
Stage 4: CropDerivativesStage
→ Crop bases to requested sizes
→ Preserve focal point in crop region
↓
BatchSizesResponse { images, bases_generated, seed }
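Stage 1 can be pictured as grouping the requested sizes by aspect ratio. The following is a sketch under stated assumptions rather than the `BaseImageStrategy` code: the 1.2 ratio threshold, the `SizeSpec` shape, and the function names are invented for illustration.

```typescript
type BaseKind = "landscape" | "square" | "portrait";

interface SizeSpec {
  name: string;
  width: number;
  height: number;
}

// Classify a size by aspect ratio. The 1.2 threshold is an assumption.
function classifyAspect(width: number, height: number): BaseKind {
  const ratio = width / height;
  if (ratio > 1.2) return "landscape";
  if (ratio < 1 / 1.2) return "portrait";
  return "square";
}

// Group size names under the base each one would be cropped from.
// The number of keys in the result is the number of bases to generate.
function groupByBase(sizes: SizeSpec[]): Partial<Record<BaseKind, string[]>> {
  const groups: Partial<Record<BaseKind, string[]>> = {};
  for (const size of sizes) {
    const kind = classifyAspect(size.width, size.height);
    const names = groups[kind] ?? [];
    names.push(size.name);
    groups[kind] = names;
  }
  return groups;
}
```

With hypothetical dimensions of 1920x1080 for `hero`, 1200x630 for `og`, and 300x600 for `sidebar`, this grouping yields two bases: a landscape base serving `hero` and `og`, and a portrait base serving `sidebar`.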
Key Benefits:
- Visual Coherence: Same seed = same "person" across all sizes
- Efficiency: Fewer diffusion runs, e.g. 4 sizes from 2 bases instead of 4 separate generations
- Smart Cropping: Faces preserved via focal point detection
Duration: 8-15 seconds (vs 20-40s generating each independently)
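Focal-point-preserving cropping (Stages 3 and 4) can be sketched as: take the largest rectangle with the target aspect ratio that fits in the base, center it on the focal point, then clamp it to the image bounds. This is an illustrative sketch, not the `/derivatives/clip-focal` implementation. It assumes focal coordinates normalized to 0..1, and all type and function names are hypothetical.

```typescript
interface FocalPoint { x: number; y: number } // assumed normalized to 0..1
interface CropRect { left: number; top: number; width: number; height: number }

function clamp(value: number, min: number, max: number): number {
  return Math.min(Math.max(value, min), max);
}

function cropAroundFocal(
  baseW: number,
  baseH: number,
  targetW: number,
  targetH: number,
  focal: FocalPoint | null,
): CropRect {
  // Fall back to the image center when no face was detected.
  const point = focal ?? { x: 0.5, y: 0.5 };
  const targetRatio = targetW / targetH;

  // Largest rectangle with the target aspect ratio that fits inside the base.
  let cropW = baseW;
  let cropH = Math.round(baseW / targetRatio);
  if (cropH > baseH) {
    cropH = baseH;
    cropW = Math.round(baseH * targetRatio);
  }

  // Center the rectangle on the focal point, then keep it inside the image.
  const left = clamp(Math.round(point.x * baseW - cropW / 2), 0, baseW - cropW);
  const top = clamp(Math.round(point.y * baseH - cropH / 2), 0, baseH - cropH);
  return { left, top, width: cropW, height: cropH };
}
```

The resulting rectangle would then be scaled to the requested output size. Clamping means a face near the edge stays inside the crop but is not necessarily centered.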
See Also: Multi-Base Strategy for full implementation details.
Data Formats
Image Data
All image data is transmitted as base64-encoded strings:
interface GenerateResponse {
imageData: string; // base64-encoded PNG/WebP
format: 'png' | 'webp';
width: number;
height: number;
}
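A consumer might turn such a response into something displayable, or estimate its payload size, roughly as follows. The helper names are hypothetical, and the size calculation assumes standard padded base64 with no embedded whitespace.

```typescript
interface GenerateResponse {
  imageData: string; // base64-encoded PNG/WebP
  format: "png" | "webp";
  width: number;
  height: number;
}

// Build a data URL that an <img> element can use directly as its src.
function toDataUrl(response: GenerateResponse): string {
  return `data:image/${response.format};base64,${response.imageData}`;
}

// Number of raw bytes encoded by a padded base64 string.
// Every 4 base64 characters carry 3 bytes, minus one byte per "=" of padding.
function decodedByteLength(base64: string): number {
  let padding = 0;
  if (base64.slice(-2) === "==") padding = 2;
  else if (base64.slice(-1) === "=") padding = 1;
  return (base64.length / 4) * 3 - padding;
}
```

Because base64 inflates the payload by about a third, the decoded length is a handy sanity check when large images are passed between services.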
Prompt Data
interface ParsedPrompt {
name: string; // Human-readable identifier
prompt: string; // Positive image prompt
negativePrompt: string; // Negative image prompt
}
Error Propagation
Errors bubble up through the service chain:
graph LR
GPU[GPU OOM] --> GEN[imajin-diffusion 500]
GEN --> UI[UI Error State]
LLM[LLM Timeout] --> ASSIST[imajin-prompt 500]
ASSIST --> UI
All services return structured error responses:
{
"error": "GPU out of memory",
"code": "GPU_OOM",
"details": { "requested": "8GB", "available": "4GB" }
}
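On the client side, a response body of this shape could be recognized with a type guard before it is shown in the UI error state. This is a sketch: the `ServiceError` name, treating `details` as optional, and the fallback message are assumptions, while the `error`, `code`, and `details` fields come from the example above.

```typescript
interface ServiceError {
  error: string;
  code: string;
  details?: Record<string, unknown>; // assumed optional
}

// Narrow an unknown response body to the structured error shape.
function isServiceError(value: unknown): value is ServiceError {
  if (typeof value !== "object" || value === null) return false;
  const candidate = value as Record<string, unknown>;
  return typeof candidate["error"] === "string" && typeof candidate["code"] === "string";
}

// Produce a short message for display, falling back when the body is not structured.
function describeError(body: unknown): string {
  return isServiceError(body) ? `${body.code}: ${body.error}` : "Unknown error";
}
```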