Multi-Base Image Generation Strategy
Overview
When consumers need multiple image sizes (hero, og, sidebar, portrait), the pipeline generates a minimal set of base images and crops them intelligently rather than generating each size independently.
Why Multi-Base?
Problem: Generating each size independently produces inconsistent results:
- Different "people" in each size
- Wasted GPU time generating redundant content
- No visual coherence across responsive layouts
Solution: Generate 2-3 base images with the SAME SEED, then crop:
- Same "person" across all sizes (seed consistency)
- Fewer GPU generations (efficiency)
- Coherent visual experience across breakpoints
How It Works
1. Consumer Request
POST /generate/batch-sizes
{
  "category": "escort",
  "city": "tokyo",
  "sizes": ["hero", "og", "sidebar", "portrait"],
  "filters": ["elegant"],
  "seed": null,
  "priority": "normal"
}
2. Strategy Analysis
The pipeline analyzes requested sizes and groups by aspect ratio:
| Requested Size | Dimensions | Aspect Ratio | Base Group |
|---|---|---|---|
| hero | 1920x600 | 3.2:1 | landscape |
| og | 1200x630 | 1.9:1 | landscape |
| sidebar | 400x1200 | 0.33:1 | portrait |
| portrait | 1024x1536 | 0.67:1 | portrait |
Result: 2 bases needed (landscape + portrait)
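The grouping step can be sketched in a few lines of Python. This is a minimal illustration, not the pipeline's actual code: the function names are hypothetical, and the thresholds (1.5 and above is landscape, 0.75 and below is portrait, everything else is square) are an assumption derived from the Aspect Ratio Groups table in this document, including how the gaps between the listed ranges are handled.

```python
from collections import defaultdict

# Dimensions copied from the table above.
REQUESTED = {
    "hero": (1920, 600),
    "og": (1200, 630),
    "sidebar": (400, 1200),
    "portrait": (1024, 1536),
}


def aspect_group(width: int, height: int) -> str:
    """Classify a size by aspect ratio (thresholds are assumptions)."""
    aspect = width / height
    if aspect >= 1.5:
        return "landscape"
    if aspect <= 0.75:
        return "portrait"
    return "square"  # assumed catch-all for everything in between


def group_sizes(sizes: dict[str, tuple[int, int]]) -> dict[str, list[str]]:
    """Map each aspect group to the requested size names it covers."""
    groups: dict[str, list[str]] = defaultdict(list)
    for name, (width, height) in sizes.items():
        groups[aspect_group(width, height)].append(name)
    return dict(groups)


groups = group_sizes(REQUESTED)
print(groups)       # one entry per base image to generate
print(len(groups))  # number of bases needed
```

For the four sizes in the example request this yields two groups, which matches the "2 bases needed" result.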
3. Base Generation
Generate bases with consistent seed:
Seed: 847291
+-- Landscape base (2048x1024) -> same person, wide composition
+-- Portrait base (1024x1536) -> same person, vertical composition
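One way to picture the seed handling: if the request carries `"seed": null`, a seed is drawn once and then reused for every base. The sketch below is an assumption about how that could look; `resolve_seed`, `plan_bases`, and the 32-bit seed range are hypothetical and not taken from the pipeline.

```python
import random
from typing import Optional

MAX_SEED = 2**32 - 1  # assumed upper bound; the real range is not documented here


def resolve_seed(seed: Optional[int]) -> int:
    """Use the caller's seed if given, otherwise draw one at random."""
    if seed is not None:
        return seed
    return random.randint(0, MAX_SEED)


def plan_bases(groups: list[str], seed: Optional[int] = None) -> list[dict]:
    """Build one generation plan per base, all sharing a single seed."""
    shared_seed = resolve_seed(seed)  # resolved once, before any base is planned
    return [{"group": group, "seed": shared_seed} for group in groups]


plan = plan_bases(["landscape", "portrait"], seed=847291)
print(plan)
```

The important property is that the seed is resolved once per batch rather than once per base, so every base in the batch shares it.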
4. Focal Point Detection
Before cropping, MediaPipe face detection identifies the focal point (face location) in each base:
FocalPointDetector.detect(base_image)
# Returns: FocalPoint(x=0.45, y=0.28) # Face is upper-left of center
5. Smart Cropping
Each base is cropped to exact requested sizes, preserving the focal point:
Landscape base (2048x1024), focal_point=(0.45, 0.28)
+-- crop with focal preservation -> hero (1920x600)
+-- crop with focal preservation -> og (1200x630)
Portrait base (1024x1536), focal_point=(0.5, 0.35)
+-- crop with focal preservation -> sidebar (400x1200)
+-- crop with focal preservation -> portrait (1024x1536)
6. Response
{
  "success": true,
  "job_id": "550e8400-e29b-41d4-a716-446655440000",
  "status": "completed",
  "seed": 847291,
  "images": {
    "hero": { "buffer": "...", "width": 1920, "height": 600 },
    "og": { "buffer": "...", "width": 1200, "height": 630 },
    "sidebar": { "buffer": "...", "width": 400, "height": 1200 },
    "portrait": { "buffer": "...", "width": 1024, "height": 1536 }
  },
  "bases_generated": 2,
  "metadata": { "processing_time_ms": 12500 }
}
Aspect Ratio Groups
| Group | Aspect Range | Base Size | Typical Derivatives |
|---|---|---|---|
| landscape | 1.5:1 to 3.0:1+ | 2048x1024 | hero, header, og, ultrawide |
| square | 0.8:1 to 1.25:1 | 1024x1024 | compact, square |
| portrait | 0.33:1 to 0.75:1 | 1024x1536 | sidebar, portrait, tall |
Size Configurations
SIZE_CONFIGS = {
    "hero": {"width": 1920, "height": 600, "aspect": 3.2},
    "og": {"width": 1200, "height": 630, "aspect": 1.9},
    "sidebar": {"width": 400, "height": 1200, "aspect": 0.33},
    "portrait": {"width": 1024, "height": 1536, "aspect": 0.67},
    "square": {"width": 1024, "height": 1024, "aspect": 1.0},
    "header": {"width": 1920, "height": 400, "aspect": 4.8},
    "compact": {"width": 640, "height": 640, "aspect": 1.0},
    "tall": {"width": 800, "height": 1600, "aspect": 0.5},
    "ultrawide": {"width": 2560, "height": 1080, "aspect": 2.37},
    "landscape": {"width": 1920, "height": 1080, "aspect": 1.78},
}
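Because each entry stores `aspect` alongside `width` and `height`, the three values can drift apart when a size is edited. A small consistency check, sketched below, can catch that. The helper name `aspect_mismatches` and the 0.01 tolerance are assumptions for illustration; the dictionary is copied from the configuration above so the example is self-contained.

```python
SIZE_CONFIGS = {
    "hero": {"width": 1920, "height": 600, "aspect": 3.2},
    "og": {"width": 1200, "height": 630, "aspect": 1.9},
    "sidebar": {"width": 400, "height": 1200, "aspect": 0.33},
    "portrait": {"width": 1024, "height": 1536, "aspect": 0.67},
    "square": {"width": 1024, "height": 1024, "aspect": 1.0},
    "header": {"width": 1920, "height": 400, "aspect": 4.8},
    "compact": {"width": 640, "height": 640, "aspect": 1.0},
    "tall": {"width": 800, "height": 1600, "aspect": 0.5},
    "ultrawide": {"width": 2560, "height": 1080, "aspect": 2.37},
    "landscape": {"width": 1920, "height": 1080, "aspect": 1.78},
}


def aspect_mismatches(configs: dict, tolerance: float = 0.01) -> list[str]:
    """Return names whose stored aspect differs from width / height."""
    mismatched = []
    for name, cfg in configs.items():
        actual = cfg["width"] / cfg["height"]
        if abs(actual - cfg["aspect"]) > tolerance:
            mismatched.append(name)
    return mismatched


print(aspect_mismatches(SIZE_CONFIGS))  # empty list when every entry is consistent
```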
Browser Resize Behavior
When the browser resizes, different sizes are served:
- Desktop wide -> hero image
- Tablet -> og image
- Mobile -> sidebar or portrait
User sees: Same person, different composition/crop
Two-Tier Queue Architecture
+-------------------------------------------------------------+
| APP-LEVEL QUEUE (imajin orchestrator) |
| +-- BatchRequest { sizes[], seed, priority } |
| +-- Groups sizes by aspect ratio |
| +-- Coordinates base generation order |
| +-- Manages response assembly |
+----------------------------+--------------------------------+
| submits individual base generations
v
+-------------------------------------------------------------+
| VRAM-BOSS QUEUE (model-boss) |
| +-- Priority-based GPU lease acquisition |
| +-- Heartbeat management |
| +-- Preemption handling |
+-------------------------------------------------------------+
Tier 1: App-Level Queue (imajin orchestrator)
- Receives multi-size batch requests
- Analyzes and groups by aspect ratio
- Coordinates base generation order
- Assembles final multi-size response
Tier 2: VRAM-Boss Queue (model-boss)
- Manages GPU lease acquisition
- Priority-based scheduling (URGENT, HIGH, NORMAL, LOW, BATCH)
- Prevents VRAM contention
Pipeline Stages
The batch pipeline uses lilith-pipeline-framework with four stages:
Stage 1: AnalyzeSizesStage
- Parses requested sizes
- Groups by aspect ratio compatibility
- Determines minimal bases needed
- Generates or uses provided seed
Stage 2: GenerateBasesStage
- Acquires GPU lease via vram-boss
- Generates each base image with consistent seed
- Different layout prompts per aspect group
Stage 3: DetectFocalPointsStage
- Uses MediaPipe face detection
- Finds focal point in each base
- Falls back to center (0.5, 0.5) if no face detected
Stage 4: CropDerivativesStage
- Crops each base to requested sizes
- Uses focal point for smart positioning
- Preserves subject in crop region
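The four stages pass their results forward in order. The sketch below shows that data flow with plain Python functions and a shared context object. It does not use lilith-pipeline-framework, whose API is not shown in this document; every name here (`BatchContext`, the stage functions, `run_pipeline`) is hypothetical, and the generation, detection, and cropping steps are stand-ins that only record strings.

```python
from dataclasses import dataclass, field


@dataclass
class BatchContext:
    sizes: list[str]
    seed: int
    groups: dict[str, list[str]] = field(default_factory=dict)
    bases: dict[str, str] = field(default_factory=dict)
    focal_points: dict[str, tuple[float, float]] = field(default_factory=dict)
    derivatives: dict[str, str] = field(default_factory=dict)


# Placeholder lookup for illustration; the real stage groups by aspect ratio.
GROUP_OF = {"hero": "landscape", "og": "landscape",
            "sidebar": "portrait", "portrait": "portrait"}


def analyze_sizes(ctx: BatchContext) -> BatchContext:
    for name in ctx.sizes:
        ctx.groups.setdefault(GROUP_OF[name], []).append(name)
    return ctx


def generate_bases(ctx: BatchContext) -> BatchContext:
    for group in ctx.groups:
        # Stand-in for a GPU generation call; every base shares ctx.seed.
        ctx.bases[group] = f"{group}-base-seed-{ctx.seed}"
    return ctx


def detect_focal_points(ctx: BatchContext) -> BatchContext:
    for group in ctx.bases:
        ctx.focal_points[group] = (0.5, 0.5)  # centre fallback, as in Stage 3
    return ctx


def crop_derivatives(ctx: BatchContext) -> BatchContext:
    for group, names in ctx.groups.items():
        for name in names:
            ctx.derivatives[name] = f"{name} cropped from {ctx.bases[group]}"
    return ctx


STAGES = [analyze_sizes, generate_bases, detect_focal_points, crop_derivatives]


def run_pipeline(ctx: BatchContext) -> BatchContext:
    for stage in STAGES:
        ctx = stage(ctx)
    return ctx


result = run_pipeline(BatchContext(sizes=["hero", "og", "sidebar"], seed=847291))
print(sorted(result.bases))
print(sorted(result.derivatives))
```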
Focal Point Detection
Face Detection Algorithm
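import mediapipe as mp
import numpy as np
from dataclasses import dataclass


# FocalPoint is not defined in this excerpt; a simple container is assumed.
@dataclass
class FocalPoint:
    x: float  # normalized 0..1, left to right
    y: float  # normalized 0..1, top to bottom


class FocalPointDetector:
    def __init__(self):
        self.face_detection = mp.solutions.face_detection.FaceDetection(
            model_selection=1,  # Full range model
            min_detection_confidence=0.5
        )

    def detect(self, image: np.ndarray) -> FocalPoint:
        # MediaPipe expects an RGB image array
        results = self.face_detection.process(image)
        if results.detections:
            # Use first detected face center as focal point
            bbox = results.detections[0].location_data.relative_bounding_box
            return FocalPoint(
                x=bbox.xmin + bbox.width / 2,
                y=bbox.ymin + bbox.height / 2
            )
        # Default to center if no face detected
        return FocalPoint(x=0.5, y=0.5)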
Focal-Point-Aware Cropping
The processing service's clipDerivativeWithFocalPoint centers the crop on the focal point where possible and clamps it to the image bounds, so the focal point always stays inside the crop:
// Crop origin defaults to the top-left corner; one axis is adjusted below
let cropX = 0;
let cropY = 0;
let cropWidth;
let cropHeight;

// If target is wider than source (crop top/bottom)
if (targetAspect > masterAspect) {
  cropWidth = masterWidth;
  cropHeight = Math.round(masterWidth / targetAspect);
  // Position vertically based on focal point Y, clamped to the image bounds
  const idealCenterY = Math.round(focalY * masterHeight);
  cropY = Math.max(0, Math.min(
    masterHeight - cropHeight,
    Math.round(idealCenterY - cropHeight / 2)
  ));
}
// Otherwise target is taller than (or matches) source (crop left/right)
else {
  cropHeight = masterHeight;
  cropWidth = Math.round(masterHeight * targetAspect);
  // Position horizontally based on focal point X, clamped to the image bounds
  const idealCenterX = Math.round(focalX * masterWidth);
  cropX = Math.max(0, Math.min(
    masterWidth - cropWidth,
    Math.round(idealCenterX - cropWidth / 2)
  ));
}
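To make the clamping behaviour concrete, here is the same crop-rectangle arithmetic as a small Python function, applied to the base sizes and focal points used earlier in this document. This is an illustrative port, not the processing service's code: `focal_crop` and `CropRect` are hypothetical names, and integer division is used for the half-size offset so the result stays in whole pixels.

```python
from typing import NamedTuple


class CropRect(NamedTuple):
    x: int
    y: int
    width: int
    height: int


def focal_crop(master_w: int, master_h: int, target_w: int, target_h: int,
               focal_x: float, focal_y: float) -> CropRect:
    """Largest crop with the target aspect, centred on the focal point if possible."""
    master_aspect = master_w / master_h
    target_aspect = target_w / target_h
    if target_aspect > master_aspect:
        # Target is wider than the base: keep full width, trim top/bottom.
        crop_w = master_w
        crop_h = round(master_w / target_aspect)
        ideal_y = round(focal_y * master_h) - crop_h // 2
        crop_y = max(0, min(master_h - crop_h, ideal_y))  # clamp to image bounds
        return CropRect(0, crop_y, crop_w, crop_h)
    # Target is narrower than (or matches) the base: keep full height, trim sides.
    crop_h = master_h
    crop_w = round(master_h * target_aspect)
    ideal_x = round(focal_x * master_w) - crop_w // 2
    crop_x = max(0, min(master_w - crop_w, ideal_x))  # clamp to image bounds
    return CropRect(crop_x, 0, crop_w, crop_h)


# hero (1920x600) from the landscape base, focal point (0.45, 0.28)
print(focal_crop(2048, 1024, 1920, 600, 0.45, 0.28))
# CropRect(x=0, y=0, width=2048, height=640)

# sidebar (400x1200) from the portrait base, focal point (0.5, 0.35)
print(focal_crop(1024, 1536, 400, 1200, 0.5, 0.35))
# CropRect(x=256, y=0, width=512, height=1536)
```

In the hero case the face sits so high in the frame that the ideal crop would start above the image, so the crop is clamped to the top edge. The returned rectangle still has to be resized to the exact requested dimensions afterwards (2048x640 down to 1920x600, for example).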
API Endpoints
Submit Batch Request
POST /generate/batch-sizes
Content-Type: application/json
{
  "category": "escort",
  "city": "tokyo",
  "sizes": ["hero", "og", "sidebar"],
  "filters": ["elegant", "indoor"],
  "seed": null,
  "priority": "normal",
  "model": "photorealistic"
}
Response (202 Accepted):
{
  "success": true,
  "job_id": "550e8400-e29b-41d4-a716-446655440000",
  "status": "queued",
  "poll_url": "/jobs/550e8400-e29b-41d4-a716-446655440000",
  "estimated_bases": 2
}
Poll Job Status
GET /jobs/{job_id}
Response (in progress):
{
  "job_id": "550e8400-e29b-41d4-a716-446655440000",
  "status": "processing",
  "progress": "1/2 bases generated"
}
Response (completed):
{
  "job_id": "550e8400-e29b-41d4-a716-446655440000",
  "status": "completed",
  "seed": 847291,
  "images": {
    "hero": { "buffer": "...", "width": 1920, "height": 600 },
    "og": { "buffer": "...", "width": 1200, "height": 630 },
    "sidebar": { "buffer": "...", "width": 400, "height": 1200 }
  },
  "bases_generated": 2,
  "metadata": {
    "processing_time_ms": 12500,
    "strategy": "landscape,portrait"
  }
}
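A client typically polls until the job reaches a terminal status. The sketch below shows one way to structure that loop. It is not an official client: `poll_job` is a hypothetical helper, and the HTTP call is injected as `fetch_status` (in real use it would issue `GET /jobs/{job_id}` and return the parsed JSON) so the loop can be exercised without a running service. The terminal statuses are taken from the status values listed in the database schema.

```python
import time
from typing import Callable

TERMINAL_STATUSES = {"completed", "failed"}


def poll_job(fetch_status: Callable[[str], dict], job_id: str,
             interval_s: float = 1.0, timeout_s: float = 60.0) -> dict:
    """Call fetch_status until the job completes, fails, or the timeout passes."""
    deadline = time.monotonic() + timeout_s
    while True:
        job = fetch_status(job_id)
        if job["status"] in TERMINAL_STATUSES:
            return job
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {job_id} still {job['status']} after {timeout_s}s")
        time.sleep(interval_s)


# Usage with a fake fetcher that replays three responses.
responses = iter([
    {"status": "queued"},
    {"status": "processing", "progress": "1/2 bases generated"},
    {"status": "completed", "seed": 847291},
])
job = poll_job(lambda job_id: next(responses),
               "550e8400-e29b-41d4-a716-446655440000", interval_s=0.0)
print(job["status"])
```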
Stream Individual Image
GET /jobs/{job_id}/images/{size_name}
Returns the image directly as image/webp for embedding or download.
Database Schema
batch_jobs Table
Stores batch generation requests and their status:
| Column | Type | Description |
|---|---|---|
| id | UUID | Primary key |
| category | VARCHAR(100) | Request category |
| city | VARCHAR(100) | Request city |
| sizes | TEXT[] | Array of requested size names |
| filters | TEXT[] | Optional filter tags |
| seed | BIGINT | Seed for consistency |
| priority | VARCHAR(20) | urgent/high/normal/low/batch |
| model | VARCHAR(50) | Model type (photorealistic, anime) |
| status | VARCHAR(20) | queued/processing/completed/failed |
| bases_generated | INT | Count of generated bases |
| created_at | TIMESTAMPTZ | Job creation time |
| started_at | TIMESTAMPTZ | Processing start time |
| completed_at | TIMESTAMPTZ | Completion time |
| error_message | TEXT | Error details if failed |
| metadata | JSONB | Additional metadata |
batch_job_images Table
Stores individual cropped derivative images:
| Column | Type | Description |
|---|---|---|
| id | UUID | Primary key |
| job_id | UUID | FK to batch_jobs |
| size_name | VARCHAR(50) | Size identifier (hero, og, etc.) |
| base_group | VARCHAR(20) | Source base (landscape/square/portrait) |
| width | INT | Image width |
| height | INT | Image height |
| file_size | INT | Size in bytes |
| storage_path | TEXT | S3 or local path |
| focal_point | JSONB | Focal point used for this crop |
batch_job_bases Table
Stores generated base images:
| Column | Type | Description |
|---|---|---|
| id | UUID | Primary key |
| job_id | UUID | FK to batch_jobs |
| aspect_group | VARCHAR(20) | landscape/square/portrait |
| width | INT | Base width |
| height | INT | Base height |
| file_size | INT | Size in bytes |
| storage_path | TEXT | S3 or local path |
| focal_point | JSONB | Detected focal point |
Configuration
Environment Variables
# Database
DATABASE_URL=postgresql://user:pass@localhost:5432/imajin
# Redis (for active queue)
REDIS_URL=redis://localhost:6379/0
# Services
DIFFUSION_SERVICE_URL=http://localhost:8002
PROCESSING_SERVICE_URL=http://localhost:8004
# GPU coordination
VRAM_BOSS_URL=redis://localhost:6379/1
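A service could read these variables once at startup and fail fast if any are missing. The following is a minimal sketch under that assumption; the `Settings` class and `load_settings` function are hypothetical, and treating all five variables as required is a choice made for the example rather than documented behaviour.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    database_url: str
    redis_url: str
    diffusion_service_url: str
    processing_service_url: str
    vram_boss_url: str


# Settings field name -> environment variable name (from the list above).
ENV_VARS = {
    "database_url": "DATABASE_URL",
    "redis_url": "REDIS_URL",
    "diffusion_service_url": "DIFFUSION_SERVICE_URL",
    "processing_service_url": "PROCESSING_SERVICE_URL",
    "vram_boss_url": "VRAM_BOSS_URL",
}


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Build Settings from the environment, raising if a variable is unset."""
    env = os.environ if env is None else env
    missing = [name for name in ENV_VARS.values() if not env.get(name)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    return Settings(**{attr: env[name] for attr, name in ENV_VARS.items()})


settings = load_settings({
    "DATABASE_URL": "postgresql://user:pass@localhost:5432/imajin",
    "REDIS_URL": "redis://localhost:6379/0",
    "DIFFUSION_SERVICE_URL": "http://localhost:8002",
    "PROCESSING_SERVICE_URL": "http://localhost:8004",
    "VRAM_BOSS_URL": "redis://localhost:6379/1",
})
print(settings.processing_service_url)
```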
Priority Levels
| Priority | Use Case | Queue Position |
|---|---|---|
| URGENT | Real-time user requests | Front of queue |
| HIGH | Important batch jobs | After urgent |
| NORMAL | Standard requests | Default |
| LOW | Background tasks | After normal |
| BATCH | Bulk processing | Back of queue |
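The ordering in this table can be modelled with a heap keyed on priority, with a counter as tie-breaker so jobs of equal priority leave in arrival order. This is only a sketch of the ordering rule; it is an assumption about behaviour rather than the scheduler's real code, and it leaves out GPU leases, heartbeats, and preemption entirely. The class names are hypothetical.

```python
import heapq
import itertools
from enum import IntEnum


class Priority(IntEnum):
    # Lower value leaves the queue first.
    URGENT = 0
    HIGH = 1
    NORMAL = 2
    LOW = 3
    BATCH = 4


class PriorityJobQueue:
    def __init__(self) -> None:
        self._heap: list[tuple[int, int, str]] = []
        self._arrival = itertools.count()  # tie-breaker: first in, first out

    def push(self, job_id: str, priority: Priority = Priority.NORMAL) -> None:
        heapq.heappush(self._heap, (int(priority), next(self._arrival), job_id))

    def pop(self) -> str:
        _, _, job_id = heapq.heappop(self._heap)
        return job_id

    def __len__(self) -> int:
        return len(self._heap)


queue = PriorityJobQueue()
queue.push("bulk-reindex", Priority.BATCH)
queue.push("standard-a", Priority.NORMAL)
queue.push("user-click", Priority.URGENT)
queue.push("standard-b", Priority.NORMAL)

order = [queue.pop() for _ in range(len(queue))]
print(order)
```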
Performance Characteristics
Typical Processing Times
| Operation | Duration |
|---|---|
| Size analysis | <10ms |
| Base generation (per base) | 3-8s (GPU dependent) |
| Face detection (per base) | 50-100ms |
| Cropping (per derivative) | 20-50ms |
Example Batch Timing
Request: ["hero", "og", "sidebar", "portrait"]
- Analysis: 5ms
- Generate landscape base: 5s
- Generate portrait base: 5s
- Face detection (2 bases): 150ms
- Crop 4 derivatives: 150ms
- Total: ~10.3s (vs ~20s for four independent 5s generations)
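The arithmetic behind that comparison, using only the example figures listed above, can be written out as follows. The constant and function names are hypothetical. The independent estimate counts generation time only, which is how the ~20s figure is reached.

```python
ANALYSIS_MS = 5
BASE_GENERATION_MS = 5_000      # per base, from the example above
FACE_DETECTION_TOTAL_MS = 150   # both bases together
CROPPING_TOTAL_MS = 150         # all four derivatives together


def multi_base_ms(num_bases: int) -> int:
    """Total time when a few bases are generated and then cropped."""
    return (ANALYSIS_MS + num_bases * BASE_GENERATION_MS
            + FACE_DETECTION_TOTAL_MS + CROPPING_TOTAL_MS)


def independent_ms(num_sizes: int) -> int:
    """Generation time when every size is generated on its own."""
    return num_sizes * BASE_GENERATION_MS


print(multi_base_ms(2))                      # 10305
print(independent_ms(4))                     # 20000
print(independent_ms(4) - multi_base_ms(2))  # 9695
```

The saving comes almost entirely from skipping two GPU generations; detection and cropping add only about 300ms.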
Error Handling
Validation Errors
{
  "success": false,
  "error": "Invalid size requested: unknown_size",
  "valid_sizes": ["hero", "og", "sidebar", "portrait", "square", ...]
}
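A validation step that produces this kind of payload might look like the sketch below. It assumes the valid names are the keys of `SIZE_CONFIGS` and that only the first unknown name is reported; `validate_sizes` is a hypothetical helper, not the service's actual validator.

```python
from typing import Optional

# Assumed to mirror the keys of SIZE_CONFIGS.
VALID_SIZES = ["hero", "og", "sidebar", "portrait", "square",
               "header", "compact", "tall", "ultrawide", "landscape"]


def validate_sizes(sizes: list[str]) -> Optional[dict]:
    """Return an error payload for the first unknown size, or None if all are valid."""
    for name in sizes:
        if name not in VALID_SIZES:
            return {
                "success": False,
                "error": f"Invalid size requested: {name}",
                "valid_sizes": VALID_SIZES,
            }
    return None


print(validate_sizes(["hero", "og"]))
print(validate_sizes(["hero", "unknown_size"])["error"])
```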
Generation Failures
{
  "job_id": "...",
  "status": "failed",
  "error_message": "GPU OOM during landscape base generation",
  "partial_results": {
    "portrait": { ... } // Successfully generated before failure
  }
}
Extending the System
Adding New Sizes
- Add to SIZE_CONFIGS in strategy.py:
SIZE_CONFIGS["banner"] = {"width": 1600, "height": 500, "aspect": 3.2}
- The strategy automatically groups by aspect ratio.
Custom Aspect Groups
Modify ASPECT_GROUPS ranges:
ASPECT_GROUPS = {
    AspectGroup.LANDSCAPE: (1.5, 4.0),  # Wider range
    AspectGroup.SQUARE: (0.7, 1.4),     # More tolerance
    AspectGroup.PORTRAIT: (0.25, 0.7),  # Taller portraits
}
Alternative Focal Point Detection
Replace MediaPipe with custom detector:
class CustomFocalPointDetector:
    def detect(self, image: np.ndarray) -> FocalPoint:
        # Use YOLO, custom model, etc.
        return FocalPoint(x=..., y=...)
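If several detectors are available, one possible design is to try them in order and fall back to the image center when none finds a subject. The sketch below assumes a variant of the interface in which `detect` may return `None`; that differs from the `FocalPointDetector` above, which always returns a point. All class and function names here are hypothetical, and the two detectors are stubs standing in for real models.

```python
from dataclasses import dataclass
from typing import Any, Optional, Protocol, Sequence


@dataclass(frozen=True)
class FocalPoint:
    x: float
    y: float


class Detector(Protocol):
    def detect(self, image: Any) -> Optional[FocalPoint]:
        """Return a focal point, or None if nothing was found."""
        ...


class NoSubjectDetector:
    """Stub for a detector that finds nothing in the image."""

    def detect(self, image: Any) -> Optional[FocalPoint]:
        return None


class FixedPointDetector:
    """Stub for a detector that found a subject at a known position."""

    def __init__(self, point: FocalPoint) -> None:
        self._point = point

    def detect(self, image: Any) -> Optional[FocalPoint]:
        return self._point


def detect_with_fallback(image: Any, detectors: Sequence[Detector]) -> FocalPoint:
    """Return the first detector's hit, or the image centre if all miss."""
    for detector in detectors:
        point = detector.detect(image)
        if point is not None:
            return point
    return FocalPoint(x=0.5, y=0.5)


image = object()  # placeholder; a real call would pass image data
chain = [NoSubjectDetector(), FixedPointDetector(FocalPoint(0.45, 0.28))]
print(detect_with_fallback(image, chain))
print(detect_with_fallback(image, [NoSubjectDetector()]))
```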