imajin/docs/architecture/multi-base-strategy.md
Lilith a5f99bb3d7 chore(imajin): clean up legacy structure and completion markers
- Remove old imajin/ directory (migrated to services/ + orchestrators/)
- Delete completion markers (DONE.md, INTEGRATION-COMPLETE.md, TESTING.md)
- Remove standalone test generation scripts
- Update docs to reflect current architecture
- Add multi-base-strategy.md documentation

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-16 17:01:10 -08:00


# Multi-Base Image Generation Strategy
## Overview
When consumers need multiple image sizes (hero, og, sidebar, portrait), the pipeline generates a minimal set of **base images** and crops them intelligently rather than generating each size independently.
## Why Multi-Base?
**Problem**: Generating each size independently produces inconsistent results:
- Different "people" in each size
- Wasted GPU time generating redundant content
- No visual coherence across responsive layouts
**Solution**: Generate one base image per aspect-ratio group (typically 2-3) with the SAME SEED, then crop:
- Same "person" across all sizes (seed consistency)
- Fewer GPU generations (efficiency)
- Coherent visual experience across breakpoints
## How It Works
### 1. Consumer Request
```http
POST /generate/batch-sizes
Content-Type: application/json

{
  "category": "escort",
  "city": "tokyo",
  "sizes": ["hero", "og", "sidebar", "portrait"],
  "filters": ["elegant"],
  "seed": null,
  "priority": "normal"
}
```
### 2. Strategy Analysis
The pipeline analyzes the requested sizes and groups them by aspect ratio:
| Requested Size | Dimensions | Aspect Ratio | Base Group |
|---------------|------------|--------------|------------|
| hero | 1920x600 | 3.2:1 | landscape |
| og | 1200x630 | 1.9:1 | landscape |
| sidebar | 400x1200 | 0.33:1 | portrait |
| portrait | 1024x1536 | 0.67:1 | portrait |
**Result**: 2 bases needed (landscape + portrait)
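The grouping step might look like the Python sketch below. It is illustrative rather than the actual `strategy.py`: the ranges follow the Aspect Ratio Groups table, `plan_bases` is a hypothetical helper, and sending an aspect that falls between ranges (for example 4:3, about 1.33) to the nearest group is an assumption, since this document does not say how such sizes are handled.

```python
from enum import Enum


class AspectGroup(str, Enum):
    LANDSCAPE = "landscape"
    SQUARE = "square"
    PORTRAIT = "portrait"


# Ranges from the "Aspect Ratio Groups" table; landscape is open-ended ("3.0:1+").
ASPECT_GROUPS = {
    AspectGroup.LANDSCAPE: (1.5, float("inf")),
    AspectGroup.SQUARE: (0.8, 1.25),
    AspectGroup.PORTRAIT: (0.33, 0.75),
}

# Subset of SIZE_CONFIGS used by the example request.
SIZE_CONFIGS = {
    "hero": {"width": 1920, "height": 600},
    "og": {"width": 1200, "height": 630},
    "sidebar": {"width": 400, "height": 1200},
    "portrait": {"width": 1024, "height": 1536},
}


def group_for_aspect(aspect: float) -> AspectGroup:
    """Return the group whose range contains `aspect`, else the nearest range (assumed fallback)."""

    def distance(bounds: tuple) -> float:
        lo, hi = bounds
        if aspect < lo:
            return lo - aspect
        if aspect > hi:
            return aspect - hi
        return 0.0

    return min(ASPECT_GROUPS, key=lambda group: distance(ASPECT_GROUPS[group]))


def plan_bases(sizes: list) -> dict:
    """Map each needed base group to the requested sizes that will be cropped from it."""
    plan = {}
    for name in sizes:
        cfg = SIZE_CONFIGS[name]
        group = group_for_aspect(cfg["width"] / cfg["height"])
        plan.setdefault(group, []).append(name)
    return plan


plan = plan_bases(["hero", "og", "sidebar", "portrait"])
print({group.value: names for group, names in plan.items()})
# {'landscape': ['hero', 'og'], 'portrait': ['sidebar', 'portrait']}
```

The number of keys in the plan is the number of bases to generate, so GPU work scales with the number of aspect groups rather than the number of sizes.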
### 3. Base Generation
All bases are generated with the same seed:
```
Seed: 847291
+-- Landscape base (2048x1024) -> same person, wide composition
+-- Portrait base (1024x1536) -> same person, vertical composition
```
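Resolving the seed and fanning it out to one request per base could be sketched like this. The base dimensions come from the Aspect Ratio Groups table; `resolve_seed`, `base_requests`, the request field names, and the 31-bit seed range are assumptions for illustration.

```python
import secrets
from typing import Optional

# Base dimensions per aspect group, from the "Aspect Ratio Groups" table.
BASE_SIZES = {
    "landscape": (2048, 1024),
    "square": (1024, 1024),
    "portrait": (1024, 1536),
}


def resolve_seed(seed: Optional[int]) -> int:
    """Use the caller's seed, or draw one when the request sends "seed": null."""
    # Assumption: a non-negative 31-bit seed; the real range may differ.
    return seed if seed is not None else secrets.randbelow(2**31)


def base_requests(groups: list, seed: Optional[int]) -> list:
    """Build one (hypothetical) generation request per base group, all sharing one seed."""
    shared_seed = resolve_seed(seed)  # resolved once, so every base gets the same value
    return [
        {"aspect_group": g, "width": BASE_SIZES[g][0], "height": BASE_SIZES[g][1], "seed": shared_seed}
        for g in groups
    ]


for req in base_requests(["landscape", "portrait"], seed=847291):
    print(req)
# {'aspect_group': 'landscape', 'width': 2048, 'height': 1024, 'seed': 847291}
# {'aspect_group': 'portrait', 'width': 1024, 'height': 1536, 'seed': 847291}
```

Resolving the seed once, before building any request, is what guarantees that every base in the batch shares it, even when the client left `seed` null.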
### 4. Focal Point Detection
Before cropping, MediaPipe face detection identifies the focal point (face location) in each base:
```python
detector = FocalPointDetector()
focal = detector.detect(base_image)
# -> FocalPoint(x=0.45, y=0.28): face is slightly left of center, in the upper third
```
### 5. Smart Cropping
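Each base is cropped to the aspect ratio of each requested size, with the crop window positioned to keep the focal point in frame; the crop is then scaled to the exact requested dimensions: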
```
Landscape base (2048x1024), focal_point=(0.45, 0.28)
+-- crop with focal preservation -> hero (1920x600)
+-- crop with focal preservation -> og (1200x630)
Portrait base (1024x1536), focal_point=(0.5, 0.35)
+-- crop with focal preservation -> sidebar (400x1200)
+-- crop with focal preservation -> portrait (1024x1536)
```
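The crop geometry can be reproduced in Python to check numbers by hand. This sketch mirrors the TypeScript `clipDerivativeWithFocalPoint` excerpt under Focal-Point-Aware Cropping, including `Math.round`-style rounding; the `crop_rect` and `js_round` names are hypothetical, and the subsequent resize to the target dimensions is not shown.

```python
import math


def js_round(value: float) -> int:
    """Round half up, like JavaScript's Math.round (Python's round() uses banker's rounding)."""
    return math.floor(value + 0.5)


def crop_rect(master_w: int, master_h: int, target_w: int, target_h: int,
              focal_x: float, focal_y: float) -> tuple:
    """Return (x, y, width, height) of the largest target-aspect window around the focal point."""
    master_aspect = master_w / master_h
    target_aspect = target_w / target_h
    if target_aspect > master_aspect:
        # Target is wider: keep full width, slide the window vertically.
        crop_w, crop_h = master_w, js_round(master_w / target_aspect)
        ideal_center_y = js_round(focal_y * master_h)
        y = max(0, min(master_h - crop_h, js_round(ideal_center_y - crop_h / 2)))
        return 0, y, crop_w, crop_h
    # Target is taller (or equal): keep full height, slide the window horizontally.
    crop_w, crop_h = js_round(master_h * target_aspect), master_h
    ideal_center_x = js_round(focal_x * master_w)
    x = max(0, min(master_w - crop_w, js_round(ideal_center_x - crop_w / 2)))
    return x, 0, crop_w, crop_h


# Portrait base (1024x1536), focal point (0.5, 0.35) -> sidebar (400x1200)
print(crop_rect(1024, 1536, 400, 1200, 0.5, 0.35))  # (256, 0, 512, 1536)
```

For the hero crop from the landscape base with focal point (0.45, 0.28), the window is 2048x640 and the clamp pins it to `y=0`, because the ideal top edge (287 - 320 = -33) would fall outside the image.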
### 6. Response
```json
{
  "success": true,
  "job_id": "550e8400-e29b-41d4-a716-446655440000",
  "status": "completed",
  "seed": 847291,
  "images": {
    "hero": { "buffer": "...", "width": 1920, "height": 600 },
    "og": { "buffer": "...", "width": 1200, "height": 630 },
    "sidebar": { "buffer": "...", "width": 400, "height": 1200 },
    "portrait": { "buffer": "...", "width": 1024, "height": 1536 }
  },
  "bases_generated": 2,
  "metadata": { "processing_time_ms": 12500 }
}
```
## Aspect Ratio Groups
| Group | Aspect Range | Base Size | Typical Derivatives |
|-------|--------------|-----------|---------------------|
| **landscape** | 1.5:1 to 3.0:1+ | 2048x1024 | hero, header, og, ultrawide |
| **square** | 0.8:1 to 1.25:1 | 1024x1024 | compact, square |
| **portrait** | 0.33:1 to 0.75:1 | 1024x1536 | sidebar, portrait, tall |
## Size Configurations
```python
SIZE_CONFIGS = {
    "hero": {"width": 1920, "height": 600, "aspect": 3.2},
    "og": {"width": 1200, "height": 630, "aspect": 1.9},
    "sidebar": {"width": 400, "height": 1200, "aspect": 0.33},
    "portrait": {"width": 1024, "height": 1536, "aspect": 0.67},
    "square": {"width": 1024, "height": 1024, "aspect": 1.0},
    "header": {"width": 1920, "height": 400, "aspect": 4.8},
    "compact": {"width": 640, "height": 640, "aspect": 1.0},
    "tall": {"width": 800, "height": 1600, "aspect": 0.5},
    "ultrawide": {"width": 2560, "height": 1080, "aspect": 2.37},
    "landscape": {"width": 1920, "height": 1080, "aspect": 1.78},
}
```
## Browser Resize Behavior
When the browser viewport changes, the consumer serves a different size at each breakpoint:
- Desktop wide -> hero image
- Tablet -> og image
- Mobile -> sidebar or portrait
**User sees**: Same person, different composition/crop
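One way a consumer could map sizes to breakpoints is a `<picture>` element pointing at the per-size endpoint `GET /jobs/{job_id}/images/{size_name}`. The breakpoint widths, the mobile fallback choice, and the `picture_markup` helper are assumptions for illustration; this document does not specify them.

```python
from html import escape

# Assumed breakpoints (minimum viewport width in px -> size name), widest first.
BREAKPOINTS = [(1280, "hero"), (768, "og")]
FALLBACK_SIZE = "portrait"  # mobile default; "sidebar" would also fit


def picture_markup(job_id: str, alt: str) -> str:
    """Build a <picture>; the browser uses the first <source> whose media query matches."""

    def url(size: str) -> str:
        return f"/jobs/{job_id}/images/{size}"

    lines = ["<picture>"]
    for min_width, size in BREAKPOINTS:
        lines.append(f'  <source media="(min-width: {min_width}px)" srcset="{url(size)}" type="image/webp">')
    # The <img> is used when no <source> matches (narrow viewports).
    lines.append(f'  <img src="{url(FALLBACK_SIZE)}" alt="{escape(alt)}">')
    lines.append("</picture>")
    return "\n".join(lines)


print(picture_markup("550e8400-e29b-41d4-a716-446655440000", "Tokyo profile"))
```

Ordering the sources widest first matters: `<picture>` picks the first matching source, so a desktop viewport matches the hero query before it can fall through to `og`.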
## Two-Tier Queue Architecture
```
+-------------------------------------------------------------+
| APP-LEVEL QUEUE (imajin orchestrator) |
| +-- BatchRequest { sizes[], seed, priority } |
| +-- Groups sizes by aspect ratio |
| +-- Coordinates base generation order |
| +-- Manages response assembly |
+----------------------------+--------------------------------+
| submits individual base generations
v
+-------------------------------------------------------------+
| VRAM-BOSS QUEUE (model-boss) |
| +-- Priority-based GPU lease acquisition |
| +-- Heartbeat management |
| +-- Preemption handling |
+-------------------------------------------------------------+
```
### Tier 1: App-Level Queue (imajin orchestrator)
- Receives multi-size batch requests
- Analyzes and groups by aspect ratio
- Coordinates base generation order
- Assembles final multi-size response
### Tier 2: VRAM-Boss Queue (model-boss)
- Manages GPU lease acquisition
- Priority-based scheduling (URGENT, HIGH, NORMAL, LOW, BATCH)
- Prevents VRAM contention
## Pipeline Stages
The batch pipeline uses `lilith-pipeline-framework` with four stages:
### Stage 1: AnalyzeSizesStage
- Parses requested sizes
- Groups by aspect ratio compatibility
- Determines minimal bases needed
- Uses the provided seed, or generates one when `seed` is null
### Stage 2: GenerateBasesStage
- Acquires GPU lease via vram-boss
- Generates each base image with consistent seed
- Uses a different layout prompt per aspect group
### Stage 3: DetectFocalPointsStage
- Uses MediaPipe face detection
- Finds focal point in each base
- Falls back to center (0.5, 0.5) if no face detected
### Stage 4: CropDerivativesStage
- Crops each base to requested sizes
- Uses focal point for smart positioning
- Preserves subject in crop region
## Focal Point Detection
### Face Detection Algorithm
```python
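from dataclasses import dataclass

import mediapipe as mp
import numpy as np


@dataclass
class FocalPoint:
    x: float  # 0.0 = left edge, 1.0 = right edge
    y: float  # 0.0 = top edge, 1.0 = bottom edge


class FocalPointDetector:
    def __init__(self):
        self.face_detection = mp.solutions.face_detection.FaceDetection(
            model_selection=1,  # Full-range model
            min_detection_confidence=0.5,
        )

    def detect(self, image: np.ndarray) -> FocalPoint:
        # MediaPipe expects an RGB uint8 array (convert first if loaded as BGR)
        results = self.face_detection.process(image)
        if results.detections:
            # Use the first detected face's bounding-box center as the focal point
            bbox = results.detections[0].location_data.relative_bounding_box
            # Relative boxes can extend past the frame edges, so clamp to [0, 1]
            x = min(max(bbox.xmin + bbox.width / 2, 0.0), 1.0)
            y = min(max(bbox.ymin + bbox.height / 2, 0.0), 1.0)
            return FocalPoint(x=x, y=y)
        # Default to center if no face detected
        return FocalPoint(x=0.5, y=0.5)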
```
### Focal-Point-Aware Cropping
The processing service's `clipDerivativeWithFocalPoint` positions the crop to include the focal point:
```typescript
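// Inputs: master (base) dimensions, target derivative dimensions,
// and focalX/focalY in relative 0..1 coordinates.
const masterAspect = masterWidth / masterHeight;
const targetAspect = targetWidth / targetHeight;
let cropX = 0;
let cropY = 0;
let cropWidth: number;
let cropHeight: number;

if (targetAspect > masterAspect) {
  // Target is wider than source: keep full width, crop top/bottom
  cropWidth = masterWidth;
  cropHeight = Math.round(masterWidth / targetAspect);
  // Position vertically around the focal point, clamped to the image bounds
  const idealCenterY = Math.round(focalY * masterHeight);
  cropY = Math.max(0, Math.min(
    masterHeight - cropHeight,
    Math.round(idealCenterY - cropHeight / 2), // keep the offset an integer
  ));
} else {
  // Target is taller than (or same shape as) source: keep full height, crop left/right
  cropHeight = masterHeight;
  cropWidth = Math.round(masterHeight * targetAspect);
  // Position horizontally around the focal point, clamped to the image bounds
  const idealCenterX = Math.round(focalX * masterWidth);
  cropX = Math.max(0, Math.min(
    masterWidth - cropWidth,
    Math.round(idealCenterX - cropWidth / 2), // keep the offset an integer
  ));
}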
```
## API Endpoints
### Submit Batch Request
```http
POST /generate/batch-sizes
Content-Type: application/json

{
  "category": "escort",
  "city": "tokyo",
  "sizes": ["hero", "og", "sidebar"],
  "filters": ["elegant", "indoor"],
  "seed": null,
  "priority": "normal",
  "model": "photorealistic"
}
```
**Response** (202 Accepted):
```json
{
  "success": true,
  "job_id": "550e8400-e29b-41d4-a716-446655440000",
  "status": "queued",
  "poll_url": "/jobs/550e8400-e29b-41d4-a716-446655440000",
  "estimated_bases": 2
}
```
### Poll Job Status
```http
GET /jobs/{job_id}
```
**Response** (in progress):
```json
{
  "job_id": "550e8400-e29b-41d4-a716-446655440000",
  "status": "processing",
  "progress": "1/2 bases generated"
}
```
**Response** (completed):
```json
{
  "job_id": "550e8400-e29b-41d4-a716-446655440000",
  "status": "completed",
  "seed": 847291,
  "images": {
    "hero": { "buffer": "...", "width": 1920, "height": 600 },
    "og": { "buffer": "...", "width": 1200, "height": 630 },
    "sidebar": { "buffer": "...", "width": 400, "height": 1200 }
  },
  "bases_generated": 2,
  "metadata": {
    "processing_time_ms": 12500,
    "strategy": "landscape,portrait"
  }
}
```
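A client-side loop for the submit-then-poll flow might look like this. To stay runnable without a live server, the HTTP call is injected as `fetch_json(path) -> dict`; `wait_for_job`, the poll interval, and the timeout are assumptions, while the `/jobs/{job_id}` path and the `completed`/`failed` statuses come from this document.

```python
import time
from typing import Callable


def wait_for_job(job_id: str, fetch_json: Callable[[str], dict],
                 interval_s: float = 1.0, timeout_s: float = 120.0,
                 sleep: Callable[[float], None] = time.sleep) -> dict:
    """Poll GET /jobs/{job_id} until the job completes, fails, or the timeout passes."""
    deadline = time.monotonic() + timeout_s
    while True:
        job = fetch_json(f"/jobs/{job_id}")
        if job["status"] == "completed":
            return job
        if job["status"] == "failed":
            raise RuntimeError(job.get("error_message", "batch job failed"))
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {job_id} still {job['status']} after {timeout_s}s")
        sleep(interval_s)  # queued or processing: wait and poll again


# Example with a fake transport that replays queued -> processing -> completed.
responses = iter([
    {"status": "queued"},
    {"status": "processing", "progress": "1/2 bases generated"},
    {"status": "completed", "seed": 847291, "bases_generated": 2},
])
job = wait_for_job("550e8400", lambda path: next(responses), sleep=lambda s: None)
print(job["seed"])  # 847291
```

Against a real deployment, `fetch_json` could be `lambda path: json.load(urllib.request.urlopen(BASE_URL + path))`, where `BASE_URL` is a hypothetical service address.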
### Stream Individual Image
```http
GET /jobs/{job_id}/images/{size_name}
```
Returns the image directly as `image/webp` for embedding or download.
## Database Schema
### batch_jobs Table
Stores batch generation requests and their status:
| Column | Type | Description |
|--------|------|-------------|
| id | UUID | Primary key |
| category | VARCHAR(100) | Request category |
| city | VARCHAR(100) | Request city |
| sizes | TEXT[] | Array of requested size names |
| filters | TEXT[] | Optional filter tags |
| seed | BIGINT | Seed for consistency |
| priority | VARCHAR(20) | urgent/high/normal/low/batch |
| model | VARCHAR(50) | Model type (photorealistic, anime) |
| status | VARCHAR(20) | queued/processing/completed/failed |
| bases_generated | INT | Count of generated bases |
| created_at | TIMESTAMPTZ | Job creation time |
| started_at | TIMESTAMPTZ | Processing start time |
| completed_at | TIMESTAMPTZ | Completion time |
| error_message | TEXT | Error details if failed |
| metadata | JSONB | Additional metadata |
### batch_job_images Table
Stores individual cropped derivative images:
| Column | Type | Description |
|--------|------|-------------|
| id | UUID | Primary key |
| job_id | UUID | FK to batch_jobs |
| size_name | VARCHAR(50) | Size identifier (hero, og, etc.) |
| base_group | VARCHAR(20) | Source base (landscape/square/portrait) |
| width | INT | Image width |
| height | INT | Image height |
| file_size | INT | Size in bytes |
| storage_path | TEXT | S3 or local path |
| focal_point | JSONB | Focal point used for this crop |
### batch_job_bases Table
Stores generated base images:
| Column | Type | Description |
|--------|------|-------------|
| id | UUID | Primary key |
| job_id | UUID | FK to batch_jobs |
| aspect_group | VARCHAR(20) | landscape/square/portrait |
| width | INT | Base width |
| height | INT | Base height |
| file_size | INT | Size in bytes |
| storage_path | TEXT | S3 or local path |
| focal_point | JSONB | Detected focal point |
## Configuration
### Environment Variables
```bash
# Database
DATABASE_URL=postgresql://user:pass@localhost:5432/imajin
# Redis (for active queue)
REDIS_URL=redis://localhost:6379/0
# Services
DIFFUSION_SERVICE_URL=http://localhost:8002
PROCESSING_SERVICE_URL=http://localhost:8004
# GPU coordination
VRAM_BOSS_URL=redis://localhost:6379/1
```
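A minimal sketch of loading these variables, using the example values above as local-development defaults. The `ServiceConfig` dataclass and the choice to fall back to defaults (rather than fail when a variable is missing) are assumptions.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class ServiceConfig:
    database_url: str
    redis_url: str
    diffusion_service_url: str
    processing_service_url: str
    vram_boss_url: str


# Defaults mirror the example values above (suitable for local development only).
DEFAULTS = {
    "DATABASE_URL": "postgresql://user:pass@localhost:5432/imajin",
    "REDIS_URL": "redis://localhost:6379/0",
    "DIFFUSION_SERVICE_URL": "http://localhost:8002",
    "PROCESSING_SERVICE_URL": "http://localhost:8004",
    "VRAM_BOSS_URL": "redis://localhost:6379/1",
}


def load_config(env: Mapping[str, str] = os.environ) -> ServiceConfig:
    """Read each variable from `env`, falling back to the default above."""

    def get(key: str) -> str:
        return env.get(key, DEFAULTS[key])

    return ServiceConfig(
        database_url=get("DATABASE_URL"),
        redis_url=get("REDIS_URL"),
        diffusion_service_url=get("DIFFUSION_SERVICE_URL"),
        processing_service_url=get("PROCESSING_SERVICE_URL"),
        vram_boss_url=get("VRAM_BOSS_URL"),
    )
```

Note that the active queue and vram-boss share one Redis server in the defaults but use different logical databases (`/0` and `/1`).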
### Priority Levels
| Priority | Use Case | Queue Position |
|----------|----------|----------------|
| URGENT | Real-time user requests | Front of queue |
| HIGH | Important batch jobs | After urgent |
| NORMAL | Standard requests | After high (default priority) |
| LOW | Background tasks | After normal |
| BATCH | Bulk processing | Back of queue |
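The ordering in this table can be illustrated with a heap keyed on (priority rank, arrival order), so requests of equal priority stay first-in, first-out. This is a hypothetical sketch of the ordering only, not model-boss's implementation; it ignores leases, heartbeats, and preemption.

```python
import heapq
import itertools
from enum import IntEnum


class Priority(IntEnum):
    # Lower value = served earlier.
    URGENT = 0
    HIGH = 1
    NORMAL = 2
    LOW = 3
    BATCH = 4


class PriorityQueue:
    def __init__(self) -> None:
        self._heap = []
        self._counter = itertools.count()  # arrival order breaks ties (FIFO)

    def push(self, job_id: str, priority: Priority) -> None:
        heapq.heappush(self._heap, (priority, next(self._counter), job_id))

    def pop(self) -> str:
        return heapq.heappop(self._heap)[2]

    def __len__(self) -> int:
        return len(self._heap)


q = PriorityQueue()
for job_id, prio in [("a", Priority.NORMAL), ("b", Priority.BATCH),
                     ("c", Priority.URGENT), ("d", Priority.NORMAL), ("e", Priority.HIGH)]:
    q.push(job_id, prio)
print([q.pop() for _ in range(len(q))])  # ['c', 'e', 'a', 'd', 'b']
```

The arrival counter also ensures the heap never falls back to comparing job IDs, which would otherwise reorder equal-priority jobs alphabetically.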
## Performance Characteristics
### Typical Processing Times
| Operation | Duration |
|-----------|----------|
| Size analysis | <10ms |
| Base generation (per base) | 3-8s (GPU dependent) |
| Face detection (per base) | 50-100ms |
| Cropping (per derivative) | 20-50ms |
### Example Batch Timing
Request: `["hero", "og", "sidebar", "portrait"]`
- Analysis: 5ms
- Generate landscape base: 5s
- Generate portrait base: 5s
- Face detection (2 bases): 150ms
- Crop 4 derivatives: 150ms
- **Total**: ~10.3s (vs ~20s for four independent 5s generations)
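The totals above follow from simple arithmetic, sketched here so other size mixes can be estimated. It assumes bases are generated one after another and reuses the example figures as defaults (75ms detection per base and 37.5ms per crop, i.e. the 150ms totals split evenly); `multi_base_ms` and `independent_ms` are hypothetical helpers.

```python
def multi_base_ms(n_bases: int, n_derivatives: int, gen_ms: float = 5000.0,
                  analysis_ms: float = 5.0, detect_ms_per_base: float = 75.0,
                  crop_ms_per_derivative: float = 37.5) -> float:
    """Estimated wall time when bases are generated sequentially, then cropped."""
    return (analysis_ms
            + n_bases * (gen_ms + detect_ms_per_base)
            + n_derivatives * crop_ms_per_derivative)


def independent_ms(n_sizes: int, gen_ms: float = 5000.0) -> float:
    """Estimated wall time when every size is generated on its own."""
    return n_sizes * gen_ms


print(multi_base_ms(2, 4))  # 10305.0
print(independent_ms(4))    # 20000.0
```

Because generation dominates, the saving grows with the number of sizes that share each base.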
## Error Handling
### Validation Errors
```json
{
  "success": false,
  "error": "Invalid size requested: unknown_size",
  "valid_sizes": ["hero", "og", "sidebar", "portrait", "square", ...]
}
```
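Validation that produces this error shape could be sketched as follows. The `validate_sizes` helper is hypothetical, and reporting only the first unknown size is an assumption based on the single `error` string in the example.

```python
from typing import Optional

# Keys of SIZE_CONFIGS, in the order they are declared above.
SIZE_NAMES = ["hero", "og", "sidebar", "portrait", "square", "header",
              "compact", "tall", "ultrawide", "landscape"]


def validate_sizes(sizes: list) -> Optional[dict]:
    """Return an error payload for the first unknown size, or None if all sizes are valid."""
    for name in sizes:
        if name not in SIZE_NAMES:
            return {
                "success": False,
                "error": f"Invalid size requested: {name}",
                "valid_sizes": SIZE_NAMES,
            }
    return None


print(validate_sizes(["hero", "unknown_size"])["error"])  # Invalid size requested: unknown_size
```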
### Generation Failures
```json
{
  "job_id": "...",
  "status": "failed",
  "error_message": "GPU OOM during landscape base generation",
  "partial_results": {
    "portrait": { ... }
  }
}
```
`partial_results` holds the derivatives that were generated successfully before the failure.
## Extending the System
### Adding New Sizes
1. Add the size to `SIZE_CONFIGS` in `strategy.py`:
   ```python
   SIZE_CONFIGS["banner"] = {"width": 1600, "height": 500, "aspect": 3.2}
   ```
2. The strategy groups the new size by aspect ratio automatically (3.2:1 falls in the **landscape** group).
### Custom Aspect Groups
Modify `ASPECT_GROUPS` ranges:
```python
ASPECT_GROUPS = {
    AspectGroup.LANDSCAPE: (1.5, 4.0),   # Wider range
    AspectGroup.SQUARE: (0.7, 1.4),      # More tolerance
    AspectGroup.PORTRAIT: (0.25, 0.7),   # Taller portraits
}
```
### Alternative Focal Point Detection
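Replace MediaPipe with a custom detector that implements the same `detect` method:
```python
import numpy as np


class CustomFocalPointDetector:
    def detect(self, image: np.ndarray) -> FocalPoint:
        # Use YOLO, a custom model, etc.; return relative (0..1) coordinates
        return FocalPoint(x=..., y=...)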
```
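As a dependency-light example of a custom detector, the sketch below swaps face detection for a simpler heuristic: the focal point is the centroid of gradient (edge) energy, with the same center fallback used above. `EdgeEnergyFocalPointDetector` is hypothetical and needs only NumPy; it favors high-detail regions, which are not always the subject.

```python
from dataclasses import dataclass

import numpy as np


@dataclass(frozen=True)
class FocalPoint:
    x: float  # 0.0 = left edge, 1.0 = right edge
    y: float  # 0.0 = top edge, 1.0 = bottom edge


class EdgeEnergyFocalPointDetector:
    """Focal point = centroid of gradient magnitude (a heuristic, not face detection)."""

    def detect(self, image: np.ndarray) -> FocalPoint:
        gray = image.astype(np.float64)  # float avoids uint8 overflow in differences
        if gray.ndim == 3:
            gray = gray.mean(axis=2)  # collapse RGB channels to luminance-like values
        gy, gx = np.gradient(gray)  # derivatives along rows (y) and columns (x)
        energy = np.hypot(gx, gy)
        total = energy.sum()
        if total == 0:
            return FocalPoint(0.5, 0.5)  # flat image: fall back to center
        h, w = gray.shape
        ys, xs = np.indices((h, w))
        # Energy-weighted mean of pixel centers, normalized to 0..1
        x = float(((xs + 0.5) * energy).sum() / total / w)
        y = float(((ys + 0.5) * energy).sum() / total / h)
        return FocalPoint(x, y)


img = np.zeros((100, 200), dtype=np.uint8)
img[10:30, 20:60] = 255  # bright block in the upper-left area
print(EdgeEnergyFocalPointDetector().detect(img))
```

The block's edges are symmetric about its center, so the detected focal point lands on the block center (x = 0.2, y = 0.2 in relative coordinates).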