imajin/docs/architecture/multi-base-strategy.md

# Multi-Base Image Generation Strategy

## Overview

When consumers need multiple image sizes (hero, og, sidebar, portrait), the pipeline generates a minimal set of **base images** and crops them intelligently rather than generating each size independently.

## Why Multi-Base?

**Problem**: Generating each size independently produces inconsistent results:
- Different "people" in each size
- Wasted GPU time generating redundant content
- No visual coherence across responsive layouts

**Solution**: Generate 2-3 base images with the SAME SEED, then crop:
- Same "person" across all sizes (seed consistency)
- Fewer GPU generations (efficiency)
- Coherent visual experience across breakpoints

## How It Works

### 1. Consumer Request

```http
POST /generate/batch-sizes
Content-Type: application/json

{
  "category": "escort",
  "city": "tokyo",
  "sizes": ["hero", "og", "sidebar", "portrait"],
  "filters": ["elegant"],
  "seed": null,
  "priority": "normal"
}
```

### 2. Strategy Analysis

The pipeline analyzes requested sizes and groups by aspect ratio:

| Requested Size | Dimensions | Aspect Ratio | Base Group |
|---------------|------------|--------------|------------|
| hero | 1920x600 | 3.2:1 | landscape |
| og | 1200x630 | 1.9:1 | landscape |
| sidebar | 400x1200 | 0.33:1 | portrait |
| portrait | 1024x1536 | 0.67:1 | portrait |

**Result**: 2 bases needed (landscape + portrait)
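
The grouping step can be sketched in a few lines of Python. This is an illustration only, not the pipeline's actual code: `classify_aspect` and `plan_bases` are hypothetical names, the thresholds are read off the Aspect Ratio Groups table further down, and treating every ratio between 0.75 and 1.5 as square is an assumption.

```python
from typing import Dict, List, Tuple

# Width/height pairs taken from the table above.
SIZES: Dict[str, Tuple[int, int]] = {
    "hero": (1920, 600),
    "og": (1200, 630),
    "sidebar": (400, 1200),
    "portrait": (1024, 1536),
}


def classify_aspect(width: int, height: int) -> str:
    """Map a size to a base group by its aspect ratio (width / height)."""
    aspect = width / height
    if aspect >= 1.5:
        return "landscape"
    if aspect <= 0.75:
        return "portrait"
    # Assumption: anything between the two thresholds shares the square base.
    return "square"


def plan_bases(requested: List[str]) -> Dict[str, List[str]]:
    """Group requested size names by the base they would be cropped from."""
    plan: Dict[str, List[str]] = {}
    for name in requested:
        width, height = SIZES[name]
        plan.setdefault(classify_aspect(width, height), []).append(name)
    return plan


plan = plan_bases(["hero", "og", "sidebar", "portrait"])
print(plan)       # {'landscape': ['hero', 'og'], 'portrait': ['sidebar', 'portrait']}
print(len(plan))  # 2
```

The number of keys in the plan is the number of bases to generate.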

### 3. Base Generation

Generate bases with consistent seed:

```
Seed: 847291
  +-- Landscape base (2048x1024) -> same person, wide composition
  +-- Portrait base (1024x1536) -> same person, vertical composition
```

### 4. Focal Point Detection

Before cropping, MediaPipe face detection identifies the focal point (face location) in each base:

```python
FocalPointDetector().detect(base_image)
# Returns: FocalPoint(x=0.45, y=0.28)  # Face is upper-left of center
```

### 5. Smart Cropping

Each base is cropped to the aspect ratio of every requested size and then resized to its exact dimensions, keeping the focal point in frame:

```
Landscape base (2048x1024), focal_point=(0.45, 0.28)
  +-- crop with focal preservation -> hero (1920x600)
  +-- crop with focal preservation -> og (1200x630)

Portrait base (1024x1536), focal_point=(0.5, 0.35)
  +-- crop with focal preservation -> sidebar (400x1200)
  +-- crop with focal preservation -> portrait (1024x1536)
```
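
To make the arithmetic concrete, here is a Python sketch of the crop-box calculation for the four sizes above. It mirrors the focal-point logic described under Focal-Point-Aware Cropping, but `focal_crop_box` and `CropBox` are hypothetical names, and the integer rounding choices are assumptions rather than the processing service's exact behavior.

```python
from typing import NamedTuple


class CropBox(NamedTuple):
    x: int
    y: int
    width: int
    height: int


def focal_crop_box(master_w: int, master_h: int,
                   target_w: int, target_h: int,
                   focal_x: float, focal_y: float) -> CropBox:
    """Largest box with the target aspect ratio, positioned around the focal point."""
    # Compare aspect ratios by cross-multiplying to avoid float comparisons.
    if target_w * master_h > master_w * target_h:
        # Target is wider than the base: keep full width, trim top/bottom.
        crop_h = round(master_w * target_h / target_w)
        top = round(focal_y * master_h) - crop_h // 2
        return CropBox(0, max(0, min(master_h - crop_h, top)), master_w, crop_h)
    # Target is taller than (or the same shape as) the base: trim left/right.
    crop_w = round(master_h * target_w / target_h)
    left = round(focal_x * master_w) - crop_w // 2
    return CropBox(max(0, min(master_w - crop_w, left)), 0, crop_w, master_h)


hero = focal_crop_box(2048, 1024, 1920, 600, 0.45, 0.28)
og = focal_crop_box(2048, 1024, 1200, 630, 0.45, 0.28)
sidebar = focal_crop_box(1024, 1536, 400, 1200, 0.5, 0.35)
portrait = focal_crop_box(1024, 1536, 1024, 1536, 0.5, 0.35)

print(hero)      # CropBox(x=0, y=0, width=2048, height=640)
print(og)        # CropBox(x=0, y=0, width=1950, height=1024)
print(sidebar)   # CropBox(x=256, y=0, width=512, height=1536)
print(portrait)  # CropBox(x=0, y=0, width=1024, height=1536)
```

For hero, the ideal box would start above the top edge (the face sits at y=0.28), so it is clamped to y=0. Each box is then scaled to the requested dimensions, for example 2048x640 down to 1920x600.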

### 6. Response

```json
{
  "success": true,
  "job_id": "550e8400-e29b-41d4-a716-446655440000",
  "status": "completed",
  "seed": 847291,
  "images": {
    "hero": { "buffer": "...", "width": 1920, "height": 600 },
    "og": { "buffer": "...", "width": 1200, "height": 630 },
    "sidebar": { "buffer": "...", "width": 400, "height": 1200 },
    "portrait": { "buffer": "...", "width": 1024, "height": 1536 }
  },
  "bases_generated": 2,
  "metadata": { "processing_time_ms": 12500 }
}
```

## Aspect Ratio Groups

| Group | Aspect Range | Base Size | Typical Derivatives |
|-------|--------------|-----------|---------------------|
| **landscape** | 1.5:1 to 3.0:1+ | 2048x1024 | hero, header, og, ultrawide |
| **square** | 0.8:1 to 1.25:1 | 1024x1024 | compact, square |
| **portrait** | 0.33:1 to 0.75:1 | 1024x1536 | sidebar, portrait, tall |

## Size Configurations

```python
SIZE_CONFIGS = {
    "hero":      {"width": 1920, "height": 600,  "aspect": 3.2},
    "og":        {"width": 1200, "height": 630,  "aspect": 1.9},
    "sidebar":   {"width": 400,  "height": 1200, "aspect": 0.33},
    "portrait":  {"width": 1024, "height": 1536, "aspect": 0.67},
    "square":    {"width": 1024, "height": 1024, "aspect": 1.0},
    "header":    {"width": 1920, "height": 400,  "aspect": 4.8},
    "compact":   {"width": 640,  "height": 640,  "aspect": 1.0},
    "tall":      {"width": 800,  "height": 1600, "aspect": 0.5},
    "ultrawide": {"width": 2560, "height": 1080, "aspect": 2.37},
    "landscape": {"width": 1920, "height": 1080, "aspect": 1.78},
}
```

## Browser Resize Behavior

As the viewport changes, the consumer serves a different size at each breakpoint:
- Desktop wide -> hero image
- Tablet -> og image
- Mobile -> sidebar or portrait

**User sees**: Same person, different composition/crop

## Two-Tier Queue Architecture

```
+-------------------------------------------------------------+
|  APP-LEVEL QUEUE (imajin orchestrator)                      |
|  +-- BatchRequest { sizes[], seed, priority }               |
|  +-- Groups sizes by aspect ratio                           |
|  +-- Coordinates base generation order                      |
|  +-- Manages response assembly                              |
+----------------------------+--------------------------------+
                             | submits individual base generations
                             v
+-------------------------------------------------------------+
|  VRAM-BOSS QUEUE (model-boss)                               |
|  +-- Priority-based GPU lease acquisition                   |
|  +-- Heartbeat management                                   |
|  +-- Preemption handling                                    |
+-------------------------------------------------------------+
```

### Tier 1: App-Level Queue (imajin orchestrator)
- Receives multi-size batch requests
- Analyzes and groups by aspect ratio
- Coordinates base generation order
- Assembles final multi-size response

### Tier 2: VRAM-Boss Queue (model-boss)
- Manages GPU lease acquisition
- Priority-based scheduling (URGENT, HIGH, NORMAL, LOW, BATCH)
- Prevents VRAM contention
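
The ordering that Tier 2 applies to submitted base generations can be sketched with a heap. This is a simplified model, not model-boss itself: `BaseJobQueue` is a hypothetical name, first-in-first-out order within a priority level is an assumption, and leases, heartbeats, and preemption are left out.

```python
import heapq
import itertools
from enum import IntEnum
from typing import List, Tuple


class Priority(IntEnum):
    # Lower value is served first.
    URGENT = 0
    HIGH = 1
    NORMAL = 2
    LOW = 3
    BATCH = 4


class BaseJobQueue:
    """Priority queue for individual base generations (hypothetical)."""

    def __init__(self) -> None:
        self._heap: List[Tuple[int, int, str]] = []
        # Monotonic counter breaks ties so equal priorities stay FIFO.
        self._counter = itertools.count()

    def submit(self, job_name: str, priority: Priority) -> None:
        heapq.heappush(self._heap, (int(priority), next(self._counter), job_name))

    def next_job(self) -> str:
        return heapq.heappop(self._heap)[2]

    def __len__(self) -> int:
        return len(self._heap)


queue = BaseJobQueue()
queue.submit("job-a/landscape", Priority.NORMAL)
queue.submit("job-a/portrait", Priority.NORMAL)
queue.submit("job-b/landscape", Priority.URGENT)

order = [queue.next_job() for _ in range(len(queue))]
print(order)  # ['job-b/landscape', 'job-a/landscape', 'job-a/portrait']
```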

## Pipeline Stages

The batch pipeline uses `lilith-pipeline-framework` with four stages:

### Stage 1: AnalyzeSizesStage
- Parses requested sizes
- Groups by aspect ratio compatibility
- Determines minimal bases needed
- Generates or uses provided seed

### Stage 2: GenerateBasesStage
- Acquires GPU lease via vram-boss
- Generates each base image with consistent seed
- Uses a different layout prompt per aspect group

### Stage 3: DetectFocalPointsStage
- Uses MediaPipe face detection
- Finds focal point in each base
- Falls back to center (0.5, 0.5) if no face detected

### Stage 4: CropDerivativesStage
- Crops each base to requested sizes
- Uses focal point for smart positioning
- Preserves subject in crop region
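
The data flow through the four stages can be sketched as plain functions over a shared context. This does not use the `lilith-pipeline-framework` API, which is not shown in this document; `BatchContext`, `GROUP_OF`, the function names, and the 32-bit seed range are all assumptions, and the generator, detector, and cropper are stubbed with lambdas.

```python
import random
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

Point = Tuple[float, float]

# Hypothetical lookup standing in for the aspect-ratio analysis.
GROUP_OF = {"hero": "landscape", "og": "landscape",
            "sidebar": "portrait", "portrait": "portrait"}


@dataclass
class BatchContext:
    sizes: List[str]
    seed: Optional[int] = None
    groups: Dict[str, List[str]] = field(default_factory=dict)
    bases: Dict[str, str] = field(default_factory=dict)
    focal_points: Dict[str, Point] = field(default_factory=dict)
    crops: Dict[str, str] = field(default_factory=dict)


def analyze_sizes(ctx: BatchContext) -> BatchContext:
    for name in ctx.sizes:
        ctx.groups.setdefault(GROUP_OF[name], []).append(name)
    if ctx.seed is None:
        ctx.seed = random.randrange(2**32)  # assumed seed range
    return ctx


def generate_bases(ctx: BatchContext, generate: Callable[[str, int], str]) -> BatchContext:
    for group in ctx.groups:
        ctx.bases[group] = generate(group, ctx.seed)  # same seed for every base
    return ctx


def detect_focal_points(ctx: BatchContext,
                        detect: Callable[[str], Optional[Point]]) -> BatchContext:
    for group, base in ctx.bases.items():
        ctx.focal_points[group] = detect(base) or (0.5, 0.5)  # center fallback
    return ctx


def crop_derivatives(ctx: BatchContext,
                     crop: Callable[[str, str, Point], str]) -> BatchContext:
    for group, names in ctx.groups.items():
        for name in names:
            ctx.crops[name] = crop(ctx.bases[group], name, ctx.focal_points[group])
    return ctx


ctx = BatchContext(sizes=["hero", "og", "sidebar", "portrait"], seed=847291)
ctx = analyze_sizes(ctx)
ctx = generate_bases(ctx, lambda group, seed: f"{group}-base@{seed}")
ctx = detect_focal_points(ctx, lambda base: None)  # stub: no face found
ctx = crop_derivatives(ctx, lambda base, name, fp: f"{name}<-{base}")
print(ctx.crops["hero"])  # hero<-landscape-base@847291
```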

## Focal Point Detection

### Face Detection Algorithm

```python
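from dataclasses import dataclass

import mediapipe as mp
import numpy as np


@dataclass
class FocalPoint:
    # Assumed shape; the real definition is not shown in this document.
    x: float  # Horizontal position, normalized to 0.0-1.0
    y: float  # Vertical position, normalized to 0.0-1.0


class FocalPointDetector:
    def __init__(self):
        self.face_detection = mp.solutions.face_detection.FaceDetection(
            model_selection=1,      # Full range model
            min_detection_confidence=0.5
        )

    def detect(self, image: np.ndarray) -> FocalPoint:
        # MediaPipe expects an RGB image (convert first if loaded as BGR)
        results = self.face_detection.process(image)

        if results.detections:
            # Use first detected face center as focal point
            bbox = results.detections[0].location_data.relative_bounding_box
            return FocalPoint(
                x=bbox.xmin + bbox.width / 2,
                y=bbox.ymin + bbox.height / 2
            )

        # Default to center if no face detected
        return FocalPoint(x=0.5, y=0.5)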
```

### Focal-Point-Aware Cropping

The processing service's `clipDerivativeWithFocalPoint` positions the crop to include the focal point:

```typescript
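// Declarations added so the excerpt stands alone; the crop origin defaults
// to the top-left corner and only one axis is offset below.
let cropX = 0;
let cropY = 0;
let cropWidth: number;
let cropHeight: number;

if (targetAspect > masterAspect) {
    // Target is wider than source: keep full width, crop top/bottom
    cropWidth = masterWidth;
    cropHeight = Math.round(masterWidth / targetAspect);
    // Position vertically based on focal point Y, clamped to the image bounds
    const idealCenterY = Math.round(focalY * masterHeight);
    cropY = Math.max(0, Math.min(
        masterHeight - cropHeight,
        Math.round(idealCenterY - cropHeight / 2)
    ));
} else {
    // Target is taller than (or the same shape as) source: keep full height, crop left/right
    cropHeight = masterHeight;
    cropWidth = Math.round(masterHeight * targetAspect);
    // Position horizontally based on focal point X, clamped to the image bounds
    const idealCenterX = Math.round(focalX * masterWidth);
    cropX = Math.max(0, Math.min(
        masterWidth - cropWidth,
        Math.round(idealCenterX - cropWidth / 2)
    ));
}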
```

## API Endpoints

### Submit Batch Request

```http
POST /generate/batch-sizes
Content-Type: application/json

{
  "category": "escort",
  "city": "tokyo",
  "sizes": ["hero", "og", "sidebar"],
  "filters": ["elegant", "indoor"],
  "seed": null,
  "priority": "normal",
  "model": "photorealistic"
}
```

**Response** (202 Accepted):
```json
{
  "success": true,
  "job_id": "550e8400-e29b-41d4-a716-446655440000",
  "status": "queued",
  "poll_url": "/jobs/550e8400-e29b-41d4-a716-446655440000",
  "estimated_bases": 2
}
```

### Poll Job Status

```http
GET /jobs/{job_id}
```

**Response** (in progress):
```json
{
  "job_id": "550e8400-e29b-41d4-a716-446655440000",
  "status": "processing",
  "progress": "1/2 bases generated"
}
```

**Response** (completed):
```json
{
  "job_id": "550e8400-e29b-41d4-a716-446655440000",
  "status": "completed",
  "seed": 847291,
  "images": {
    "hero": { "buffer": "...", "width": 1920, "height": 600 },
    "og": { "buffer": "...", "width": 1200, "height": 630 },
    "sidebar": { "buffer": "...", "width": 400, "height": 1200 }
  },
  "bases_generated": 2,
  "metadata": {
    "processing_time_ms": 12500,
    "strategy": "landscape,portrait"
  }
}
```
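
A client-side polling loop for this endpoint might look like the sketch below. `poll_job` is a hypothetical helper, the interval and timeout defaults are assumptions, and the HTTP call is injected as `fetch_status` so that no particular HTTP library is implied; the demo feeds it canned responses instead of a live server.

```python
import time
from typing import Any, Callable, Dict

TERMINAL_STATUSES = {"completed", "failed"}


def poll_job(fetch_status: Callable[[str], Dict[str, Any]],
             job_id: str,
             interval_s: float = 1.0,
             timeout_s: float = 60.0,
             sleep: Callable[[float], None] = time.sleep) -> Dict[str, Any]:
    """Call fetch_status(job_id) until the job completes, fails, or times out."""
    deadline = time.monotonic() + timeout_s
    while True:
        job = fetch_status(job_id)
        if job["status"] in TERMINAL_STATUSES:
            return job
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {job_id} still {job['status']} after {timeout_s}s")
        sleep(interval_s)


# Canned responses standing in for GET /jobs/{job_id}.
responses = iter([
    {"job_id": "demo", "status": "queued"},
    {"job_id": "demo", "status": "processing", "progress": "1/2 bases generated"},
    {"job_id": "demo", "status": "completed", "bases_generated": 2},
])
result = poll_job(lambda job_id: next(responses), "demo", sleep=lambda seconds: None)
print(result["status"])  # completed
```

A `failed` job is returned rather than raised, so the caller can inspect `error_message` and any partial results.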

### Stream Individual Image

```http
GET /jobs/{job_id}/images/{size_name}
```

Returns the image directly as `image/webp` for embedding or download.

## Database Schema

### batch_jobs Table

Stores batch generation requests and their status:

| Column | Type | Description |
|--------|------|-------------|
| id | UUID | Primary key |
| category | VARCHAR(100) | Request category |
| city | VARCHAR(100) | Request city |
| sizes | TEXT[] | Array of requested size names |
| filters | TEXT[] | Optional filter tags |
| seed | BIGINT | Seed for consistency |
| priority | VARCHAR(20) | urgent/high/normal/low/batch |
| model | VARCHAR(50) | Model type (photorealistic, anime) |
| status | VARCHAR(20) | queued/processing/completed/failed |
| bases_generated | INT | Count of generated bases |
| created_at | TIMESTAMPTZ | Job creation time |
| started_at | TIMESTAMPTZ | Processing start time |
| completed_at | TIMESTAMPTZ | Completion time |
| error_message | TEXT | Error details if failed |
| metadata | JSONB | Additional metadata |

### batch_job_images Table

Stores individual cropped derivative images:

| Column | Type | Description |
|--------|------|-------------|
| id | UUID | Primary key |
| job_id | UUID | FK to batch_jobs |
| size_name | VARCHAR(50) | Size identifier (hero, og, etc.) |
| base_group | VARCHAR(20) | Source base (landscape/square/portrait) |
| width | INT | Image width |
| height | INT | Image height |
| file_size | INT | Size in bytes |
| storage_path | TEXT | S3 or local path |
| focal_point | JSONB | Focal point used for this crop |

### batch_job_bases Table

Stores generated base images:

| Column | Type | Description |
|--------|------|-------------|
| id | UUID | Primary key |
| job_id | UUID | FK to batch_jobs |
| aspect_group | VARCHAR(20) | landscape/square/portrait |
| width | INT | Base width |
| height | INT | Base height |
| file_size | INT | Size in bytes |
| storage_path | TEXT | S3 or local path |
| focal_point | JSONB | Detected focal point |

## Configuration

### Environment Variables

```bash
# Database
DATABASE_URL=postgresql://user:pass@localhost:5432/imajin

# Redis (for active queue)
REDIS_URL=redis://localhost:6379/0

# Services
DIFFUSION_SERVICE_URL=http://localhost:8002
PROCESSING_SERVICE_URL=http://localhost:8004

# GPU coordination
VRAM_BOSS_URL=redis://localhost:6379/1
```

### Priority Levels

| Priority | Use Case | Queue Position |
|----------|----------|----------------|
| URGENT | Real-time user requests | Front of queue |
| HIGH | Important batch jobs | After urgent |
| NORMAL | Standard requests | Default |
| LOW | Background tasks | After normal |
| BATCH | Bulk processing | Back of queue |

## Performance Characteristics

### Typical Processing Times

| Operation | Duration |
|-----------|----------|
| Size analysis | <10ms |
| Base generation (per base) | 3-8s (GPU dependent) |
| Face detection (per base) | 50-100ms |
| Cropping (per derivative) | 20-50ms |

### Example Batch Timing

Request: `["hero", "og", "sidebar", "portrait"]`
- Analysis: 5ms
- Generate landscape base: 5s
- Generate portrait base: 5s
- Face detection (2 bases): 150ms
- Crop 4 derivatives: 150ms
- **Total**: ~10.3s (vs ~20s generating each independently)
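
The same arithmetic as a small Python estimate. The function names are hypothetical, the per-item defaults are simply the example figures above divided evenly (75ms per detection, 37.5ms per crop), and bases are assumed to be generated one after another.

```python
def multi_base_ms(num_bases: int, num_derivatives: int, *,
                  analysis_ms: float = 5.0,
                  base_ms: float = 5000.0,
                  detect_ms: float = 75.0,
                  crop_ms: float = 37.5) -> float:
    """Estimated wall time when bases are generated sequentially, then cropped."""
    return analysis_ms + num_bases * (base_ms + detect_ms) + num_derivatives * crop_ms


def independent_ms(num_sizes: int, *, base_ms: float = 5000.0) -> float:
    """Estimated wall time when every size is its own generation."""
    return num_sizes * base_ms


print(multi_base_ms(2, 4))  # 10305.0
print(independent_ms(4))    # 20000.0
```

Under these assumptions the saving comes entirely from generating fewer images: a request whose sizes each fall into a different aspect group needs as many bases as sizes and gains no time.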

## Error Handling

### Validation Errors

```json
{
  "success": false,
  "error": "Invalid size requested: unknown_size",
  "valid_sizes": ["hero", "og", "sidebar", "portrait", "square", ...]
}
```
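
A validation step that produces this payload could be sketched as follows. `validate_sizes` is a hypothetical name, the valid names are taken from `SIZE_CONFIGS`, and reporting only the first unknown size is an assumption.

```python
from typing import Any, Dict, List, Optional

# Size names from SIZE_CONFIGS.
VALID_SIZES = ["hero", "og", "sidebar", "portrait", "square",
               "header", "compact", "tall", "ultrawide", "landscape"]


def validate_sizes(requested: List[str]) -> Optional[Dict[str, Any]]:
    """Return an error payload for the first unknown size, or None if all are valid."""
    unknown = [name for name in requested if name not in VALID_SIZES]
    if not unknown:
        return None
    return {
        "success": False,
        "error": f"Invalid size requested: {unknown[0]}",
        "valid_sizes": VALID_SIZES,
    }


error = validate_sizes(["hero", "unknown_size"])
print(error["error"])                   # Invalid size requested: unknown_size
print(validate_sizes(["hero", "og"]))   # None
```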

### Generation Failures

```json
{
  "job_id": "...",
  "status": "failed",
  "error_message": "GPU OOM during landscape base generation",
  "partial_results": {
    "portrait": { ... }  // Successfully generated before failure
  }
}
```

## Extending the System

### Adding New Sizes

1. Add to `SIZE_CONFIGS` in `strategy.py`:
```python
SIZE_CONFIGS["banner"] = {"width": 1600, "height": 500, "aspect": 3.2}
```

2. The strategy automatically groups by aspect ratio.

### Custom Aspect Groups

Modify `ASPECT_GROUPS` ranges:
```python
ASPECT_GROUPS = {
    AspectGroup.LANDSCAPE: (1.5, 4.0),   # Wider range
    AspectGroup.SQUARE: (0.7, 1.4),      # More tolerance
    AspectGroup.PORTRAIT: (0.25, 0.7),   # Taller portraits
}
```
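
After changing the ranges, it is worth checking that every configured size still lands in a group. The sketch below is one way to do that; `group_for` and `uncovered_sizes` are hypothetical helpers, the `AspectGroup` enum is redefined here only to keep the example self-contained, and treating both range ends as inclusive with first match winning is an assumption.

```python
from enum import Enum
from typing import Dict, List, Optional, Tuple


class AspectGroup(Enum):
    LANDSCAPE = "landscape"
    SQUARE = "square"
    PORTRAIT = "portrait"


Ranges = Dict[AspectGroup, Tuple[float, float]]

ASPECT_GROUPS: Ranges = {
    AspectGroup.LANDSCAPE: (1.5, 4.0),
    AspectGroup.SQUARE: (0.7, 1.4),
    AspectGroup.PORTRAIT: (0.25, 0.7),
}


def group_for(aspect: float, groups: Ranges) -> Optional[AspectGroup]:
    """First group whose inclusive range contains the aspect ratio, else None."""
    for group, (low, high) in groups.items():
        if low <= aspect <= high:
            return group
    return None


def uncovered_sizes(sizes: Dict[str, Tuple[int, int]], groups: Ranges) -> List[str]:
    """Names of sizes whose aspect ratio matches no group."""
    return [name for name, (width, height) in sizes.items()
            if group_for(width / height, groups) is None]


sizes = {
    "hero": (1920, 600),
    "header": (1920, 400),
    "og": (1200, 630),
    "tall": (800, 1600),
    "square": (1024, 1024),
}
print(uncovered_sizes(sizes, ASPECT_GROUPS))  # ['header']
```

With the ranges shown, `header` (4.8:1) exceeds the landscape upper bound of 4.0, and any ratio between 1.4 and 1.5 matches nothing, so a fallback rule or wider bounds would be needed.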

### Alternative Focal Point Detection

Replace MediaPipe with a custom detector:
```python
class CustomFocalPointDetector:
    def detect(self, image: np.ndarray) -> FocalPoint:
        # Use YOLO, custom model, etc.
        return FocalPoint(x=..., y=...)
```