imajin/docs/architecture/multi-base-strategy.md
Lilith a5f99bb3d7 chore(imajin): clean up legacy structure and completion markers
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-16 17:01:10 -08:00

Multi-Base Image Generation Strategy

Overview

When consumers need multiple image sizes (hero, og, sidebar, portrait), the pipeline generates a minimal set of base images and crops them intelligently rather than generating each size independently.

Why Multi-Base?

Problem: Generating each size independently produces inconsistent results:

  • Different "people" in each size
  • Wasted GPU time generating redundant content
  • No visual coherence across responsive layouts

Solution: Generate 2-3 base images with the SAME SEED, then crop:

  • Same "person" across all sizes (seed consistency)
  • Fewer GPU generations (efficiency)
  • Coherent visual experience across breakpoints

How It Works

1. Consumer Request

POST /generate/batch-sizes
{
  "category": "escort",
  "city": "tokyo",
  "sizes": ["hero", "og", "sidebar", "portrait"],
  "filters": ["elegant"],
  "seed": null,
  "priority": "normal"
}

2. Strategy Analysis

The pipeline analyzes requested sizes and groups by aspect ratio:

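| Requested Size | Dimensions | Aspect Ratio | Base Group |
|----------------|------------|--------------|------------|
| hero           | 1920x600   | 3.2:1        | landscape  |
| og             | 1200x630   | 1.9:1        | landscape  |
| sidebar        | 400x1200   | 0.33:1       | portrait   |
| portrait       | 1024x1536  | 0.67:1       | portrait   |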

Result: 2 bases needed (landscape + portrait)
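
The grouping step can be sketched as follows. This is a minimal illustration, not the pipeline's actual code: the function names `base_group` and `group_sizes` are hypothetical, and the thresholds (1.5 and 0.75) are assumptions taken from the aspect range table later in this document, with everything in between treated as square.

```python
from collections import defaultdict

# Aspect ratios (width / height) copied from the table above.
ASPECTS = {"hero": 3.2, "og": 1.9, "sidebar": 0.33, "portrait": 0.67}


def base_group(aspect: float) -> str:
    """Map an aspect ratio to a base group (thresholds are assumptions)."""
    if aspect >= 1.5:
        return "landscape"
    if aspect <= 0.75:
        return "portrait"
    return "square"


def group_sizes(size_names: list[str]) -> dict[str, list[str]]:
    """Group requested size names by the base they can be cropped from."""
    groups: dict[str, list[str]] = defaultdict(list)
    for name in size_names:
        groups[base_group(ASPECTS[name])].append(name)
    return dict(groups)


groups = group_sizes(["hero", "og", "sidebar", "portrait"])
print(groups)        # one entry per base that must be generated
print(len(groups))   # number of bases needed
```

For the four sizes in the example request, the result has two keys (landscape and portrait), which matches the "2 bases needed" outcome above.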

3. Base Generation

Generate bases with consistent seed:

Seed: 847291
  +-- Landscape base (2048x1024) -> same person, wide composition
  +-- Portrait base (1024x1536) -> same person, vertical composition
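
A sketch of how one seed could be shared across the base generation jobs. The helper `plan_base_jobs` and the job dictionary shape are hypothetical, and the 32-bit seed range is an assumption; only the base dimensions come from this document.

```python
import random

# Base dimensions per aspect group, as listed in this document.
BASE_SIZES = {
    "landscape": (2048, 1024),
    "square": (1024, 1024),
    "portrait": (1024, 1536),
}


def plan_base_jobs(groups, seed=None):
    """Build one generation job per base group, all sharing the same seed."""
    if seed is None:
        # Assumption: a random 32-bit seed when the consumer sends "seed": null.
        seed = random.randrange(2**32)
    return [
        {"group": g, "width": BASE_SIZES[g][0], "height": BASE_SIZES[g][1], "seed": seed}
        for g in groups
    ]


jobs = plan_base_jobs(["landscape", "portrait"], seed=847291)
for job in jobs:
    print(job)
```

Picking the seed once, before any job is created, is what keeps it identical across bases even when the consumer did not supply one.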

4. Focal Point Detection

Before cropping, MediaPipe face detection identifies the focal point (face location) in each base:

FocalPointDetector.detect(base_image)
# Returns: FocalPoint(x=0.45, y=0.28)  # Face is upper-left of center

5. Smart Cropping

Each base is cropped to exact requested sizes, preserving the focal point:

Landscape base (2048x1024), focal_point=(0.45, 0.28)
  +-- crop with focal preservation -> hero (1920x600)
  +-- crop with focal preservation -> og (1200x630)

Portrait base (1024x1536), focal_point=(0.5, 0.35)
  +-- crop with focal preservation -> sidebar (400x1200)
  +-- crop with focal preservation -> portrait (1024x1536)

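6. Response

Once the job completes, the result has the following shape (the initial POST itself returns 202 Accepted with a job ID; see API Endpoints):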

{
  "success": true,
  "job_id": "550e8400-e29b-41d4-a716-446655440000",
  "status": "completed",
  "seed": 847291,
  "images": {
    "hero": { "buffer": "...", "width": 1920, "height": 600 },
    "og": { "buffer": "...", "width": 1200, "height": 630 },
    "sidebar": { "buffer": "...", "width": 400, "height": 1200 },
    "portrait": { "buffer": "...", "width": 1024, "height": 1536 }
  },
  "bases_generated": 2,
  "metadata": { "processing_time_ms": 12500 }
}

Aspect Ratio Groups

| Group     | Aspect Range     | Base Size | Typical Derivatives         |
|-----------|------------------|-----------|-----------------------------|
| landscape | 1.5:1 to 3.0:1+  | 2048x1024 | hero, header, og, ultrawide |
| square    | 0.8:1 to 1.25:1  | 1024x1024 | compact, square             |
| portrait  | 0.33:1 to 0.75:1 | 1024x1536 | sidebar, portrait, tall     |

Size Configurations

SIZE_CONFIGS = {
    "hero":      {"width": 1920, "height": 600,  "aspect": 3.2},
    "og":        {"width": 1200, "height": 630,  "aspect": 1.9},
    "sidebar":   {"width": 400,  "height": 1200, "aspect": 0.33},
    "portrait":  {"width": 1024, "height": 1536, "aspect": 0.67},
    "square":    {"width": 1024, "height": 1024, "aspect": 1.0},
    "header":    {"width": 1920, "height": 400,  "aspect": 4.8},
    "compact":   {"width": 640,  "height": 640,  "aspect": 1.0},
    "tall":      {"width": 800,  "height": 1600, "aspect": 0.5},
    "ultrawide": {"width": 2560, "height": 1080, "aspect": 2.37},
    "landscape": {"width": 1920, "height": 1080, "aspect": 1.78},
}
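
Because each entry stores `aspect` alongside `width` and `height`, the three values can drift apart when sizes are edited by hand. The check below is a hedged sketch of a consistency test; `aspect_mismatches` and its tolerance are hypothetical and not part of strategy.py.

```python
SIZE_CONFIGS = {
    "hero":      {"width": 1920, "height": 600,  "aspect": 3.2},
    "og":        {"width": 1200, "height": 630,  "aspect": 1.9},
    "sidebar":   {"width": 400,  "height": 1200, "aspect": 0.33},
    "portrait":  {"width": 1024, "height": 1536, "aspect": 0.67},
    "square":    {"width": 1024, "height": 1024, "aspect": 1.0},
    "header":    {"width": 1920, "height": 400,  "aspect": 4.8},
    "compact":   {"width": 640,  "height": 640,  "aspect": 1.0},
    "tall":      {"width": 800,  "height": 1600, "aspect": 0.5},
    "ultrawide": {"width": 2560, "height": 1080, "aspect": 2.37},
    "landscape": {"width": 1920, "height": 1080, "aspect": 1.78},
}


def aspect_mismatches(configs, tolerance=0.01):
    """Return names whose stored aspect differs from width / height."""
    bad = []
    for name, cfg in configs.items():
        actual = cfg["width"] / cfg["height"]
        if abs(actual - cfg["aspect"]) > tolerance:
            bad.append(name)
    return bad


print(aspect_mismatches(SIZE_CONFIGS))
```

All ten entries above agree with their dimensions to two decimal places, so the list comes back empty.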

Browser Resize Behavior

When the browser resizes, different sizes are served:

  • Desktop wide -> hero image
  • Tablet -> og image
  • Mobile -> sidebar or portrait

User sees: Same person, different composition/crop

Two-Tier Queue Architecture

+-------------------------------------------------------------+
|  APP-LEVEL QUEUE (imajin orchestrator)                      |
|  +-- BatchRequest { sizes[], seed, priority }               |
|  +-- Groups sizes by aspect ratio                           |
|  +-- Coordinates base generation order                      |
|  +-- Manages response assembly                              |
+----------------------------+--------------------------------+
                             | submits individual base generations
                             v
+-------------------------------------------------------------+
|  VRAM-BOSS QUEUE (model-boss)                               |
|  +-- Priority-based GPU lease acquisition                   |
|  +-- Heartbeat management                                   |
|  +-- Preemption handling                                    |
+-------------------------------------------------------------+

Tier 1: App-Level Queue (imajin orchestrator)

  • Receives multi-size batch requests
  • Analyzes and groups by aspect ratio
  • Coordinates base generation order
  • Assembles final multi-size response

Tier 2: VRAM-Boss Queue (model-boss)

  • Manages GPU lease acquisition
  • Priority-based scheduling (URGENT, HIGH, NORMAL, LOW, BATCH)
  • Prevents VRAM contention

Pipeline Stages

The batch pipeline uses lilith-pipeline-framework with four stages:

Stage 1: AnalyzeSizesStage

  • Parses requested sizes
  • Groups by aspect ratio compatibility
  • Determines minimal bases needed
  • Generates or uses provided seed

Stage 2: GenerateBasesStage

  • Acquires GPU lease via vram-boss
  • Generates each base image with consistent seed
  • Different layout prompts per aspect group

Stage 3: DetectFocalPointsStage

  • Uses MediaPipe face detection
  • Finds focal point in each base
  • Falls back to center (0.5, 0.5) if no face detected

Stage 4: CropDerivativesStage

  • Crops each base to requested sizes
  • Uses focal point for smart positioning
  • Preserves subject in crop region
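
The data flow between the four stages can be sketched with plain functions. This does not use the lilith-pipeline-framework API, which is not shown in this document; every function here is a hypothetical stub that only illustrates what each stage adds to a shared context.

```python
from typing import Callable

Context = dict
Stage = Callable[[Context], Context]


def analyze_sizes(ctx: Context) -> Context:
    # Stub: pretend the request needs a landscape and a portrait base.
    seed = ctx.get("seed")
    if seed is None:
        seed = 847291  # placeholder for a freshly generated seed
    return {**ctx, "groups": ["landscape", "portrait"], "seed": seed}


def generate_bases(ctx: Context) -> Context:
    # Stub: a real stage would hold a GPU lease and call the diffusion service.
    bases = {g: f"<{g} base, seed {ctx['seed']}>" for g in ctx["groups"]}
    return {**ctx, "bases": bases}


def detect_focal_points(ctx: Context) -> Context:
    # Stub: always use the documented center fallback.
    return {**ctx, "focal_points": {g: (0.5, 0.5) for g in ctx["bases"]}}


def crop_derivatives(ctx: Context) -> Context:
    # Stub: record which sizes would be cropped instead of producing pixels.
    return {**ctx, "images": {size: "<cropped>" for size in ctx["sizes"]}}


STAGES: list[Stage] = [analyze_sizes, generate_bases, detect_focal_points, crop_derivatives]


def run_pipeline(ctx: Context) -> Context:
    """Run the stages in order, passing the growing context along."""
    for stage in STAGES:
        ctx = stage(ctx)
    return ctx


result = run_pipeline({"sizes": ["hero", "sidebar"], "seed": None})
print(sorted(result))
```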

Focal Point Detection

Face Detection Algorithm

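from dataclasses import dataclass

import mediapipe as mp
import numpy as np


@dataclass
class FocalPoint:
    # Assumed shape; the real FocalPoint definition is not shown in this document.
    x: float  # Normalized horizontal position (0.0 = left, 1.0 = right)
    y: float  # Normalized vertical position (0.0 = top, 1.0 = bottom)


class FocalPointDetector:
    def __init__(self):
        self.face_detection = mp.solutions.face_detection.FaceDetection(
            model_selection=1,      # Full range model
            min_detection_confidence=0.5
        )

    def detect(self, image: np.ndarray) -> FocalPoint:
        # MediaPipe expects an RGB image; convert first if it was loaded as BGR
        results = self.face_detection.process(image)

        if results.detections:
            # Use first detected face center as focal point
            bbox = results.detections[0].location_data.relative_bounding_box
            return FocalPoint(
                x=bbox.xmin + bbox.width / 2,
                y=bbox.ymin + bbox.height / 2
            )

        # Default to center if no face detected
        return FocalPoint(x=0.5, y=0.5)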
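
For a face near the frame edge, the relative bounding box may extend slightly outside the 0.0 to 1.0 range, which would push the computed center out of bounds. The helper below is a hedged sketch of a clamp that could be applied to the center calculation; `bbox_center` is a hypothetical name and is not part of the detector shown above.

```python
def bbox_center(xmin: float, ymin: float, width: float, height: float) -> tuple[float, float]:
    """Center of a normalized bounding box, clamped to the image."""

    def clamp(value: float) -> float:
        return min(1.0, max(0.0, value))

    return clamp(xmin + width / 2), clamp(ymin + height / 2)


print(bbox_center(0.35, 0.18, 0.2, 0.2))   # a box fully inside the frame
print(bbox_center(-0.1, 0.9, 0.1, 0.3))    # a box spilling past two edges
```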

Focal-Point-Aware Cropping

The processing service's clipDerivativeWithFocalPoint centers the crop on the focal point as closely as the image bounds allow:

if (targetAspect > masterAspect) {
    // Target is wider than source: keep full width, crop top/bottom
    cropWidth = masterWidth;
    cropHeight = Math.round(masterWidth / targetAspect);
    cropX = 0;
    // Position vertically based on focal point Y, clamped to the image
    const idealCenterY = Math.round(focalY * masterHeight);
    cropY = Math.max(0, Math.min(
        masterHeight - cropHeight,
        Math.round(idealCenterY - cropHeight / 2)
    ));
} else {
    // Target is taller than (or equal to) source: keep full height, crop left/right
    cropHeight = masterHeight;
    cropWidth = Math.round(masterHeight * targetAspect);
    cropY = 0;
    // Position horizontally based on focal point X, clamped to the image
    const idealCenterX = Math.round(focalX * masterWidth);
    cropX = Math.max(0, Math.min(
        masterWidth - cropWidth,
        Math.round(idealCenterX - cropWidth / 2)
    ));
}
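
To make the arithmetic concrete, here is a Python port of the same logic applied to the bases and focal points from the walkthrough above. It is a sketch for illustration: `focal_crop` is a hypothetical name, and it uses integer cross-multiplication instead of precomputed aspect ratios to avoid floating-point edge cases.

```python
def focal_crop(master_w, master_h, target_w, target_h, focal_x, focal_y):
    """Return (x, y, width, height) of the crop window in master pixels."""
    if target_w * master_h > master_w * target_h:
        # Target is wider than the master: keep full width, crop top/bottom.
        crop_w = master_w
        crop_h = round(master_w * target_h / target_w)
        crop_x = 0
        ideal_y = round(focal_y * master_h) - crop_h // 2
        crop_y = max(0, min(master_h - crop_h, ideal_y))
    else:
        # Target is taller (or equal): keep full height, crop left/right.
        crop_h = master_h
        crop_w = round(master_h * target_w / target_h)
        crop_y = 0
        ideal_x = round(focal_x * master_w) - crop_w // 2
        crop_x = max(0, min(master_w - crop_w, ideal_x))
    return crop_x, crop_y, crop_w, crop_h


hero = focal_crop(2048, 1024, 1920, 600, 0.45, 0.28)
og = focal_crop(2048, 1024, 1200, 630, 0.45, 0.28)
sidebar = focal_crop(1024, 1536, 400, 1200, 0.5, 0.35)
portrait = focal_crop(1024, 1536, 1024, 1536, 0.5, 0.35)
print(hero, og, sidebar, portrait)
```

With these inputs the hero window is 2048x640 starting at the top edge: the focal point sits at y = 0.28, so centering on it would push the window above the image and the clamp pins it to 0. The crop is then scaled down to the requested 1920x600.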

API Endpoints

Submit Batch Request

POST /generate/batch-sizes
Content-Type: application/json

{
  "category": "escort",
  "city": "tokyo",
  "sizes": ["hero", "og", "sidebar"],
  "filters": ["elegant", "indoor"],
  "seed": null,
  "priority": "normal",
  "model": "photorealistic"
}

Response (202 Accepted):

{
  "success": true,
  "job_id": "550e8400-e29b-41d4-a716-446655440000",
  "status": "queued",
  "poll_url": "/jobs/550e8400-e29b-41d4-a716-446655440000",
  "estimated_bases": 2
}

Poll Job Status

GET /jobs/{job_id}

Response (in progress):

{
  "job_id": "550e8400-e29b-41d4-a716-446655440000",
  "status": "processing",
  "progress": "1/2 bases generated"
}

Response (completed):

{
  "job_id": "550e8400-e29b-41d4-a716-446655440000",
  "status": "completed",
  "seed": 847291,
  "images": {
    "hero": { "buffer": "...", "width": 1920, "height": 600 },
    "og": { "buffer": "...", "width": 1200, "height": 630 },
    "sidebar": { "buffer": "...", "width": 400, "height": 1200 }
  },
  "bases_generated": 2,
  "metadata": {
    "processing_time_ms": 12500,
    "strategy": "landscape,portrait"
  }
}
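
A client-side polling loop for this endpoint might look like the sketch below. The function `wait_for_job` and its parameters are hypothetical; the HTTP call is injected as `fetch_status` so the loop stays independent of any particular HTTP library, and the interval and timeout defaults are assumptions.

```python
import time
from typing import Callable


def wait_for_job(
    fetch_status: Callable[[str], dict],
    job_id: str,
    interval_s: float = 1.0,
    timeout_s: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll until the job reaches a terminal status, then return its payload."""
    deadline = time.monotonic() + timeout_s
    while True:
        job = fetch_status(job_id)  # expected to wrap GET /jobs/{job_id}
        if job["status"] in ("completed", "failed"):
            return job
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {job_id} still {job['status']} after {timeout_s}s")
        sleep(interval_s)
```

Here `fetch_status` would typically perform the GET request and return the decoded JSON body. Treating both "completed" and "failed" as terminal lets the caller inspect `error_message` instead of waiting for the timeout.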

Stream Individual Image

GET /jobs/{job_id}/images/{size_name}

Returns the image directly as image/webp for embedding or download.

Database Schema

batch_jobs Table

Stores batch generation requests and their status:

| Column          | Type         | Description                         |
|-----------------|--------------|-------------------------------------|
| id              | UUID         | Primary key                         |
| category        | VARCHAR(100) | Request category                    |
| city            | VARCHAR(100) | Request city                        |
| sizes           | TEXT[]       | Array of requested size names       |
| filters         | TEXT[]       | Optional filter tags                |
| seed            | BIGINT       | Seed for consistency                |
| priority        | VARCHAR(20)  | urgent/high/normal/low/batch        |
| model           | VARCHAR(50)  | Model type (photorealistic, anime)  |
| status          | VARCHAR(20)  | queued/processing/completed/failed  |
| bases_generated | INT          | Count of generated bases            |
| created_at      | TIMESTAMPTZ  | Job creation time                   |
| started_at      | TIMESTAMPTZ  | Processing start time               |
| completed_at    | TIMESTAMPTZ  | Completion time                     |
| error_message   | TEXT         | Error details if failed             |
| metadata        | JSONB        | Additional metadata                 |

batch_job_images Table

Stores individual cropped derivative images:

| Column       | Type        | Description                             |
|--------------|-------------|-----------------------------------------|
| id           | UUID        | Primary key                             |
| job_id       | UUID        | FK to batch_jobs                        |
| size_name    | VARCHAR(50) | Size identifier (hero, og, etc.)        |
| base_group   | VARCHAR(20) | Source base (landscape/square/portrait) |
| width        | INT         | Image width                             |
| height       | INT         | Image height                            |
| file_size    | INT         | Size in bytes                           |
| storage_path | TEXT        | S3 or local path                        |
| focal_point  | JSONB       | Focal point used for this crop          |

batch_job_bases Table

Stores generated base images:

| Column       | Type        | Description               |
|--------------|-------------|---------------------------|
| id           | UUID        | Primary key               |
| job_id       | UUID        | FK to batch_jobs          |
| aspect_group | VARCHAR(20) | landscape/square/portrait |
| width        | INT         | Base width                |
| height       | INT         | Base height               |
| file_size    | INT         | Size in bytes             |
| storage_path | TEXT        | S3 or local path          |
| focal_point  | JSONB       | Detected focal point      |

Configuration

Environment Variables

# Database
DATABASE_URL=postgresql://user:pass@localhost:5432/imajin

# Redis (for active queue)
REDIS_URL=redis://localhost:6379/0

# Services
DIFFUSION_SERVICE_URL=http://localhost:8002
PROCESSING_SERVICE_URL=http://localhost:8004

# GPU coordination
VRAM_BOSS_URL=redis://localhost:6379/1
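
A sketch of how a service could read these variables at startup. The `Settings` class and `load_settings` function are hypothetical, and treating the example values above as defaults (while requiring DATABASE_URL) is an assumption, not documented behavior.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    database_url: str
    redis_url: str
    diffusion_service_url: str
    processing_service_url: str
    vram_boss_url: str


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read configuration, falling back to the local defaults listed above."""
    return Settings(
        # Required: there is no safe default for database credentials.
        database_url=env["DATABASE_URL"],
        redis_url=env.get("REDIS_URL", "redis://localhost:6379/0"),
        diffusion_service_url=env.get("DIFFUSION_SERVICE_URL", "http://localhost:8002"),
        processing_service_url=env.get("PROCESSING_SERVICE_URL", "http://localhost:8004"),
        vram_boss_url=env.get("VRAM_BOSS_URL", "redis://localhost:6379/1"),
    )


settings = load_settings({"DATABASE_URL": "postgresql://user:pass@localhost:5432/imajin"})
print(settings.vram_boss_url)
```

Note that REDIS_URL and VRAM_BOSS_URL point at the same Redis server but different logical databases (0 and 1), so the two queues do not share keys.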

Priority Levels

| Priority | Use Case                | Queue Position |
|----------|-------------------------|----------------|
| URGENT   | Real-time user requests | Front of queue |
| HIGH     | Important batch jobs    | After urgent   |
| NORMAL   | Standard requests       | Default        |
| LOW      | Background tasks        | After normal   |
| BATCH    | Bulk processing         | Back of queue  |
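
The ordering in this table can be illustrated with a small in-memory priority queue. This is only a sketch of the ordering rule: the model-boss internals are not described in this document, and `PriorityJobQueue` plus the first-in-first-out tie-break within a priority level are assumptions.

```python
import heapq
import itertools
from enum import IntEnum


class Priority(IntEnum):
    URGENT = 0
    HIGH = 1
    NORMAL = 2
    LOW = 3
    BATCH = 4


class PriorityJobQueue:
    """Lower Priority value pops first; equal priorities pop in arrival order."""

    def __init__(self):
        self._heap = []
        self._counter = itertools.count()  # arrival order, used as tie-break

    def push(self, job_id: str, priority: Priority = Priority.NORMAL) -> None:
        heapq.heappush(self._heap, (priority, next(self._counter), job_id))

    def pop(self) -> str:
        return heapq.heappop(self._heap)[2]

    def __len__(self) -> int:
        return len(self._heap)


queue = PriorityJobQueue()
queue.push("bulk-import", Priority.BATCH)
queue.push("page-a")                       # defaults to NORMAL
queue.push("live-preview", Priority.URGENT)
queue.push("page-b")
order = [queue.pop() for _ in range(len(queue))]
print(order)
```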

Performance Characteristics

Typical Processing Times

| Operation                  | Duration             |
|----------------------------|----------------------|
| Size analysis              | <10ms                |
| Base generation (per base) | 3-8s (GPU dependent) |
| Face detection (per base)  | 50-100ms             |
| Cropping (per derivative)  | 20-50ms              |

Example Batch Timing

Request: ["hero", "og", "sidebar", "portrait"]

  • Analysis: 5ms
  • Generate landscape base: 5s
  • Generate portrait base: 5s
  • Face detection (2 bases): 150ms
  • Crop 4 derivatives: 150ms
  • Total: ~10.3s (vs ~20s generating each independently)
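
The estimate above can be reproduced with a small cost model. The function `batch_seconds` is hypothetical; its per-unit defaults are derived from the example (150ms over 2 bases for face detection, 150ms over 4 derivatives for cropping) and will vary with hardware.

```python
def batch_seconds(n_bases, n_derivatives, gen_s=5.0, analysis_s=0.005,
                  detect_s=0.075, crop_s=0.0375):
    """Estimated wall time when bases are generated one after another."""
    return analysis_s + n_bases * (gen_s + detect_s) + n_derivatives * crop_s


multi_base = batch_seconds(n_bases=2, n_derivatives=4)
independent = 4 * 5.0  # four separate generations, no cropping
print(f"multi-base: {multi_base:.1f}s, independent: {independent:.1f}s")
```

Generation dominates the total, so the saving scales with the number of generations avoided: two bases instead of four roughly halves the time.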

Error Handling

Validation Errors

{
  "success": false,
  "error": "Invalid size requested: unknown_size",
  "valid_sizes": ["hero", "og", "sidebar", "portrait", "square", ...]
}
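
A sketch of request validation that produces an error payload of this shape. The `validate_sizes` helper is hypothetical, and listing every unknown name in one message is an assumption; the size names are the keys of SIZE_CONFIGS.

```python
VALID_SIZES = [
    "hero", "og", "sidebar", "portrait", "square",
    "header", "compact", "tall", "ultrawide", "landscape",
]


def validate_sizes(requested):
    """Return None when all sizes are known, otherwise an error payload."""
    unknown = [name for name in requested if name not in VALID_SIZES]
    if not unknown:
        return None
    return {
        "success": False,
        "error": f"Invalid size requested: {', '.join(unknown)}",
        "valid_sizes": VALID_SIZES,
    }


print(validate_sizes(["hero", "unknown_size"]))
```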

Generation Failures

{
  "job_id": "...",
  "status": "failed",
  "error_message": "GPU OOM during landscape base generation",
  "partial_results": {
    "portrait": { ... }  // Successfully generated before failure
  }
}

Extending the System

Adding New Sizes

  1. Add to SIZE_CONFIGS in strategy.py:
     SIZE_CONFIGS["banner"] = {"width": 1600, "height": 500, "aspect": 3.2}
  2. The strategy automatically groups the new size by aspect ratio.

Custom Aspect Groups

Modify ASPECT_GROUPS ranges:

ASPECT_GROUPS = {
    AspectGroup.LANDSCAPE: (1.5, 4.0),   # Wider range
    AspectGroup.SQUARE: (0.7, 1.4),      # More tolerance
    AspectGroup.PORTRAIT: (0.25, 0.7),   # Taller portraits
}
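
Custom ranges raise two questions: what happens at a shared boundary (0.7 belongs to both the square and portrait ranges above), and what happens to sizes outside every range (header is 4.8:1, beyond 4.0). The sketch below shows one possible policy: nearest range wins, and ties go to the group listed first. The `AspectGroup` enum and `classify` function are assumptions and may not match strategy.py.

```python
from enum import Enum


class AspectGroup(Enum):
    LANDSCAPE = "landscape"
    SQUARE = "square"
    PORTRAIT = "portrait"


ASPECT_GROUPS = {
    AspectGroup.LANDSCAPE: (1.5, 4.0),
    AspectGroup.SQUARE: (0.7, 1.4),
    AspectGroup.PORTRAIT: (0.25, 0.7),
}


def classify(aspect: float, groups=ASPECT_GROUPS) -> AspectGroup:
    """Pick the group whose range contains the aspect, else the nearest one."""

    def distance(bounds):
        low, high = bounds
        if low <= aspect <= high:
            return 0.0
        return min(abs(aspect - low), abs(aspect - high))

    # min() keeps the first group on ties, so dictionary order decides them.
    return min(groups, key=lambda group: distance(groups[group]))


for aspect in (3.2, 4.8, 1.0, 0.7, 0.5):
    print(aspect, classify(aspect).value)
```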

Alternative Focal Point Detection

Replace MediaPipe with a custom detector:

class CustomFocalPointDetector:
    def detect(self, image: np.ndarray) -> FocalPoint:
        # Use YOLO, custom model, etc.
        return FocalPoint(x=..., y=...)
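
One way to combine detectors is to try them in order and fall back to the image center. The sketch below is hypothetical throughout: it assumes detectors return None when they find nothing (the detector shown earlier returns the center instead), and `Detector`, `ChainedDetector`, and the stub classes are illustrative names.

```python
from dataclasses import dataclass
from typing import Any, Optional, Protocol


@dataclass(frozen=True)
class FocalPoint:
    x: float
    y: float


class Detector(Protocol):
    def detect(self, image: Any) -> Optional[FocalPoint]: ...


class ChainedDetector:
    """Try each detector in order; use the image center if none succeeds."""

    def __init__(self, *detectors: Detector):
        self.detectors = detectors

    def detect(self, image: Any) -> FocalPoint:
        for detector in self.detectors:
            point = detector.detect(image)
            if point is not None:
                return point
        return FocalPoint(x=0.5, y=0.5)


class NoFaceFound:
    # Stub standing in for a face detector that found nothing.
    def detect(self, image: Any) -> Optional[FocalPoint]:
        return None


class FixedSubject:
    # Stub standing in for a second model (for example an object detector).
    def detect(self, image: Any) -> Optional[FocalPoint]:
        return FocalPoint(x=0.3, y=0.6)


chain = ChainedDetector(NoFaceFound(), FixedSubject())
print(chain.detect(image=None))
```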