Lilith a5f99bb3d7 chore(imajin): clean up legacy structure and completion markers

- Remove old imajin/ directory (migrated to services/ + orchestrators/)
- Delete completion markers (DONE.md, INTEGRATION-COMPLETE.md, TESTING.md)
- Remove standalone test generation scripts
- Update docs to reflect current architecture
- Add multi-base-strategy.md documentation

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

2026-01-16 17:01:10 -08:00


Multi-Base Image Generation Strategy

Overview

When consumers need multiple image sizes (hero, og, sidebar, portrait), the pipeline generates a minimal set of base images, one per aspect ratio group, and derives every requested size by cropping a base around a detected focal point rather than generating each size independently.

Why Multi-Base?

Problem: Generating each size independently produces inconsistent results:

Different "people" in each size
Wasted GPU time generating redundant content
No visual coherence across responsive layouts

Solution: Generate one base image per aspect ratio group (typically 2-3 bases) with the same seed, then crop:

Same "person" across all sizes (seed consistency)
Fewer GPU generations (efficiency)
Coherent visual experience across breakpoints

How It Works

1. Consumer Request

POST /generate/batch-sizes
{
  "category": "escort",
  "city": "tokyo",
  "sizes": ["hero", "og", "sidebar", "portrait"],
  "filters": ["elegant"],
  "seed": null,
  "priority": "normal"
}

2. Strategy Analysis

The pipeline analyzes requested sizes and groups by aspect ratio:

Requested Size	Dimensions	Aspect Ratio	Base Group
hero	1920x600	3.2:1	landscape
og	1200x630	1.9:1	landscape
sidebar	400x1200	0.33:1	portrait
portrait	1024x1536	0.67:1	portrait

Result: 2 bases needed (landscape + portrait)
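
The grouping step can be sketched as follows. This is a minimal illustration, not the pipeline's actual code: the names `classify_aspect` and `group_sizes` are hypothetical, and the cut-offs (1.5 and wider is landscape, 0.75 and narrower is portrait, anything in between is square) are an assumption chosen so that the four sizes above land in the groups shown in the table.

```python
from collections import defaultdict

# Dimensions taken from the table above.
REQUESTED = {
    "hero": (1920, 600),
    "og": (1200, 630),
    "sidebar": (400, 1200),
    "portrait": (1024, 1536),
}


def classify_aspect(width: int, height: int) -> str:
    """Map a width/height pair to an aspect group (assumed cut-offs)."""
    ratio = width / height
    if ratio >= 1.5:
        return "landscape"
    if ratio <= 0.75:
        return "portrait"
    return "square"


def group_sizes(sizes: dict) -> dict:
    """Group size names by aspect group, preserving request order."""
    groups = defaultdict(list)
    for name, (width, height) in sizes.items():
        groups[classify_aspect(width, height)].append(name)
    return dict(groups)


groups = group_sizes(REQUESTED)
print(groups)       # {'landscape': ['hero', 'og'], 'portrait': ['sidebar', 'portrait']}
print(len(groups))  # 2
```

The number of keys in the result is the number of bases to generate, which is where the "2 bases needed" figure comes from.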

3. Base Generation

Generate bases with consistent seed:

Seed: 847291
  +-- Landscape base (2048x1024) -> same person, wide composition
  +-- Portrait base (1024x1536) -> same person, vertical composition
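
A small sketch of how one seed might be shared across bases. The helper names `resolve_seed` and `plan_bases` and the job dictionary shape are hypothetical, and the random seed range is an assumption; only the base dimensions come from this document (see Aspect Ratio Groups).

```python
import secrets
from typing import Optional

# Base sizes per aspect group, as listed under "Aspect Ratio Groups".
BASE_SIZES = {
    "landscape": (2048, 1024),
    "square": (1024, 1024),
    "portrait": (1024, 1536),
}


def resolve_seed(seed: Optional[int]) -> int:
    """Return the caller's seed, or draw a random one when it is null."""
    if seed is not None:
        return seed
    return secrets.randbelow(2**31)  # assumed seed range, not from the source


def plan_bases(groups: list, seed: Optional[int]) -> list:
    """Build one generation job per aspect group, all sharing one seed."""
    shared_seed = resolve_seed(seed)  # resolved once, then reused for every base
    return [
        {
            "group": group,
            "width": BASE_SIZES[group][0],
            "height": BASE_SIZES[group][1],
            "seed": shared_seed,
        }
        for group in groups
    ]


jobs = plan_bases(["landscape", "portrait"], seed=847291)
for job in jobs:
    print(job)
```

The point of resolving the seed once, before the loop, is that a request with `"seed": null` still produces bases that share a seed.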

4. Focal Point Detection

Before cropping, MediaPipe face detection identifies the focal point (face location) in each base:

detector = FocalPointDetector()
detector.detect(base_image)
# Returns: FocalPoint(x=0.45, y=0.28)  # Face is above and slightly left of center

5. Smart Cropping

Each base is cropped to exact requested sizes, preserving the focal point:

Landscape base (2048x1024), focal_point=(0.45, 0.28)
  +-- crop with focal preservation -> hero (1920x600)
  +-- crop with focal preservation -> og (1200x630)

Portrait base (1024x1536), focal_point=(0.5, 0.35)
  +-- crop with focal preservation -> sidebar (400x1200)
  +-- crop with focal preservation -> portrait (1024x1536)

6. Response

{
  "success": true,
  "job_id": "550e8400-e29b-41d4-a716-446655440000",
  "status": "completed",
  "seed": 847291,
  "images": {
    "hero": { "buffer": "...", "width": 1920, "height": 600 },
    "og": { "buffer": "...", "width": 1200, "height": 630 },
    "sidebar": { "buffer": "...", "width": 400, "height": 1200 },
    "portrait": { "buffer": "...", "width": 1024, "height": 1536 }
  },
  "bases_generated": 2,
  "metadata": { "processing_time_ms": 12500 }
}

Aspect Ratio Groups

Group	Aspect Range	Base Size	Typical Derivatives
landscape	1.5:1 to 3.0:1+	2048x1024	hero, header, og, ultrawide
square	0.8:1 to 1.25:1	1024x1024	compact, square
portrait	0.33:1 to 0.75:1	1024x1536	sidebar, portrait, tall

Size Configurations

SIZE_CONFIGS = {
    "hero":      {"width": 1920, "height": 600,  "aspect": 3.2},
    "og":        {"width": 1200, "height": 630,  "aspect": 1.9},
    "sidebar":   {"width": 400,  "height": 1200, "aspect": 0.33},
    "portrait":  {"width": 1024, "height": 1536, "aspect": 0.67},
    "square":    {"width": 1024, "height": 1024, "aspect": 1.0},
    "header":    {"width": 1920, "height": 400,  "aspect": 4.8},
    "compact":   {"width": 640,  "height": 640,  "aspect": 1.0},
    "tall":      {"width": 800,  "height": 1600, "aspect": 0.5},
    "ultrawide": {"width": 2560, "height": 1080, "aspect": 2.37},
    "landscape": {"width": 1920, "height": 1080, "aspect": 1.78},
}
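
Because each entry stores both the dimensions and a rounded `aspect` value, the two can drift apart when sizes are edited. A quick consistency check might look like this; `aspect_mismatches` and the 0.01 tolerance are illustrative assumptions, not part of strategy.py.

```python
SIZE_CONFIGS = {
    "hero":      {"width": 1920, "height": 600,  "aspect": 3.2},
    "og":        {"width": 1200, "height": 630,  "aspect": 1.9},
    "sidebar":   {"width": 400,  "height": 1200, "aspect": 0.33},
    "portrait":  {"width": 1024, "height": 1536, "aspect": 0.67},
    "square":    {"width": 1024, "height": 1024, "aspect": 1.0},
    "header":    {"width": 1920, "height": 400,  "aspect": 4.8},
    "compact":   {"width": 640,  "height": 640,  "aspect": 1.0},
    "tall":      {"width": 800,  "height": 1600, "aspect": 0.5},
    "ultrawide": {"width": 2560, "height": 1080, "aspect": 2.37},
    "landscape": {"width": 1920, "height": 1080, "aspect": 1.78},
}


def aspect_mismatches(configs: dict, tolerance: float = 0.01) -> list:
    """Return names whose stored aspect differs from width / height."""
    return [
        name
        for name, cfg in configs.items()
        if abs(cfg["width"] / cfg["height"] - cfg["aspect"]) > tolerance
    ]


print(aspect_mismatches(SIZE_CONFIGS))  # []
```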

Browser Resize Behavior

When the browser resizes, different sizes are served:

Desktop wide -> hero image
Tablet -> og image
Mobile -> sidebar or portrait

User sees: Same person, different composition/crop

Two-Tier Queue Architecture

+-------------------------------------------------------------+
|  APP-LEVEL QUEUE (imajin orchestrator)                      |
|  +-- BatchRequest { sizes[], seed, priority }               |
|  +-- Groups sizes by aspect ratio                           |
|  +-- Coordinates base generation order                      |
|  +-- Manages response assembly                              |
+----------------------------+--------------------------------+
                             | submits individual base generations
                             v
+-------------------------------------------------------------+
|  VRAM-BOSS QUEUE (model-boss)                               |
|  +-- Priority-based GPU lease acquisition                   |
|  +-- Heartbeat management                                   |
|  +-- Preemption handling                                    |
+-------------------------------------------------------------+

Tier 1: App-Level Queue (imajin orchestrator)

Receives multi-size batch requests
Analyzes and groups by aspect ratio
Coordinates base generation order
Assembles final multi-size response

Tier 2: VRAM-Boss Queue (model-boss)

Manages GPU lease acquisition
Priority-based scheduling (URGENT, HIGH, NORMAL, LOW, BATCH)
Prevents VRAM contention

Pipeline Stages

The batch pipeline uses lilith-pipeline-framework with four stages:

Stage 1: AnalyzeSizesStage

Parses requested sizes
Groups by aspect ratio compatibility
Determines minimal bases needed
Uses the provided seed, or generates one if none was given

Stage 2: GenerateBasesStage

Acquires GPU lease via vram-boss
Generates each base image with consistent seed
Uses a different layout prompt for each aspect group

Stage 3: DetectFocalPointsStage

Uses MediaPipe face detection
Finds focal point in each base
Falls back to center (0.5, 0.5) if no face detected

Stage 4: CropDerivativesStage

Crops each base to requested sizes
Uses focal point for smart positioning
Preserves subject in crop region

Focal Point Detection

Face Detection Algorithm

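from dataclasses import dataclass

import mediapipe as mp
import numpy as np


# Assumed shape of FocalPoint; its actual definition is not shown in this document
@dataclass
class FocalPoint:
    x: float  # Normalized horizontal position, 0.0 (left) to 1.0 (right)
    y: float  # Normalized vertical position, 0.0 (top) to 1.0 (bottom)


class FocalPointDetector:
    def __init__(self):
        self.face_detection = mp.solutions.face_detection.FaceDetection(
            model_selection=1,      # Full-range model
            min_detection_confidence=0.5
        )

    def detect(self, image: np.ndarray) -> FocalPoint:
        # MediaPipe expects an RGB array; convert first if the image is BGR
        results = self.face_detection.process(image)

        if results.detections:
            # Use the center of the first detected face as the focal point
            bbox = results.detections[0].location_data.relative_bounding_box
            return FocalPoint(
                x=bbox.xmin + bbox.width / 2,
                y=bbox.ymin + bbox.height / 2
            )

        # Default to center if no face detected
        return FocalPoint(x=0.5, y=0.5)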

Focal-Point-Aware Cropping

The processing service's clipDerivativeWithFocalPoint positions the crop to include the focal point:

if (targetAspect > masterAspect) {
    // Target is wider than source: keep the full width, crop top/bottom
    cropWidth = masterWidth;
    cropHeight = Math.round(masterWidth / targetAspect);
    cropX = 0;
    // Position vertically based on focal point Y, clamped to the image bounds
    const idealCenterY = Math.round(focalY * masterHeight);
    cropY = Math.max(0, Math.min(
        masterHeight - cropHeight,
        Math.round(idealCenterY - cropHeight / 2)
    ));
} else {
    // Target is taller than (or the same shape as) source: keep the full height, crop left/right
    cropHeight = masterHeight;
    cropWidth = Math.round(masterHeight * targetAspect);
    cropY = 0;
    // Position horizontally based on focal point X, clamped to the image bounds
    const idealCenterX = Math.round(focalX * masterWidth);
    cropX = Math.max(0, Math.min(
        masterWidth - cropWidth,
        Math.round(idealCenterX - cropWidth / 2)
    ));
}
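
To make the arithmetic concrete, here is a Python port of the same positioning logic applied to two of the crops from the walkthrough. The function name `focal_crop_box` and its return shape are illustrative assumptions; it is a sketch of the calculation, not the processing service's code.

```python
import math


def js_round(value: float) -> int:
    """Round half up, matching JavaScript's Math.round."""
    return math.floor(value + 0.5)


def focal_crop_box(master_w, master_h, target_w, target_h, focal_x, focal_y):
    """Return (x, y, width, height) of the crop region in base-image pixels."""
    master_aspect = master_w / master_h
    target_aspect = target_w / target_h

    if target_aspect > master_aspect:
        # Target is wider: keep full width, slide the window vertically.
        crop_w = master_w
        crop_h = js_round(master_w / target_aspect)
        ideal_center_y = js_round(focal_y * master_h)
        crop_x = 0
        crop_y = max(0, min(master_h - crop_h, js_round(ideal_center_y - crop_h / 2)))
    else:
        # Target is taller or the same shape: keep full height, slide horizontally.
        crop_h = master_h
        crop_w = js_round(master_h * target_aspect)
        ideal_center_x = js_round(focal_x * master_w)
        crop_y = 0
        crop_x = max(0, min(master_w - crop_w, js_round(ideal_center_x - crop_w / 2)))

    return crop_x, crop_y, crop_w, crop_h


# hero (1920x600) from the landscape base, focal point (0.45, 0.28)
print(focal_crop_box(2048, 1024, 1920, 600, 0.45, 0.28))  # (0, 0, 2048, 640)

# sidebar (400x1200) from the portrait base, focal point (0.5, 0.35)
print(focal_crop_box(1024, 1536, 400, 1200, 0.5, 0.35))   # (256, 0, 512, 1536)
```

In the hero case the face sits so high in the frame (y = 0.28) that a window centered on it would start above the image, so the clamp pins the crop to the top edge. The crop regions have the target's aspect ratio but not its pixel dimensions, so a resize to the requested size (for example 2048x640 down to 1920x600) would presumably follow.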

API Endpoints

Submit Batch Request

POST /generate/batch-sizes
Content-Type: application/json

{
  "category": "escort",
  "city": "tokyo",
  "sizes": ["hero", "og", "sidebar"],
  "filters": ["elegant", "indoor"],
  "seed": null,
  "priority": "normal",
  "model": "photorealistic"
}

Response (202 Accepted):

{
  "success": true,
  "job_id": "550e8400-e29b-41d4-a716-446655440000",
  "status": "queued",
  "poll_url": "/jobs/550e8400-e29b-41d4-a716-446655440000",
  "estimated_bases": 2
}

Poll Job Status

GET /jobs/{job_id}

Response (in progress):

{
  "job_id": "550e8400-e29b-41d4-a716-446655440000",
  "status": "processing",
  "progress": "1/2 bases generated"
}

Response (completed):

{
  "job_id": "550e8400-e29b-41d4-a716-446655440000",
  "status": "completed",
  "seed": 847291,
  "images": {
    "hero": { "buffer": "...", "width": 1920, "height": 600 },
    "og": { "buffer": "...", "width": 1200, "height": 630 },
    "sidebar": { "buffer": "...", "width": 400, "height": 1200 }
  },
  "bases_generated": 2,
  "metadata": {
    "processing_time_ms": 12500,
    "strategy": "landscape,portrait"
  }
}
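
A client would typically submit the batch, then poll `poll_url` until the status becomes `completed` or `failed`. The loop below is a sketch under assumptions: `poll_job` is a hypothetical helper, the interval and attempt limit are arbitrary, and the HTTP call is injected as a function so the example runs without a server.

```python
import time
from typing import Callable


def poll_job(
    fetch_status: Callable[[str], dict],
    job_id: str,
    interval_s: float = 1.0,
    max_attempts: int = 60,
) -> dict:
    """Poll until the job reports 'completed' or 'failed', or attempts run out."""
    for attempt in range(max_attempts):
        body = fetch_status(job_id)
        if body["status"] in ("completed", "failed"):
            return body
        if attempt < max_attempts - 1:
            time.sleep(interval_s)
    raise TimeoutError(f"job {job_id} did not finish after {max_attempts} polls")


# Stand-in for GET /jobs/{job_id}: replays canned responses instead of calling a server.
_responses = iter([
    {"job_id": "demo", "status": "queued"},
    {"job_id": "demo", "status": "processing", "progress": "1/2 bases generated"},
    {"job_id": "demo", "status": "completed", "bases_generated": 2},
])

final = poll_job(lambda job_id: next(_responses), "demo", interval_s=0.0)
print(final["status"])  # completed
```

In real use, `fetch_status` would issue the GET request and return the parsed JSON body.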

Stream Individual Image

GET /jobs/{job_id}/images/{size_name}

Returns the image directly as image/webp for embedding or download.

Database Schema

batch_jobs Table

Stores batch generation requests and their status:

Column	Type	Description
id	UUID	Primary key
category	VARCHAR(100)	Request category
city	VARCHAR(100)	Request city
sizes	TEXT[]	Array of requested size names
filters	TEXT[]	Optional filter tags
seed	BIGINT	Seed for consistency
priority	VARCHAR(20)	urgent/high/normal/low/batch
model	VARCHAR(50)	Model type (photorealistic, anime)
status	VARCHAR(20)	queued/processing/completed/failed
bases_generated	INT	Count of generated bases
created_at	TIMESTAMPTZ	Job creation time
started_at	TIMESTAMPTZ	Processing start time
completed_at	TIMESTAMPTZ	Completion time
error_message	TEXT	Error details if failed
metadata	JSONB	Additional metadata

batch_job_images Table

Stores individual cropped derivative images:

Column	Type	Description
id	UUID	Primary key
job_id	UUID	FK to batch_jobs
size_name	VARCHAR(50)	Size identifier (hero, og, etc.)
base_group	VARCHAR(20)	Source base (landscape/square/portrait)
width	INT	Image width
height	INT	Image height
file_size	INT	Size in bytes
storage_path	TEXT	S3 or local path
focal_point	JSONB	Focal point used for this crop

batch_job_bases Table

Stores generated base images:

Column	Type	Description
id	UUID	Primary key
job_id	UUID	FK to batch_jobs
aspect_group	VARCHAR(20)	landscape/square/portrait
width	INT	Base width
height	INT	Base height
file_size	INT	Size in bytes
storage_path	TEXT	S3 or local path
focal_point	JSONB	Detected focal point

Configuration

Environment Variables

# Database
DATABASE_URL=postgresql://user:pass@localhost:5432/imajin

# Redis (for active queue)
REDIS_URL=redis://localhost:6379/0

# Services
DIFFUSION_SERVICE_URL=http://localhost:8002
PROCESSING_SERVICE_URL=http://localhost:8004

# GPU coordination
VRAM_BOSS_URL=redis://localhost:6379/1

Priority Levels

Priority	Use Case	Queue Position
URGENT	Real-time user requests	Front of queue
HIGH	Important batch jobs	After urgent
NORMAL	Standard requests	Default
LOW	Background tasks	After normal
BATCH	Bulk processing	Back of queue
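
The ordering in this table can be modeled with a heap keyed on priority. This is only an illustration of the queue positions, not how model-boss is implemented; the class name `PriorityJobQueue` is hypothetical, and first-in-first-out ordering within a priority level is an assumption.

```python
import heapq
import itertools
from enum import IntEnum


class Priority(IntEnum):
    # Lower value is served first, matching the table order.
    URGENT = 0
    HIGH = 1
    NORMAL = 2
    LOW = 3
    BATCH = 4


class PriorityJobQueue:
    def __init__(self) -> None:
        self._heap = []
        self._counter = itertools.count()  # tie-breaker: arrival order

    def push(self, job_id: str, priority: Priority = Priority.NORMAL) -> None:
        heapq.heappush(self._heap, (int(priority), next(self._counter), job_id))

    def pop(self) -> str:
        return heapq.heappop(self._heap)[2]

    def __len__(self) -> int:
        return len(self._heap)


queue = PriorityJobQueue()
queue.push("bulk-export", Priority.BATCH)
queue.push("page-a")                      # defaults to NORMAL
queue.push("user-click", Priority.URGENT)
queue.push("page-b")

order = [queue.pop() for _ in range(len(queue))]
print(order)  # ['user-click', 'page-a', 'page-b', 'bulk-export']
```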

Performance Characteristics

Typical Processing Times

Operation	Duration
Size analysis	<10ms
Base generation (per base)	3-8s (GPU dependent)
Face detection (per base)	50-100ms
Cropping (per derivative)	20-50ms

Example Batch Timing

Request: ["hero", "og", "sidebar", "portrait"]

Analysis: 5ms
Generate landscape base: 5s
Generate portrait base: 5s
Face detection (2 bases): 150ms
Crop 4 derivatives: 150ms
Total: ~10.3s (vs ~20s generating each independently)
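
The totals above can be reproduced with a few lines of arithmetic. The independent-generation figure assumes four separate generations at the same 5s each and ignores any per-image post-processing.

```python
BASE_GENERATION_MS = 5_000  # per base, from the example above

# Analysis + two bases + face detection + cropping, in milliseconds.
multi_base_ms = 5 + 2 * BASE_GENERATION_MS + 150 + 150

# Assumption: one full generation per requested size, nothing else counted.
independent_ms = 4 * BASE_GENERATION_MS

print(multi_base_ms)                                  # 10305
print(independent_ms)                                 # 20000
print(round(1 - multi_base_ms / independent_ms, 2))   # 0.48
```

Under these numbers the multi-base approach takes roughly half the time, and the saving grows with the number of sizes that share a base.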

Error Handling

Validation Errors

{
  "success": false,
  "error": "Invalid size requested: unknown_size",
  "valid_sizes": ["hero", "og", "sidebar", "portrait", "square", ...]
}
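
A validation step producing a payload of this shape might look like the following. `validate_sizes` is a hypothetical helper; reporting only the first unknown size is an assumption, and the list of valid names is taken from SIZE_CONFIGS.

```python
from typing import Optional

# Size names defined in SIZE_CONFIGS.
VALID_SIZES = [
    "hero", "og", "sidebar", "portrait", "square",
    "header", "compact", "tall", "ultrawide", "landscape",
]


def validate_sizes(requested: list) -> Optional[dict]:
    """Return None when every size is known, otherwise an error payload."""
    unknown = [size for size in requested if size not in VALID_SIZES]
    if not unknown:
        return None
    return {
        "success": False,
        "error": f"Invalid size requested: {unknown[0]}",  # first offender only
        "valid_sizes": VALID_SIZES,
    }


print(validate_sizes(["hero", "og"]))                     # None
print(validate_sizes(["hero", "unknown_size"])["error"])  # Invalid size requested: unknown_size
```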

Generation Failures

{
  "job_id": "...",
  "status": "failed",
  "error_message": "GPU OOM during landscape base generation",
  "partial_results": {
    "portrait": { ... }  // Successfully generated before failure
  }
}

Extending the System

Adding New Sizes

Add to SIZE_CONFIGS in strategy.py:

SIZE_CONFIGS["banner"] = {"width": 1600, "height": 500, "aspect": 3.2}

The strategy automatically groups by aspect ratio.

Custom Aspect Groups

Modify ASPECT_GROUPS ranges:

ASPECT_GROUPS = {
    AspectGroup.LANDSCAPE: (1.5, 4.0),   # Wider range
    AspectGroup.SQUARE: (0.7, 1.4),      # More tolerance
    AspectGroup.PORTRAIT: (0.25, 0.7),   # Taller portraits
}

Alternative Focal Point Detection

Replace MediaPipe with a custom detector that exposes the same detect method:

class CustomFocalPointDetector:
    def detect(self, image: np.ndarray) -> FocalPoint:
        # Use YOLO, custom model, etc.
        return FocalPoint(x=..., y=...)
