# Multi-Base Image Generation Strategy

## Overview

When consumers need multiple image sizes (hero, og, sidebar, portrait), the pipeline generates a minimal set of **base images** and crops them intelligently rather than generating each size independently.

## Why Multi-Base?

**Problem**: Generating each size independently produces inconsistent results:
- Different "people" in each size
- Wasted GPU time generating redundant content
- No visual coherence across responsive layouts

**Solution**: Generate 2-3 base images with the SAME SEED, then crop:
- Same "person" across all sizes (seed consistency)
- Fewer GPU generations (efficiency)
- Coherent visual experience across breakpoints

## How It Works

### 1. Consumer Request

```http
POST /generate/batch-sizes

{
  "category": "escort",
  "city": "tokyo",
  "sizes": ["hero", "og", "sidebar", "portrait"],
  "filters": ["elegant"],
  "seed": null,
  "priority": "normal"
}
```

### 2. Strategy Analysis

The pipeline analyzes the requested sizes and groups them by aspect ratio:

| Requested Size | Dimensions | Aspect Ratio | Base Group |
|----------------|------------|--------------|------------|
| hero | 1920x600 | 3.2:1 | landscape |
| og | 1200x630 | 1.9:1 | landscape |
| sidebar | 400x1200 | 0.33:1 | portrait |
| portrait | 1024x1536 | 0.67:1 | portrait |

**Result**: 2 bases needed (landscape + portrait)

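
The grouping step can be sketched in a few lines of Python. This is a minimal illustration, not the pipeline's actual code: the function names `aspect_group` and `plan_bases` are hypothetical, and the thresholds (1.5 and wider is landscape, 0.75 and narrower is portrait, everything in between is square) are an assumption that fills the gaps between the documented ranges.

```python
from collections import defaultdict

# Width/height pairs taken from the table above.
SIZES = {
    "hero": (1920, 600),
    "og": (1200, 630),
    "sidebar": (400, 1200),
    "portrait": (1024, 1536),
}


def aspect_group(width: int, height: int) -> str:
    """Classify a size by aspect ratio (thresholds are assumptions)."""
    aspect = width / height
    if aspect >= 1.5:
        return "landscape"
    if aspect <= 0.75:
        return "portrait"
    return "square"


def plan_bases(size_names: list[str]) -> dict[str, list[str]]:
    """Map each required base group to the sizes cropped from it."""
    plan: dict[str, list[str]] = defaultdict(list)
    for name in size_names:
        plan[aspect_group(*SIZES[name])].append(name)
    return dict(plan)


plan = plan_bases(["hero", "og", "sidebar", "portrait"])
print(plan)
# {'landscape': ['hero', 'og'], 'portrait': ['sidebar', 'portrait']}
print(len(plan), "bases needed")
# 2 bases needed
```

The number of keys in the plan is the number of GPU generations the request costs.
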
### 3. Base Generation

Generate the bases with a consistent seed:

```
Seed: 847291
+-- Landscape base (2048x1024) -> same person, wide composition
+-- Portrait base (1024x1536) -> same person, vertical composition
```

### 4. Focal Point Detection

Before cropping, MediaPipe face detection identifies the focal point (face location) in each base. Coordinates are normalized to the 0-1 range, with the origin at the top-left corner:

```python
# detect() is an instance method, so create a detector first
FocalPointDetector().detect(base_image)
# Returns: FocalPoint(x=0.45, y=0.28)  # Face is slightly left of and above center
```

### 5. Smart Cropping

Each base is cropped to the exact requested sizes, preserving the focal point:

```
Landscape base (2048x1024), focal_point=(0.45, 0.28)
+-- crop with focal preservation -> hero (1920x600)
+-- crop with focal preservation -> og (1200x630)

Portrait base (1024x1536), focal_point=(0.5, 0.35)
+-- crop with focal preservation -> sidebar (400x1200)
+-- crop with focal preservation -> portrait (1024x1536)
```

### 6. Response

```json
{
  "success": true,
  "job_id": "550e8400-e29b-41d4-a716-446655440000",
  "status": "completed",
  "seed": 847291,
  "images": {
    "hero": { "buffer": "...", "width": 1920, "height": 600 },
    "og": { "buffer": "...", "width": 1200, "height": 630 },
    "sidebar": { "buffer": "...", "width": 400, "height": 1200 },
    "portrait": { "buffer": "...", "width": 1024, "height": 1536 }
  },
  "bases_generated": 2,
  "metadata": { "processing_time_ms": 12500 }
}
```

## Aspect Ratio Groups

| Group | Aspect Range | Base Size | Typical Derivatives |
|-------|--------------|-----------|---------------------|
| **landscape** | 1.5:1 to 3.0:1+ | 2048x1024 | hero, header, og, ultrawide |
| **square** | 0.8:1 to 1.25:1 | 1024x1024 | compact, square, og (borderline) |
| **portrait** | 0.33:1 to 0.75:1 | 1024x1536 | sidebar, portrait, tall |

## Size Configurations

```python
SIZE_CONFIGS = {
    "hero": {"width": 1920, "height": 600, "aspect": 3.2},
    "og": {"width": 1200, "height": 630, "aspect": 1.9},
    "sidebar": {"width": 400, "height": 1200, "aspect": 0.33},
    "portrait": {"width": 1024, "height": 1536, "aspect": 0.67},
    "square": {"width": 1024, "height": 1024, "aspect": 1.0},
    "header": {"width": 1920, "height": 400, "aspect": 4.8},
    "compact": {"width": 640, "height": 640, "aspect": 1.0},
    "tall": {"width": 800, "height": 1600, "aspect": 0.5},
    "ultrawide": {"width": 2560, "height": 1080, "aspect": 2.37},
    "landscape": {"width": 1920, "height": 1080, "aspect": 1.78},
}
```

## Browser Resize Behavior

When the browser resizes, different sizes are served:
- Desktop wide -> hero image
- Tablet -> og image
- Mobile -> sidebar or portrait

**User sees**: Same person, different composition/crop

## Two-Tier Queue Architecture

```
+-------------------------------------------------------------+
|  APP-LEVEL QUEUE (imajin orchestrator)                      |
|  +-- BatchRequest { sizes[], seed, priority }               |
|  +-- Groups sizes by aspect ratio                           |
|  +-- Coordinates base generation order                      |
|  +-- Manages response assembly                              |
+----------------------------+--------------------------------+
                             |  submits individual base generations
                             v
+-------------------------------------------------------------+
|  VRAM-BOSS QUEUE (model-boss)                               |
|  +-- Priority-based GPU lease acquisition                   |
|  +-- Heartbeat management                                   |
|  +-- Preemption handling                                    |
+-------------------------------------------------------------+
```

### Tier 1: App-Level Queue (imajin orchestrator)
- Receives multi-size batch requests
- Analyzes and groups by aspect ratio
- Coordinates base generation order
- Assembles final multi-size response

### Tier 2: VRAM-Boss Queue (model-boss)
- Manages GPU lease acquisition
- Priority-based scheduling (URGENT, HIGH, NORMAL, LOW, BATCH)
- Prevents VRAM contention

## Pipeline Stages

The batch pipeline uses `lilith-pipeline-framework` with four stages:

### Stage 1: AnalyzeSizesStage
- Parses requested sizes
- Groups by aspect ratio compatibility
- Determines minimal bases needed
- Generates or uses provided seed

### Stage 2: GenerateBasesStage
- Acquires GPU lease via vram-boss
- Generates each base image with consistent seed
- Different layout prompts per aspect group

### Stage 3: DetectFocalPointsStage
- Uses MediaPipe face detection
- Finds focal point in each base
- Falls back to center (0.5, 0.5) if no face detected

### Stage 4: CropDerivativesStage
- Crops each base to requested sizes
- Uses focal point for smart positioning
- Preserves subject in crop region

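
To make the data flow between the four stages concrete, here is a minimal sketch that models each stage as a plain function over a shared context dictionary. It does not use the `lilith-pipeline-framework` API, which is not shown in this document. Every name below (`GROUP_OF`, `run_pipeline`, the stage functions) is hypothetical, and the GPU and face-detection work is replaced by placeholders.

```python
import random
from typing import Callable

Context = dict
Stage = Callable[[Context], Context]

# Hypothetical lookup; the real stage groups sizes by aspect ratio.
GROUP_OF = {"hero": "landscape", "og": "landscape",
            "sidebar": "portrait", "portrait": "portrait"}


def analyze_sizes(ctx: Context) -> Context:
    """Stage 1: group sizes and settle on a seed."""
    groups: dict[str, list[str]] = {}
    for size in ctx["sizes"]:
        groups.setdefault(GROUP_OF[size], []).append(size)
    seed = ctx.get("seed")
    if seed is None:
        seed = random.randrange(2**32)
    return {**ctx, "groups": groups, "seed": seed}


def generate_bases(ctx: Context) -> Context:
    """Stage 2 placeholder: a real stage would lease a GPU and generate."""
    bases = {g: f"<{g} base, seed={ctx['seed']}>" for g in ctx["groups"]}
    return {**ctx, "bases": bases}


def detect_focal_points(ctx: Context) -> Context:
    """Stage 3 placeholder: always returns the documented center fallback."""
    return {**ctx, "focal_points": {g: (0.5, 0.5) for g in ctx["bases"]}}


def crop_derivatives(ctx: Context) -> Context:
    """Stage 4 placeholder: records which base and focal point each size uses."""
    images = {
        size: {"base_group": group, "focal_point": ctx["focal_points"][group]}
        for group, sizes in ctx["groups"].items()
        for size in sizes
    }
    return {**ctx, "images": images}


def run_pipeline(ctx: Context, stages: list[Stage]) -> Context:
    """Run the stages in order, threading the context through each one."""
    for stage in stages:
        ctx = stage(ctx)
    return ctx


result = run_pipeline(
    {"sizes": ["hero", "og", "sidebar"], "seed": 847291},
    [analyze_sizes, generate_bases, detect_focal_points, crop_derivatives],
)
print(sorted(result["bases"]))                    # ['landscape', 'portrait']
print(result["images"]["sidebar"]["base_group"])  # portrait
```

Each stage only reads keys written by earlier stages, which is why the order is fixed.
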
## Focal Point Detection

### Face Detection Algorithm

```python
from dataclasses import dataclass

import mediapipe as mp
import numpy as np


# Assumed shape of FocalPoint; the original excerpt does not define it.
@dataclass
class FocalPoint:
    x: float  # Normalized, 0.0 (left) to 1.0 (right)
    y: float  # Normalized, 0.0 (top) to 1.0 (bottom)


class FocalPointDetector:
    def __init__(self):
        self.face_detection = mp.solutions.face_detection.FaceDetection(
            model_selection=1,  # Full range model
            min_detection_confidence=0.5
        )

    def detect(self, image: np.ndarray) -> FocalPoint:
        # MediaPipe expects an RGB image array
        results = self.face_detection.process(image)

        if results.detections:
            # Use first detected face center as focal point
            bbox = results.detections[0].location_data.relative_bounding_box
            return FocalPoint(
                x=bbox.xmin + bbox.width / 2,
                y=bbox.ymin + bbox.height / 2
            )

        # Default to center if no face detected
        return FocalPoint(x=0.5, y=0.5)
```

### Focal-Point-Aware Cropping

The processing service's `clipDerivativeWithFocalPoint` positions the crop to include the focal point:

```typescript
// If target is wider than source (crop top/bottom)
if (targetAspect > masterAspect) {
  cropWidth = masterWidth;
  cropHeight = Math.round(masterWidth / targetAspect);
  // Position vertically based on focal point Y
  const idealCenterY = Math.round(focalY * masterHeight);
  cropY = Math.max(0, Math.min(
    masterHeight - cropHeight,
    idealCenterY - cropHeight / 2
  ));
}

// If target is taller than source (crop left/right)
else {
  cropHeight = masterHeight;
  cropWidth = Math.round(masterHeight * targetAspect);
  // Position horizontally based on focal point X
  const idealCenterX = Math.round(focalX * masterWidth);
  cropX = Math.max(0, Math.min(
    masterWidth - cropWidth,
    idealCenterX - cropWidth / 2
  ));
}
```

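
As a worked example, the same positioning logic can be written as a self-contained Python function and run against the sizes used earlier in this document. This is a sketch rather than a port of the processing service: the name `focal_crop_box` is hypothetical, and it makes three small assumptions of its own, namely comparing aspect ratios by integer cross-multiplication, setting the unused offset to 0, and using integer division for the half-extent so the offsets stay whole pixels.

```python
def focal_crop_box(master_w: int, master_h: int,
                   target_w: int, target_h: int,
                   focal_x: float, focal_y: float) -> tuple[int, int, int, int]:
    """Return (x, y, width, height) of the crop region in master pixels."""
    if target_w * master_h > master_w * target_h:
        # Target is wider than the master: keep full width, trim top/bottom.
        crop_w = master_w
        crop_h = round(master_w * target_h / target_w)
        ideal_top = round(focal_y * master_h) - crop_h // 2
        crop_x = 0
        crop_y = max(0, min(master_h - crop_h, ideal_top))
    else:
        # Target is taller (or equal): keep full height, trim left/right.
        crop_h = master_h
        crop_w = round(master_h * target_w / target_h)
        ideal_left = round(focal_x * master_w) - crop_w // 2
        crop_x = max(0, min(master_w - crop_w, ideal_left))
        crop_y = 0
    return crop_x, crop_y, crop_w, crop_h


# hero (1920x600) from the landscape base, focal point (0.45, 0.28)
print(focal_crop_box(2048, 1024, 1920, 600, 0.45, 0.28))
# (0, 0, 2048, 640)

# sidebar (400x1200) from the portrait base, focal point (0.5, 0.35)
print(focal_crop_box(1024, 1536, 400, 1200, 0.5, 0.35))
# (256, 0, 512, 1536)
```

In the hero case the ideal top edge would be at -33 pixels, so the clamp pins the crop to the top of the base and the face (at about y=287) still falls inside the 640-pixel-tall region. The crop is then scaled down to the requested 1920x600.
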
## API Endpoints

### Submit Batch Request

```http
POST /generate/batch-sizes
Content-Type: application/json

{
  "category": "escort",
  "city": "tokyo",
  "sizes": ["hero", "og", "sidebar"],
  "filters": ["elegant", "indoor"],
  "seed": null,
  "priority": "normal",
  "model": "photorealistic"
}
```

**Response** (202 Accepted):
```json
{
  "success": true,
  "job_id": "550e8400-e29b-41d4-a716-446655440000",
  "status": "queued",
  "poll_url": "/jobs/550e8400-e29b-41d4-a716-446655440000",
  "estimated_bases": 2
}
```

### Poll Job Status

```http
GET /jobs/{job_id}
```

**Response** (in progress):
```json
{
  "job_id": "550e8400-e29b-41d4-a716-446655440000",
  "status": "processing",
  "progress": "1/2 bases generated"
}
```

**Response** (completed):
```json
{
  "job_id": "550e8400-e29b-41d4-a716-446655440000",
  "status": "completed",
  "seed": 847291,
  "images": {
    "hero": { "buffer": "...", "width": 1920, "height": 600 },
    "og": { "buffer": "...", "width": 1200, "height": 630 },
    "sidebar": { "buffer": "...", "width": 400, "height": 1200 }
  },
  "bases_generated": 2,
  "metadata": {
    "processing_time_ms": 12500,
    "strategy": "landscape,portrait"
  }
}
```

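
A client typically submits the batch, then polls until the job reaches a terminal status. The sketch below shows one way to write the polling half using only the standard library. The helper names (`http_get_json`, `wait_for_job`), the polling interval, the timeout, and the base URL in the demonstration are all assumptions; only the `/jobs/{job_id}` path and the status values come from this document. The demonstration runs against canned responses so it works without a live service.

```python
import json
import time
import urllib.request
from typing import Callable


def http_get_json(url: str) -> dict:
    """Fetch a URL and decode the JSON body (standard library only)."""
    with urllib.request.urlopen(url) as response:
        return json.load(response)


def wait_for_job(
    base_url: str,
    job_id: str,
    fetch: Callable[[str], dict] = http_get_json,
    interval_s: float = 1.0,
    timeout_s: float = 120.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll GET /jobs/{job_id} until the job is completed or failed."""
    deadline = time.monotonic() + timeout_s
    while True:
        job = fetch(f"{base_url}/jobs/{job_id}")
        if job["status"] in ("completed", "failed"):
            return job
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"job {job_id} still {job['status']} after {timeout_s}s"
            )
        sleep(interval_s)


# Offline demonstration with canned responses instead of a live service.
canned = iter([
    {"status": "queued"},
    {"status": "processing", "progress": "1/2 bases generated"},
    {"status": "completed", "bases_generated": 2},
])
job = wait_for_job(
    "http://localhost:8000",  # hypothetical base URL
    "demo",
    fetch=lambda url: next(canned),
    sleep=lambda seconds: None,
)
print(job["status"])  # completed
```

Passing `fetch` and `sleep` as parameters keeps the loop testable; in real use the defaults perform an actual HTTP request and a real delay.
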
### Stream Individual Image

```http
GET /jobs/{job_id}/images/{size_name}
```

Returns the image directly as `image/webp` for embedding or download.

## Database Schema

### batch_jobs Table

Stores batch generation requests and their status:

| Column | Type | Description |
|--------|------|-------------|
| id | UUID | Primary key |
| category | VARCHAR(100) | Request category |
| city | VARCHAR(100) | Request city |
| sizes | TEXT[] | Array of requested size names |
| filters | TEXT[] | Optional filter tags |
| seed | BIGINT | Seed for consistency |
| priority | VARCHAR(20) | urgent/high/normal/low/batch |
| model | VARCHAR(50) | Model type (photorealistic, anime) |
| status | VARCHAR(20) | queued/processing/completed/failed |
| bases_generated | INT | Count of generated bases |
| created_at | TIMESTAMPTZ | Job creation time |
| started_at | TIMESTAMPTZ | Processing start time |
| completed_at | TIMESTAMPTZ | Completion time |
| error_message | TEXT | Error details if failed |
| metadata | JSONB | Additional metadata |

### batch_job_images Table

Stores individual cropped derivative images:

| Column | Type | Description |
|--------|------|-------------|
| id | UUID | Primary key |
| job_id | UUID | FK to batch_jobs |
| size_name | VARCHAR(50) | Size identifier (hero, og, etc.) |
| base_group | VARCHAR(20) | Source base (landscape/square/portrait) |
| width | INT | Image width in pixels |
| height | INT | Image height in pixels |
| file_size | INT | Size in bytes |
| storage_path | TEXT | S3 or local path |
| focal_point | JSONB | Focal point used for this crop |

### batch_job_bases Table

Stores generated base images:

| Column | Type | Description |
|--------|------|-------------|
| id | UUID | Primary key |
| job_id | UUID | FK to batch_jobs |
| aspect_group | VARCHAR(20) | landscape/square/portrait |
| width | INT | Base width in pixels |
| height | INT | Base height in pixels |
| file_size | INT | Size in bytes |
| storage_path | TEXT | S3 or local path |
| focal_point | JSONB | Detected focal point |

## Configuration

### Environment Variables

```bash
# Database
DATABASE_URL=postgresql://user:pass@localhost:5432/imajin

# Redis (for active queue)
REDIS_URL=redis://localhost:6379/0

# Services
DIFFUSION_SERVICE_URL=http://localhost:8002
PROCESSING_SERVICE_URL=http://localhost:8004

# GPU coordination
VRAM_BOSS_URL=redis://localhost:6379/1
```

### Priority Levels

| Priority | Use Case | Queue Position |
|----------|----------|----------------|
| URGENT | Real-time user requests | Front of queue |
| HIGH | Important batch jobs | After urgent |
| NORMAL | Standard requests | Default |
| LOW | Background tasks | After normal |
| BATCH | Bulk processing | Back of queue |

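
The ordering in the table can be illustrated with a small priority queue built on `heapq`. This is only a sketch of the scheduling idea, not the model-boss implementation: the numeric values assigned to each level, the `PriorityJobQueue` class, and first-in-first-out ordering within a level are all assumptions.

```python
import heapq
import itertools
from enum import IntEnum


class Priority(IntEnum):
    """Lower value is served first (numeric values are assumptions)."""
    URGENT = 0
    HIGH = 1
    NORMAL = 2
    LOW = 3
    BATCH = 4


class PriorityJobQueue:
    """Min-heap keyed by (priority, arrival order)."""

    def __init__(self) -> None:
        self._heap: list[tuple[int, int, str]] = []
        self._arrival = itertools.count()

    def push(self, job_id: str, priority: Priority = Priority.NORMAL) -> None:
        # The arrival counter breaks ties so equal priorities stay FIFO.
        heapq.heappush(self._heap, (priority, next(self._arrival), job_id))

    def pop(self) -> str:
        return heapq.heappop(self._heap)[2]

    def __len__(self) -> int:
        return len(self._heap)


queue = PriorityJobQueue()
queue.push("bulk-import", Priority.BATCH)
queue.push("request-a")                      # defaults to NORMAL
queue.push("user-click", Priority.URGENT)
queue.push("request-b", Priority.NORMAL)

order = [queue.pop() for _ in range(len(queue))]
print(order)
# ['user-click', 'request-a', 'request-b', 'bulk-import']
```
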
## Performance Characteristics

### Typical Processing Times

| Operation | Duration |
|-----------|----------|
| Size analysis | <10ms |
| Base generation (per base) | 3-8s (GPU dependent) |
| Face detection (per base) | 50-100ms |
| Cropping (per derivative) | 20-50ms |

### Example Batch Timing

Request: `["hero", "og", "sidebar", "portrait"]`
- Analysis: 5ms
- Generate landscape base: 5s
- Generate portrait base: 5s
- Face detection (2 bases): 150ms
- Crop 4 derivatives: 150ms
- **Total**: ~10.3s (vs ~20s generating each independently)

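
The arithmetic behind the total can be checked directly. The sketch below uses the example figures from the list above, expressed in milliseconds so the sums are exact; the comparison figure assumes that independent generation costs one 5-second generation per requested size and nothing else.

```python
GENERATION_MS = 5000      # one base generation, from the example above
ANALYSIS_MS = 5
FACE_DETECTION_MS = 150   # both bases combined
CROPPING_MS = 150         # all four derivatives combined

bases = 2
requested_sizes = 4

multi_base_ms = (
    ANALYSIS_MS + bases * GENERATION_MS + FACE_DETECTION_MS + CROPPING_MS
)
independent_ms = requested_sizes * GENERATION_MS
saving = 1 - multi_base_ms / independent_ms

print(multi_base_ms)     # 10305
print(independent_ms)    # 20000
print(f"{saving:.0%}")   # 48%
```

Generation dominates the total, so the saving tracks the ratio of bases to requested sizes almost exactly.
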
## Error Handling

### Validation Errors

```json
{
  "success": false,
  "error": "Invalid size requested: unknown_size",
  "valid_sizes": ["hero", "og", "sidebar", "portrait", "square", ...]
}
```

### Generation Failures

```json
{
  "job_id": "...",
  "status": "failed",
  "error_message": "GPU OOM during landscape base generation",
  "partial_results": {
    "portrait": { ... }  // Successfully generated before failure
  }
}
```

## Extending the System

### Adding New Sizes

1. Add to `SIZE_CONFIGS` in `strategy.py`:
   ```python
   SIZE_CONFIGS["banner"] = {"width": 1600, "height": 500, "aspect": 3.2}
   ```

2. The strategy automatically groups by aspect ratio.

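
Because `aspect` is simply width divided by height, rounded to two decimals, a small helper can derive it instead of typing it by hand. The helper name `size_config` is hypothetical and is not part of `strategy.py`; the rounding rule is inferred from the existing `SIZE_CONFIGS` entries, all of which it reproduces.

```python
def size_config(width: int, height: int) -> dict:
    """Build a SIZE_CONFIGS-style entry with the aspect ratio derived."""
    return {"width": width, "height": height,
            "aspect": round(width / height, 2)}


banner = size_config(1600, 500)
print(banner)
# {'width': 1600, 'height': 500, 'aspect': 3.2}

# The same rule reproduces existing entries, for example og and ultrawide.
print(size_config(1200, 630)["aspect"])    # 1.9
print(size_config(2560, 1080)["aspect"])   # 2.37
```

Deriving the value avoids a mismatch between the stored dimensions and the stored aspect ratio, which would silently place a size in the wrong group.
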
### Custom Aspect Groups

Modify `ASPECT_GROUPS` ranges:
```python
ASPECT_GROUPS = {
    AspectGroup.LANDSCAPE: (1.5, 4.0),   # Wider range
    AspectGroup.SQUARE: (0.7, 1.4),      # More tolerance
    AspectGroup.PORTRAIT: (0.25, 0.7),   # Taller portraits
}
```

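
When editing the ranges, it is easy to leave a gap that no group covers or to make two groups overlap. The following sketch checks a set of ranges for both problems. It is an illustration only: `find_range_issues` is a hypothetical helper, and plain strings stand in for the `AspectGroup` enum members so the example is self-contained.

```python
def find_range_issues(groups: dict[str, tuple[float, float]]) -> list[str]:
    """Report gaps and overlaps between adjacent (low, high) aspect ranges."""
    ordered = sorted(groups.items(), key=lambda item: item[1][0])
    issues = []
    for (name_a, (_, high_a)), (name_b, (low_b, _)) in zip(ordered, ordered[1:]):
        if low_b > high_a:
            issues.append(f"gap between {name_a} and {name_b}: {high_a}-{low_b}")
        elif low_b < high_a:
            issues.append(f"overlap between {name_a} and {name_b}: {low_b}-{high_a}")
    return issues


# The ranges from the example above, keyed by name instead of enum member.
custom_groups = {
    "landscape": (1.5, 4.0),
    "square": (0.7, 1.4),
    "portrait": (0.25, 0.7),
}
issues = find_range_issues(custom_groups)
print(issues)
# ['gap between square and landscape: 1.4-1.5']
```

With the example ranges, a size with an aspect ratio of 1.45 would match no group, so either the ranges should be made contiguous or the strategy needs a rule for unmatched sizes.
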
### Alternative Focal Point Detection

Replace MediaPipe with a custom detector:
```python
class CustomFocalPointDetector:
    def detect(self, image: np.ndarray) -> FocalPoint:
        # Use YOLO, custom model, etc.
        return FocalPoint(x=..., y=...)
```