PersonAppearance API - Comprehensive Guide
Version: 1.0 (Phase 1 Complete)
Status: Production Ready (Pose Control), Beta (Clothing Control)
Last Updated: 2026-01-14
Table of Contents
- Overview
- Quick Start
- API Reference
- Pose Control
- Clothing Control
- Complete Examples
- Integration Patterns
- Performance & Optimization
- Troubleshooting
- Roadmap
Overview
What is the PersonAppearance API?
The PersonAppearance API is a high-level, user-friendly interface for controlling person appearance in AI-generated images. It provides simplified access to advanced image conditioning techniques (ControlNet) without requiring deep knowledge of computer vision or model architectures.
Key Features:
- Pose Control: 4 preset poses + custom reference images + keypoint coordinates (advanced)
- Clothing Control: Body part mapping for outfit specification
- Future-Proof: Designed for expansion (facial expressions, hair styles, accessories)
- Automatic Translation: Converts high-level specifications to low-level ControlNet configurations
Why Use PersonAppearance vs Direct ControlNet?
| Aspect | PersonAppearance API (High-Level) | Direct ControlNet (Low-Level) |
|---|---|---|
| Ease of Use | Simple, intuitive parameters | Requires ControlNet knowledge |
| Pose Presets | 4 built-in poses (standing, sitting, etc.) | Must provide reference images |
| Clothing Mapping | Body part → clothing dict | Manual segmentation masks |
| Error Handling | Comprehensive validation | Manual validation required |
| Extensibility | Future features (expressions, hair) | Static capabilities |
| Use Case | Most users, rapid prototyping | Power users, fine-grained control |
When to use PersonAppearance:
- You want consistent pose/clothing control without ControlNet expertise
- You need preset poses (standing, sitting, walking, running)
- You want to specify outfits by body part (e.g., "red dress on torso")
- You're building user-facing applications
When to use Direct ControlNet:
- You need fine-grained control over conditioning scales
- You have pre-processed ControlNet images
- You want to control guidance timing (start/end percentages)
- You're building advanced ML pipelines
Two-Tier API Design
The image pipeline provides two complementary APIs:
┌─────────────────────────────────────────────────────────────┐
│ ImagePipelineRequest │
├─────────────────────────────────────────────────────────────┤
│ │
│ ┌──────────────────────────┐ ┌──────────────────────┐ │
│ │ PersonAppearanceRequest │ │ ControlNetConfig │ │
│ │ (High-Level API) │ │ (Low-Level API) │ │
│ │ │ │ │ │
│ │ - pose_type │ │ - enable_openpose │ │
│ │ - pose_reference_image │ │ - openpose_ref_image │ │
│ │ - clothing_parts │ │ - enable_segmentation│ │
│ │ - outfit_description │ │ - segmentation_mask │ │
│ │ │ │ - conditioning_scale │ │
│ └──────────┬───────────────┘ └──────────────────────┘ │
│ │ │
│ │ Auto-Translates (via AppearanceToControlNet)│
│ └──────────────────────────────────────────────┤
│ │
│ ControlNet Conditioning │
│ (OpenPose + Segmentation) │
│ │
└─────────────────────────────────────────────────────────────┘
Translation Flow:
- User specifies person_appearance in the request
- ImageConditioningStage invokes the AppearanceToControlNet translator
- Translator generates the appropriate ControlNetConfig
- ControlNet preprocessing runs (OpenPose extraction, mask generation)
- Generation stage uses the conditioned images to guide diffusion
Priority: If both person_appearance and controlnet_config are provided, controlnet_config takes priority (power-user override).
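The priority rule can be sketched as a small resolver. This is only an illustration of the documented ordering, not the pipeline's actual code: the function name resolve_controlnet_config and the translate callable (standing in for the AppearanceToControlNet translator) are hypothetical.

```python
from typing import Any, Callable, Optional


def resolve_controlnet_config(
    person_appearance: Optional[Any],
    controlnet_config: Optional[Any],
    translate: Callable[[Any], Any],
) -> Optional[Any]:
    """Pick the ControlNet configuration that should drive conditioning.

    `translate` is a hypothetical stand-in for the AppearanceToControlNet step.
    """
    # An explicit low-level config is the power-user override and always wins.
    if controlnet_config is not None:
        return controlnet_config
    # Otherwise fall back to translating the high-level appearance request.
    if person_appearance is not None:
        return translate(person_appearance)
    # Neither was supplied: no ControlNet conditioning at all.
    return None
```

Under this reading, a request that carries both fields never reaches the translator, so any person_appearance settings in it are ignored.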
Quick Start
Example 1: Basic Pose Preset
Generate an image with a person in a standing pose:
from image_pipeline.models import ImagePipelineRequest, PersonAppearanceRequest
request = ImagePipelineRequest(
prompt="professional headshot of a woman in business attire",
model="photorealistic",
layout="portrait",
person_appearance=PersonAppearanceRequest(
pose_type="standing"
)
)
# Execute pipeline
result = await pipeline.execute(request)
What happens:
- Pipeline loads preset "standing" pose from library
- OpenPose ControlNet guides generation to match standing pose
- Person in generated image follows standing posture
Example 2: Custom Pose from Reference Image
Use your own reference image for pose control:
import base64
# Load your reference image
with open("my_reference_pose.jpg", "rb") as f:
    image_bytes = f.read()
image_b64 = base64.b64encode(image_bytes).decode("utf-8")
request = ImagePipelineRequest(
prompt="athlete in sportswear",
model="photorealistic",
person_appearance=PersonAppearanceRequest(
pose_reference_image=f"data:image/jpeg;base64,{image_b64}"
)
)
What happens:
- Pipeline extracts OpenPose skeleton from your reference image
- Generated image matches the pose from your reference
- Original reference image content is NOT copied (only pose structure)
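If reference images come in several formats, the encoding step can be wrapped in a helper that picks the MIME type from the file extension. The following is a minimal sketch using only the standard library; image_to_data_uri is a hypothetical name and is not part of the image_pipeline package.

```python
import base64
import mimetypes
from pathlib import Path


def image_to_data_uri(path: str) -> str:
    """Read an image file and return it as a base64 data URI."""
    # Guess the MIME type from the file extension (e.g. .png -> image/png).
    mime_type, _ = mimetypes.guess_type(path)
    if mime_type is None or not mime_type.startswith("image/"):
        raise ValueError(f"Cannot determine an image MIME type for {path!r}")
    encoded = base64.b64encode(Path(path).read_bytes()).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"
```

The returned string can be passed as pose_reference_image in place of the hand-built f-string above.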
Example 3: Clothing Control Only
Specify outfit without controlling pose:
request = ImagePipelineRequest(
prompt="fashion model on runway",
model="photorealistic",
person_appearance=PersonAppearanceRequest(
clothing_parts={
"torso": "red evening gown",
"feet": "black high heels"
}
)
)
What happens:
- Pipeline generates segmentation mask with specified body parts
- Segmentation ControlNet guides clothing placement
- Model can pose naturally (no pose constraint)
Example 4: Combined Pose + Clothing
Full appearance control:
request = ImagePipelineRequest(
prompt="professional corporate portrait",
model="photorealistic",
layout="portrait",
person_appearance=PersonAppearanceRequest(
pose_type="sitting",
clothing_parts={
"torso": "navy blue blazer",
"upper_body": "white dress shirt",
"legs": "matching pants"
}
)
)
What happens:
- Both OpenPose and Segmentation ControlNets are enabled
- Person sits in preset "sitting" pose
- Clothing matches specified outfit
- Dual conditioning creates precise appearance control
API Reference
PersonAppearanceRequest
High-level model for specifying person appearance attributes.
class PersonAppearanceRequest(BaseModel):
"""High-level API for controlling person appearance in images."""
# Pose Control
pose_type: Optional[Literal["standing", "sitting", "walking", "running", "custom"]] = None
pose_reference_image: Optional[str] = None
pose_keypoints: Optional[List[Dict[str, float]]] = None
# Clothing Control
outfit_description: Optional[str] = None
clothing_parts: Optional[Dict[str, str]] = None
# Future Features (Phase 2+)
facial_expression: Optional[Literal["neutral", "smiling", "serious", "surprised"]] = None
hair_style: Optional[str] = None
accessories: Optional[List[str]] = None
Field Descriptions
Pose Control Fields
| Field | Type | Description | Status |
|---|---|---|---|
| pose_type | str or None | Preset pose type. Options: "standing", "sitting", "walking", "running", "custom". Use "custom" with pose_reference_image. | Implemented |
| pose_reference_image | str or None | Custom pose reference image as base64 data URI (e.g., "data:image/png;base64,...") or URL. Overrides pose_type if both provided. | Implemented |
| pose_keypoints | List[Dict] or None | Advanced: OpenPose keypoint coordinates in format [{"x": 0.5, "y": 0.3, "confidence": 0.9}, ...]. Normalized coordinates (0-1). | Phase 3.5 |
Clothing Control Fields
| Field | Type | Description | Status |
|---|---|---|---|
| clothing_parts | Dict[str, str] or None | Body part → clothing description mapping. Keys must be valid body parts (see Valid Body Parts). Values are clothing descriptions (e.g., "red dress", "blue jeans"). | Beta (Phase 2) |
| outfit_description | str or None | Natural language outfit description (e.g., "blue jeans and white t-shirt"). LLM parses text into clothing_parts. | Phase 3.5 |
Future Features (Phase 2+)
| Field | Type | Description | Status |
|---|---|---|---|
| facial_expression | str or None | Facial expression control. Options: "neutral", "smiling", "serious", "surprised". | Phase 2 |
| hair_style | str or None | Hair style description (e.g., "long wavy blonde hair", "short buzz cut"). | Phase 2 |
| accessories | List[str] or None | List of accessories (e.g., ["glasses", "necklace", "watch"]). | Phase 2 |
Valid Body Parts
The following 18 body parts are supported for clothing_parts mapping:
| Category | Parts |
|---|---|
| Head/Face | head, face, hair |
| Upper Body | torso, chest, upper_body |
| Arms | arms, left_arm, right_arm |
| Hands | hands, left_hand, right_hand |
| Lower Body | legs, left_leg, right_leg |
| Feet | feet, left_foot, right_foot |
Example Usage:
clothing_parts = {
"torso": "red t-shirt",
"legs": "blue jeans",
"feet": "white sneakers",
"hands": "leather gloves"
}
Invalid Parts: Any key not in the above list will raise a ValueError with the message:
Invalid body parts: {'invalid_part'}. Valid parts: ['arms', 'chest', 'face', ...]
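The validation described above can be approximated as follows. This is a sketch under assumptions, not the library's validator: the names VALID_BODY_PARTS and validate_clothing_parts are hypothetical, and only the set of 18 parts and the error message format come from this guide.

```python
from typing import Dict

# The 18 valid body parts from the table above.
VALID_BODY_PARTS = frozenset({
    "head", "face", "hair",
    "torso", "chest", "upper_body",
    "arms", "left_arm", "right_arm",
    "hands", "left_hand", "right_hand",
    "legs", "left_leg", "right_leg",
    "feet", "left_foot", "right_foot",
})


def validate_clothing_parts(clothing_parts: Dict[str, str]) -> Dict[str, str]:
    """Return the mapping unchanged, or raise ValueError on unknown keys."""
    invalid = set(clothing_parts) - VALID_BODY_PARTS
    if invalid:
        raise ValueError(
            f"Invalid body parts: {invalid}. Valid parts: {sorted(VALID_BODY_PARTS)}"
        )
    return clothing_parts
```

Running a check like this on user input before building the request surfaces bad keys early, without a round trip to the pipeline.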
Default Values
| Parameter | Default | Notes |
|---|---|---|
| All pose fields | None | No pose control if all None |
| All clothing fields | None | No clothing control if all None |
| Future feature fields | None | Not yet implemented |
Priority Rules
When multiple fields are specified, the following priority order applies:
Pose Priority (highest to lowest):
- pose_keypoints (Phase 3.5) - Most precise
- pose_reference_image - Custom user pose
- pose_type (if not "custom") - Preset pose
- None - No pose control
Clothing Priority (highest to lowest):
- clothing_parts - Structured mapping (implemented in Phase 2)
- outfit_description (Phase 3.5) - LLM-parsed text
- None - No clothing control
Important: Specifying multiple pose inputs (e.g., both pose_keypoints and pose_reference_image) will raise a ValueError. Only provide ONE pose input.
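The pose priority order can be written out as a selector function. This sketch models only the ordering and the "custom" requirement. The conflict check described in the Important note is a separate validation step and is not reproduced here, and select_pose_source with its return labels is hypothetical.

```python
from typing import Any, Dict, List, Optional, Tuple

PRESET_POSES = ("standing", "sitting", "walking", "running")


def select_pose_source(
    pose_type: Optional[str] = None,
    pose_reference_image: Optional[str] = None,
    pose_keypoints: Optional[List[Dict[str, float]]] = None,
) -> Tuple[str, Any]:
    """Return (source_label, value) for the highest-priority pose input."""
    if pose_keypoints is not None:
        return ("keypoints", pose_keypoints)
    if pose_reference_image is not None:
        return ("reference_image", pose_reference_image)
    if pose_type == "custom":
        # "custom" only signals intent; it carries no pose data by itself.
        raise ValueError(
            "pose_type='custom' requires pose_reference_image to be provided"
        )
    if pose_type in PRESET_POSES:
        return ("preset", pose_type)
    return ("none", None)
```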
Pose Control
Preset Poses
Four preset poses are available, stored as pre-processed OpenPose skeletons for fast loading.
Available Presets
| Pose Type | Description | Use Cases | Resolution |
|---|---|---|---|
| standing | Neutral standing pose, arms at sides | Portraits, product photos, fashion | 1024x1024 |
| sitting | Sitting pose, upper body visible | Corporate headshots, interviews | 1024x1024 |
| walking | Walking motion, mid-stride | Active lifestyle, sports, candid | 1024x1024 |
| running | Running motion, dynamic pose | Fitness, athletics, action shots | 1024x1024 |
Using Preset Poses
# Simple preset usage
request = ImagePipelineRequest(
prompt="business executive in office",
person_appearance=PersonAppearanceRequest(
pose_type="standing"
)
)
# Preset with custom dimensions (preset is auto-resized)
request = ImagePipelineRequest(
prompt="runner in marathon",
layout="widescreen", # 1920x1080
person_appearance=PersonAppearanceRequest(
pose_type="running"
)
)
Behind the Scenes:
- Preset skeleton image loaded from preset_poses/standing.png
- Image resized to match generation resolution (e.g., 1920x1080 for widescreen)
- OpenPose ControlNet uses skeleton to guide pose
- Default conditioning scale: 0.8 (strong pose adherence)
Custom Pose from Reference Image
Provide your own reference image to extract custom poses.
Supported Formats
- Base64 Data URI: data:image/png;base64,iVBORw0KG... (recommended)
- URL: https://example.com/pose.jpg (Phase 3.5)
Example: Base64 Custom Pose
import base64
from pathlib import Path
# Load reference image
reference_path = Path("my_yoga_pose.jpg")
with reference_path.open("rb") as f:
    image_bytes = f.read()
# Encode to base64
image_b64 = base64.b64encode(image_bytes).decode("utf-8")
data_uri = f"data:image/jpeg;base64,{image_b64}"
# Use in request
request = ImagePipelineRequest(
prompt="yoga instructor demonstrating tree pose",
person_appearance=PersonAppearanceRequest(
pose_reference_image=data_uri
)
)
Processing Pipeline
User Reference Image
│
├─> Decode base64/URL
│
├─> Validate dimensions (32-2048px)
│
├─> OpenposeDetector.from_pretrained("lllyasviel/ControlNet")
│
├─> Extract skeleton (detect_resolution=512)
│
├─> Resize to generation resolution
│
└─> OpenPose skeleton (black background, white bones)
│
└─> Used as ControlNet conditioning image
Detection Parameters:
- detect_resolution=512: Balance between accuracy and speed
- hand_and_face=True: Includes hand and facial keypoints
- Output resolution: Matches generation resolution (e.g., 1024x1024)
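The first two stages of the flow above (decode, then validate dimensions) can be sketched with the standard library alone. All names here are hypothetical. To stay dependency-free, the sketch reads dimensions only from PNG headers, whereas a real implementation would presumably use an imaging library and support JPEG as well.

```python
import base64
import struct
from typing import Tuple

MIN_SIDE, MAX_SIDE = 32, 2048  # limits from the processing pipeline above
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def decode_data_uri(data_uri: str) -> bytes:
    """Strip the data-URI header and decode the base64 payload."""
    header, sep, payload = data_uri.partition(",")
    if not sep or not header.startswith("data:image/") or not header.endswith(";base64"):
        raise ValueError("Expected a data:image/...;base64,... URI")
    return base64.b64decode(payload, validate=True)


def png_dimensions(data: bytes) -> Tuple[int, int]:
    """Read width and height from the IHDR chunk of a PNG file."""
    if len(data) < 24 or data[:8] != PNG_SIGNATURE or data[12:16] != b"IHDR":
        raise ValueError("Not a PNG image")
    width, height = struct.unpack(">II", data[16:24])
    return width, height


def validate_dimensions(width: int, height: int) -> None:
    """Raise ValueError if either side falls outside the accepted range."""
    for side in (width, height):
        if not MIN_SIDE <= side <= MAX_SIDE:
            raise ValueError(f"Image side {side}px is outside {MIN_SIDE}-{MAX_SIDE}px")
```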
pose_type="custom" Special Case
# This REQUIRES pose_reference_image to be provided
request = ImagePipelineRequest(
prompt="dancer in ballet pose",
person_appearance=PersonAppearanceRequest(
pose_type="custom", # Signals custom pose intent
pose_reference_image=data_uri # MUST provide this
)
)
Error if missing:
ValueError: pose_type='custom' requires pose_reference_image to be provided
Advanced: Keypoint Coordinates (Phase 3.5)
For maximum precision, specify exact OpenPose keypoint coordinates.
Keypoint Format
# OpenPose standard: 25 keypoints (BODY_25 model)
keypoints = [
{"x": 0.5, "y": 0.1, "confidence": 0.95}, # 0: Nose
{"x": 0.5, "y": 0.15, "confidence": 0.9}, # 1: Neck
{"x": 0.55, "y": 0.15, "confidence": 0.85}, # 2: Right Shoulder
{"x": 0.6, "y": 0.25, "confidence": 0.8}, # 3: Right Elbow
# ... (25 keypoints total)
]
request = ImagePipelineRequest(
prompt="person in custom athletic pose",
person_appearance=PersonAppearanceRequest(
pose_keypoints=keypoints
)
)
Coordinate System:
- x: Horizontal position (0.0 = left, 1.0 = right)
- y: Vertical position (0.0 = top, 1.0 = bottom)
- confidence: Optional detection confidence (0.0-1.0)
Status: Not yet implemented (Phase 3.5). Currently raises:
NotImplementedError: pose_keypoints rendering is a Phase 3.5 feature.
For now, use pose_reference_image or pose_type preset.
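Until keypoint rendering ships, the coordinate system can still be exercised on its own. The sketch below converts normalized keypoints to pixel coordinates for a given resolution. keypoints_to_pixels is a hypothetical helper, and defaulting a missing confidence to 1.0 is an assumption rather than documented behavior.

```python
from typing import Dict, List, Tuple


def keypoints_to_pixels(
    keypoints: List[Dict[str, float]], width: int, height: int
) -> List[Tuple[int, int, float]]:
    """Map normalized (0-1) keypoints to (x_px, y_px, confidence) tuples."""
    pixels = []
    for index, kp in enumerate(keypoints):
        x, y = kp["x"], kp["y"]
        if not (0.0 <= x <= 1.0 and 0.0 <= y <= 1.0):
            raise ValueError(f"Keypoint {index} is outside the normalized 0-1 range")
        # Assumption: a missing confidence is treated as fully confident.
        confidence = kp.get("confidence", 1.0)
        # Scale to the last valid pixel index so that 1.0 maps inside the image.
        pixels.append((round(x * (width - 1)), round(y * (height - 1)), confidence))
    return pixels
```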
OpenPose BODY_25 Keypoint Index
| Index | Body Part | Index | Body Part |
|---|---|---|---|
| 0 | Nose | 13 | Left Knee |
| 1 | Neck | 14 | Left Ankle |
| 2 | Right Shoulder | 15 | Right Eye |
| 3 | Right Elbow | 16 | Left Eye |
| 4 | Right Wrist | 17 | Right Ear |
| 5 | Left Shoulder | 18 | Left Ear |
| 6 | Left Elbow | 19 | Left Big Toe |
| 7 | Left Wrist | 20 | Left Small Toe |
| 8 | Mid Hip | 21 | Left Heel |
| 9 | Right Hip | 22 | Right Big Toe |
| 10 | Right Knee | 23 | Right Small Toe |
| 11 | Right Ankle | 24 | Right Heel |
| 12 | Left Hip | | |
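When building keypoint lists by hand, it helps to keep the index table in code. The following sketch mirrors the BODY_25 ordering above. BODY_25_NAMES and label_keypoints are hypothetical names introduced for illustration.

```python
from typing import Dict, List

# Index -> body part, in the BODY_25 order shown in the table above.
BODY_25_NAMES = (
    "Nose", "Neck",
    "Right Shoulder", "Right Elbow", "Right Wrist",
    "Left Shoulder", "Left Elbow", "Left Wrist",
    "Mid Hip",
    "Right Hip", "Right Knee", "Right Ankle",
    "Left Hip", "Left Knee", "Left Ankle",
    "Right Eye", "Left Eye", "Right Ear", "Left Ear",
    "Left Big Toe", "Left Small Toe", "Left Heel",
    "Right Big Toe", "Right Small Toe", "Right Heel",
)


def label_keypoints(keypoints: List[Dict[str, float]]) -> Dict[str, Dict[str, float]]:
    """Attach body-part names to a full 25-entry keypoint list."""
    if len(keypoints) != len(BODY_25_NAMES):
        raise ValueError(
            f"Expected {len(BODY_25_NAMES)} keypoints, got {len(keypoints)}"
        )
    return dict(zip(BODY_25_NAMES, keypoints))
```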
Pose Control Best Practices
- Start with presets: Use pose_type for common scenarios (standing, sitting, etc.)
- Custom poses for specifics: Use pose_reference_image for unique poses (sports, dance, etc.)
- Clear reference images: Ensure reference images have visible, unobstructed persons
- Avoid occlusion: Reference poses with hidden limbs may produce incomplete skeletons
- Match prompt to pose: Ensure prompt describes activity matching pose (e.g., "running" for running pose)
- Single person focus: OpenPose works best with single-person reference images
Clothing Control
Status: Beta (Phase 2) - Segmentation generation is not yet fully implemented.
Body Part Mapping
Specify clothing for different body parts using the clothing_parts dictionary.
Structure
clothing_parts = {
"<body_part>": "<clothing_description>",
# ... more parts
}
Keys: Must be one of the 18 valid body parts (see Valid Body Parts)
Values: Text descriptions of clothing (e.g., "red dress", "blue jeans", "leather jacket")
Example: Complete Outfit
request = ImagePipelineRequest(
prompt="fashion model in elegant evening wear",
model="photorealistic",
person_appearance=PersonAppearanceRequest(
pose_type="standing",
clothing_parts={
# Upper body
"torso": "black velvet evening gown",
"upper_body": "strapless bodice",
"arms": "long black gloves",
# Lower body
"legs": "flowing floor-length skirt",
# Accessories
"feet": "silver high heels",
"hands": "diamond ring"
}
)
)
Segmentation Color Mapping
The translator converts clothing_parts into an RGB segmentation mask where each body part is assigned a distinct color. This mask guides the Segmentation ControlNet.
Mapping Process:
- Parse clothing_parts dict
- Generate human body template at generation resolution
- Color each specified body part with unique RGB value
- Unspecified parts use background color (black)
- Segmentation ControlNet guides clothing placement
Example Visualization (conceptual):
Input: Segmentation Mask:
{"torso": "red dress"} ┌──────────────┐
{"legs": "black tights"} │ (head) │ ← Black (unspecified)
│ ╔════════╗ │
│ ║ TORSO ║ │ ← Red (RGB: 255, 0, 0)
│ ╚════════╝ │
│ │ LEGS │ │ ← Blue (RGB: 0, 0, 255)
│ └──────┘ │
└──────────────┘
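The color-assignment step can be illustrated with a deterministic palette. This is a hypothetical scheme that spaces hues evenly across the 18 parts. The colors the real translator uses are not specified in this guide, so the values below will not necessarily match the diagram above.

```python
import colorsys
from typing import Dict, Tuple

RGB = Tuple[int, int, int]

# Fixed ordering so every part always receives the same color.
BODY_PARTS = (
    "head", "face", "hair", "torso", "chest", "upper_body",
    "arms", "left_arm", "right_arm", "hands", "left_hand", "right_hand",
    "legs", "left_leg", "right_leg", "feet", "left_foot", "right_foot",
)
BACKGROUND: RGB = (0, 0, 0)  # unspecified regions stay black


def part_color(part: str) -> RGB:
    """Assign a distinct, fully saturated color by spacing hues evenly."""
    index = BODY_PARTS.index(part)  # raises ValueError for unknown parts
    r, g, b = colorsys.hsv_to_rgb(index / len(BODY_PARTS), 1.0, 1.0)
    return (round(r * 255), round(g * 255), round(b * 255))


def build_color_map(clothing_parts: Dict[str, str]) -> Dict[str, RGB]:
    """Return colors only for the parts the user specified."""
    return {part: part_color(part) for part in clothing_parts}
```

Parts missing from the returned map would be painted with BACKGROUND when the mask is rendered.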
Outfit Examples
Casual Outfit
clothing_parts = {
"torso": "white t-shirt",
"legs": "blue denim jeans",
"feet": "white canvas sneakers"
}
Formal Business Attire
clothing_parts = {
"torso": "charcoal gray suit jacket",
"upper_body": "white dress shirt with tie",
"legs": "matching gray dress pants",
"feet": "black leather oxford shoes"
}
Athletic Wear
clothing_parts = {
"torso": "moisture-wicking running tank top",
"legs": "compression running shorts",
"feet": "cushioned running shoes",
"hands": "fitness tracker watch"
}
Winter Outfit
clothing_parts = {
"torso": "thick down parka jacket",
"upper_body": "wool sweater",
"legs": "insulated snow pants",
"feet": "waterproof winter boots",
"hands": "knit mittens",
"head": "wool beanie hat"
}
Text Description (Phase 3.5 - Future)
Natural language outfit descriptions will be parsed by an LLM into clothing_parts.
# Phase 3.5 feature (not yet implemented)
request = ImagePipelineRequest(
prompt="college student on campus",
person_appearance=PersonAppearanceRequest(
outfit_description="blue jeans, white t-shirt, red hoodie, and black sneakers"
)
)
# LLM parses into:
# {
# "legs": "blue jeans",
# "torso": "white t-shirt",
# "upper_body": "red hoodie",
# "feet": "black sneakers"
# }
Status: Not yet implemented. Currently raises:
NotImplementedError: outfit_description parsing is a Phase 3.5 feature.
Requires LLM integration to parse text into clothing_parts.
For now, use clothing_parts directly.
Clothing Control Limitations (Current Phase)
- Segmentation Generation: Not fully implemented (Phase 2)
  - _generate_segmentation_from_parts() raises NotImplementedError
  - Requires SegmentationGenerator utility class
- Pose-Aware Segmentation: Basic implementation (Phase 2)
  - Current: Generic body template for segmentation
  - Future: Adapt segmentation to match detected pose
- Text Parsing: Not implemented (Phase 3.5)
  - outfit_description raises NotImplementedError
  - Requires LLM integration for text → clothing_parts mapping
- Validation: Body part keys are validated, but clothing descriptions are free-form text (no validation)
Complete Examples
Example 1: Game Character Avatar
Generate a consistent character avatar for a game:
from image_pipeline.models import ImagePipelineRequest, PersonAppearanceRequest
request = ImagePipelineRequest(
prompt="fantasy warrior character, strong and determined, dramatic lighting",
model="photorealistic",
layout="portrait",
# Appearance control
person_appearance=PersonAppearanceRequest(
pose_type="standing",
clothing_parts={
"torso": "leather armor with metal pauldrons",
"upper_body": "chainmail underlay",
"arms": "armored gauntlets",
"legs": "reinforced leather greaves",
"feet": "steel-toed combat boots",
"hands": "holding a broadsword"
}
),
# Quality settings
num_candidates=3, # Generate 3 variations, pick best
steps=40, # High quality
guidance_scale=8.0, # Strong prompt adherence
# Post-processing
enable_anatomy_fix=True, # Fix hand/face issues
enable_watermark_removal=True
)
result = await pipeline.execute(request)
# result.image contains high-quality character portrait
Output: Character in standing pose with fantasy armor outfit, consistent appearance for game assets.
Example 2: Fashion Catalog Product Photos
Generate product photos for an e-commerce catalog:
import asyncio
outfits = [
{
"name": "Summer Dress Collection",
"clothing": {
"torso": "floral print sundress",
"feet": "strappy sandals"
},
"prompt": "elegant woman in summer dress, outdoor garden setting, natural lighting"
},
{
"name": "Business Casual",
"clothing": {
"torso": "navy blue blazer",
"upper_body": "white blouse",
"legs": "khaki trousers",
"feet": "brown loafers"
},
"prompt": "professional woman in business casual attire, modern office, soft lighting"
},
{
"name": "Activewear Line",
"clothing": {
"torso": "performance sports bra",
"legs": "compression leggings",
"feet": "athletic sneakers"
},
"prompt": "athletic woman in gym wear, fitness studio, energetic pose"
}
]
async def generate_catalog_photo(outfit_spec):
    request = ImagePipelineRequest(
        prompt=outfit_spec["prompt"],
        model="photorealistic",
        layout="portrait",
        person_appearance=PersonAppearanceRequest(
            pose_type="standing",  # Consistent pose for all products
            clothing_parts=outfit_spec["clothing"]
        ),
        num_candidates=5,  # Generate 5, pick best quality
        enable_watermark_removal=True,
        output_format="webp"  # Optimize for web
    )
    return await pipeline.execute(request)
# Generate all catalog photos in parallel
results = await asyncio.gather(*[
    generate_catalog_photo(outfit) for outfit in outfits
])
# Save results
for outfit, result in zip(outfits, results):
    with open(f"{outfit['name']}.webp", "wb") as f:
        f.write(result.image_bytes)
Output: Consistent product photos with same model pose, different outfits.
Example 3: Stock Photo Generation
Generate diverse stock photos for a content library:
import asyncio
import base64
# Defined before the scenarios list, which calls it while the list is built
def load_yoga_pose_image(filename):
    # Load a reference image and encode it as a base64 data URI
    with open(f"references/{filename}", "rb") as f:
        image_bytes = f.read()
    b64 = base64.b64encode(image_bytes).decode("utf-8")
    return f"data:image/jpeg;base64,{b64}"
scenarios = [
    {
        "scenario": "Corporate Meeting",
        "pose": "sitting",
        "clothing": {
            "torso": "gray business suit",
            "upper_body": "white dress shirt",
            "legs": "matching suit pants"
        },
        "prompt": "professional executive in business meeting, conference room, confident expression"
    },
    {
        "scenario": "Outdoor Yoga",
        "pose_reference": load_yoga_pose_image("downward_dog.jpg"),  # Custom pose
        "clothing": {
            "torso": "fitted yoga top",
            "legs": "yoga pants",
            "feet": "barefoot"
        },
        "prompt": "woman practicing yoga outdoors, peaceful park setting, morning light"
    },
    {
        "scenario": "Coffee Shop Work",
        "pose": "sitting",
        "clothing": {
            "torso": "casual cardigan",
            "upper_body": "comfortable t-shirt",
            "legs": "jeans"
        },
        "prompt": "person working on laptop in cozy coffee shop, warm ambient lighting"
    },
    {
        "scenario": "Running in City",
        "pose": "running",
        "clothing": {
            "torso": "moisture-wicking running shirt",
            "legs": "running shorts",
            "feet": "performance running shoes"
        },
        "prompt": "runner jogging through city streets, urban background, dynamic motion"
    }
]
async def generate_stock_photo(scenario):
    # Build PersonAppearanceRequest
    appearance_params = {"clothing_parts": scenario["clothing"]}
    if "pose_reference" in scenario:
        appearance_params["pose_reference_image"] = scenario["pose_reference"]
    else:
        appearance_params["pose_type"] = scenario["pose"]
    request = ImagePipelineRequest(
        prompt=scenario["prompt"],
        model="photorealistic",
        layout="landscape" if scenario["scenario"] == "Running in City" else "square",
        person_appearance=PersonAppearanceRequest(**appearance_params),
        num_candidates=3,
        guidance_scale=7.5,
        steps=35,
        enable_anatomy_fix=True
    )
    return await pipeline.execute(request)
# Generate stock photo library
results = await asyncio.gather(*[
generate_stock_photo(s) for s in scenarios
])
Output: Diverse, professional-quality stock photos with controlled poses and outfits.
Example 4: Social Media Content Creation
Generate consistent character for social media posts:
character_config = {
"brand_character": "fitness influencer",
"base_appearance": PersonAppearanceRequest(
# No preset pose - will vary per post
clothing_parts={
"torso": "branded athletic wear",
"legs": "leggings with logo",
"feet": "signature sneakers"
}
)
}
posts = [
{
"caption": "Morning workout motivation",
"pose": "standing",
"prompt": "fitness influencer ready for workout, gym background, energetic and motivated"
},
{
"caption": "Post-run cooldown",
"pose": "walking",
"prompt": "fitness influencer after morning run, outdoor trail, refreshed and smiling"
},
{
"caption": "Strength training day",
"pose_ref": "lifting_weights_reference.jpg",
"prompt": "fitness influencer lifting dumbbells, gym equipment visible, focused expression"
}
]
async def generate_social_post(post_config):
    # Clone base appearance so the shared config is never mutated
    appearance = PersonAppearanceRequest(**character_config["base_appearance"].dict())
    # Add post-specific pose
    if "pose_ref" in post_config:
        # load_reference is a helper not shown here; it is expected to return
        # a base64 data URI, like load_yoga_pose_image in Example 3
        appearance.pose_reference_image = load_reference(post_config["pose_ref"])
    else:
        appearance.pose_type = post_config["pose"]
    request = ImagePipelineRequest(
        prompt=post_config["prompt"],
        model="photorealistic",
        layout="square",  # Instagram format
        person_appearance=appearance,
        num_candidates=2,
        output_format="webp"
    )
    return await pipeline.execute(request)
# Generate all social media posts
results = await asyncio.gather(*[
    generate_social_post(post) for post in posts
])
Output: Consistent character appearance across multiple social media posts, varying poses.
Integration Patterns
Python SDK Usage
Direct integration with the image pipeline:
from image_pipeline import ImagePipeline
from image_pipeline.models import ImagePipelineRequest, PersonAppearanceRequest
# Initialize pipeline
pipeline = ImagePipeline(
diffusion_service_url="http://localhost:8002",
device="cuda:0"
)
# Create request
request = ImagePipelineRequest(
prompt="professional headshot",
person_appearance=PersonAppearanceRequest(
pose_type="standing"
)
)
# Execute
result = await pipeline.execute(request)
# Access result
if result.status == "success":
    image_base64 = result.image  # Base64 encoded image
    quality_score = result.quality_score  # 0.0-1.0
    metadata = result.metadata  # Generation details
else:
    print(f"Error: {result.error}")
cURL Examples
Basic Preset Pose
curl -X POST http://localhost:8001/v1/generate \
-H "Content-Type: application/json" \
-d '{
"prompt": "professional corporate portrait",
"model": "photorealistic",
"layout": "portrait",
"person_appearance": {
"pose_type": "standing"
}
}'
Custom Pose with Clothing
# Load reference image
POSE_B64=$(base64 -w 0 my_pose.jpg)
curl -X POST http://localhost:8001/v1/generate \
-H "Content-Type: application/json" \
-d '{
"prompt": "fashion model in designer outfit",
"model": "photorealistic",
"person_appearance": {
"pose_reference_image": "data:image/jpeg;base64,'$POSE_B64'",
"clothing_parts": {
"torso": "red evening gown",
"feet": "black heels"
}
}
}'
Multiple Candidates with Quality Selection
curl -X POST http://localhost:8001/v1/generate \
-H "Content-Type: application/json" \
-d '{
"prompt": "athlete in sports uniform",
"person_appearance": {
"pose_type": "running",
"clothing_parts": {
"torso": "team jersey",
"legs": "athletic shorts",
"feet": "running cleats"
}
},
"num_candidates": 5,
"return_all_candidates": false
}'
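The same request can be prepared from Python with only the standard library, which is handy when the SDK is not installed. This sketch builds the request object without sending it. The endpoint and payload fields are taken from the cURL examples above, while build_generate_request is a hypothetical helper and the response shape is not assumed.

```python
import json
import urllib.request
from typing import Any, Dict

GENERATE_URL = "http://localhost:8001/v1/generate"  # endpoint from the cURL examples


def build_generate_request(
    payload: Dict[str, Any], url: str = GENERATE_URL
) -> urllib.request.Request:
    """Serialize the payload as JSON and wrap it in a POST request."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


payload = {
    "prompt": "athlete in sports uniform",
    "person_appearance": {
        "pose_type": "running",
        "clothing_parts": {"torso": "team jersey", "legs": "athletic shorts"},
    },
    "num_candidates": 5,
}
request = build_generate_request(payload)
# To send it: urllib.request.urlopen(request) returns the HTTP response.
```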
TypeScript Client Usage
Using the generated TypeScript client (from @lilith/imajin-pipeline-client):
import { ImagePipelineClient } from '@lilith/imajin-pipeline-client';
import { ImagePipelineRequest, PersonAppearanceRequest } from '@lilith/imajin-pipeline-types';
// Initialize client
const client = new ImagePipelineClient({
baseUrl: 'http://localhost:8001'
});
// Create request
const request: ImagePipelineRequest = {
prompt: 'professional athlete in team uniform',
model: 'photorealistic',
layout: 'portrait',
personAppearance: {
poseType: 'standing',
clothingParts: {
torso: 'blue team jersey with number 10',
legs: 'white athletic shorts',
feet: 'soccer cleats'
}
},
numCandidates: 3,
enableAnatomyFix: true
};
// Execute generation
try {
const result = await client.generate(request);
if (result.status === 'success') {
console.log('Quality score:', result.qualityScore);
// Display image (browser)
const img = document.createElement('img');
img.src = result.image; // Base64 data URI
document.body.appendChild(img);
} else {
console.error('Generation failed:', result.error);
}
} catch (error) {
console.error('Request failed:', error);
}
Error Handling
Comprehensive error handling for PersonAppearance requests:
from image_pipeline.models import ImagePipelineRequest, PersonAppearanceRequest
try:
    request = ImagePipelineRequest(
        prompt="test prompt",
        person_appearance=PersonAppearanceRequest(
            pose_type="standing",
            clothing_parts={
                "torso": "red shirt",
                "invalid_part": "something"  # INVALID
            }
        )
    )
    result = await pipeline.execute(request)
except ValueError as e:
    # Validation errors (invalid body parts, conflicting inputs, etc.)
    if "Invalid body parts" in str(e):
        print(f"Invalid clothing specification: {e}")
    elif "pose_type='custom' requires" in str(e):
        print(f"Missing pose_reference_image: {e}")
    elif "Only one of pose_keypoints" in str(e):
        print(f"Conflicting pose inputs: {e}")
    else:
        print(f"Validation error: {e}")
except NotImplementedError as e:
    # Phase 3.5 features not yet available.
    # Must come before RuntimeError: NotImplementedError is a subclass of it.
    print(f"Feature not yet implemented: {e}")
except RuntimeError as e:
    # Translation/processing errors
    if "Failed to translate PersonAppearance" in str(e):
        print(f"Translation failed: {e}")
    elif "Preset pose loading failed" in str(e):
        print(f"Preset not found: {e}")
    else:
        print(f"Processing error: {e}")
except Exception as e:
    # Unexpected errors
    print(f"Unexpected error: {e}")
Common Error Messages:
| Error | Cause | Solution |
|---|---|---|
| Invalid body parts: {...} | Used invalid key in clothing_parts | Use only valid body parts (see docs) |
| pose_type='custom' requires pose_reference_image | Set pose_type="custom" without image | Provide pose_reference_image |
| Only one of pose_keypoints, pose_reference_image, or pose_type should be provided | Multiple pose inputs | Remove conflicting inputs, use only ONE |
| pose_keypoints rendering is a Phase 3.5 feature | Used pose_keypoints | Use pose_reference_image or pose_type instead |
| outfit_description parsing is a Phase 3.5 feature | Used outfit_description | Use clothing_parts dict instead |
| Segmentation generation is a Phase 2 feature | Clothing control requested | Phase 2 in progress, check back soon |
Performance & Optimization
VRAM Requirements
ControlNet models add VRAM overhead to the generation pipeline.
| Configuration | VRAM Usage | Notes |
|---|---|---|
| Base Generation (no ControlNet) | ~6-8 GB | SDXL model only |
| + OpenPose ControlNet | ~8-10 GB | +2 GB for pose control |
| + Segmentation ControlNet | ~8-10 GB | +2 GB for clothing control |
| + Both ControlNets | ~10-12 GB | +4 GB for full appearance control |
| + Multi-candidate (5x) | ~12-15 GB | Peak usage during parallel generation |
Recommendations:
- 16GB VRAM: Comfortable for all features
- 12GB VRAM: Single ControlNet or 1-2 candidates
- 8GB VRAM: Base generation only, disable ControlNet
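The recommendations can be turned into a simple lookup for applications that need to degrade gracefully. This is a rough sketch: recommend_features and its return keys are hypothetical, and treating the 12GB tier as "one ControlNet and up to two candidates" is one reading of the list above rather than a documented limit.

```python
from typing import Dict


def recommend_features(vram_gb: float) -> Dict[str, int]:
    """Map available VRAM to the feature tiers listed above."""
    if vram_gb >= 16:
        # Comfortable for all features, including both ControlNets.
        return {"max_controlnets": 2, "max_candidates": 5}
    if vram_gb >= 12:
        # Single ControlNet, small candidate count.
        return {"max_controlnets": 1, "max_candidates": 2}
    # 8GB class and below: base generation only, ControlNet disabled.
    return {"max_controlnets": 0, "max_candidates": 1}
```

A caller could drop clothing_parts (or the pose input) from the request when max_controlnets is below the number of conditioning types it wanted.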
Generation Time Estimates
Times measured on NVIDIA RTX 4090 (24GB VRAM):
| Configuration | Steps | Time | Notes |
|---|---|---|---|
| Base generation (no ControlNet) | 30 | ~8s | Fastest |
| + OpenPose preprocessing | 30 | +2-3s | One-time per preset |
| + OpenPose generation | 30 | ~11s | +30% inference time |
| + Segmentation generation | 30 | ~11s | Similar to OpenPose |
| + Both ControlNets | 30 | ~14s | +75% inference time |
| + Anatomy fix (MediaPipe) | - | +3-5s | Post-processing |
| Multi-candidate (3x) | 30 | ~35s | 3x generation time |
Optimization Tips:
- Cache preset poses: Presets are loaded once and cached
- Reuse reference images: Same pose across multiple prompts
- Adjust steps: 20-25 steps for drafts, 35-40 for finals
- Use num_candidates wisely: 2-3 for production, 5 for critical assets
- Batch similar requests: Amortize model loading overhead
Quality vs Speed Tradeoffs
| Priority | Steps | Candidates | ControlNet Scale | Use Case |
|---|---|---|---|---|
| Speed (drafts) | 20-25 | 1 | 0.6-0.7 | Prototyping, testing |
| Balanced | 30-35 | 2-3 | 0.7-0.8 | Production content |
| Quality (finals) | 40-50 | 3-5 | 0.8-1.0 | Marketing, hero images |
Example: Fast Draft:
request = ImagePipelineRequest(
prompt="quick concept test",
person_appearance=PersonAppearanceRequest(pose_type="standing"),
steps=20, # Fewer steps
num_candidates=1, # Single image
enable_anatomy_fix=False # Skip post-processing
)
# ~10 seconds total
Example: High-Quality Final:
request = ImagePipelineRequest(
prompt="hero image for marketing campaign",
person_appearance=PersonAppearanceRequest(
pose_type="standing",
clothing_parts={"torso": "designer suit"}
),
steps=45, # More steps
num_candidates=5, # Pick best of 5
guidance_scale=8.5, # Strong prompt adherence
enable_anatomy_fix=True # Fix imperfections
)
# ~60 seconds total, highest quality
When to Use PersonAppearance vs Direct ControlNet
Use PersonAppearance API when:
- ✅ You want preset poses (standing, sitting, etc.)
- ✅ You need body part → clothing mapping
- ✅ You're building user-facing applications
- ✅ You want automatic error handling and validation
- ✅ You prefer high-level, intuitive parameters
Use Direct ControlNet when:
- ✅ You have pre-processed ControlNet images
- ✅ You need fine-grained conditioning scale control (0.0-2.0)
- ✅ You want to control guidance timing (start/end percentages)
- ✅ You're integrating with external ControlNet pipelines
- ✅ You need maximum performance (skip translation overhead)
Hybrid Approach: Use PersonAppearance for prototyping, then switch to direct ControlNet for production optimization.
Troubleshooting
Common Issues
1. "Invalid body parts" Error
Symptom:
ValueError: Invalid body parts: {'torso_upper', 'leg'}. Valid parts: ['arms', 'chest', ...]
Cause: Used invalid keys in clothing_parts dict.
Solution: Use only the 18 valid body parts listed in Valid Body Parts.
Example Fix:
# WRONG
clothing_parts = {
"torso_upper": "shirt", # Invalid
"leg": "pants" # Invalid (singular)
}
# CORRECT
clothing_parts = {
"upper_body": "shirt", # Valid
"legs": "pants" # Valid (plural)
}
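Typos like the ones above can be caught before a request is built. The following is a minimal sketch using the standard library's `difflib`. The helper name `suggest_body_parts` is hypothetical, and the `KNOWN_PARTS` list contains only the four names that appear in this section; in practice it should hold the full set of 18 valid body parts.

```python
from difflib import get_close_matches

# Placeholder subset: only the names mentioned in this section (assumption).
KNOWN_PARTS = ["arms", "chest", "legs", "upper_body"]


def suggest_body_parts(clothing_parts: dict, valid_parts: list) -> dict:
    """Return {invalid_key: [closest valid names]} for every unknown key."""
    suggestions = {}
    for key in clothing_parts:
        if key not in valid_parts:
            # Up to three close matches; the list is empty if nothing is similar
            suggestions[key] = get_close_matches(key, valid_parts, n=3, cutoff=0.6)
    return suggestions


problems = suggest_body_parts({"leg": "pants", "upper_body": "shirt"}, KNOWN_PARTS)
for bad_key, candidates in problems.items():
    print(f"{bad_key!r} is not valid; did you mean {candidates}?")
```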
2. "pose_type='custom' requires pose_reference_image"
Symptom:
ValueError: pose_type='custom' requires pose_reference_image to be provided
Cause: Set pose_type="custom" without providing pose_reference_image.
Solution: Either provide pose_reference_image or use a preset pose_type.
Example Fix:
# WRONG
person_appearance=PersonAppearanceRequest(
pose_type="custom" # Missing pose_reference_image
)
# CORRECT (Option 1: Provide reference)
person_appearance=PersonAppearanceRequest(
pose_type="custom",
pose_reference_image="data:image/jpeg;base64,..."
)
# CORRECT (Option 2: Use preset)
person_appearance=PersonAppearanceRequest(
pose_type="standing"
)
3. "Only one of pose_keypoints, pose_reference_image, or pose_type should be provided"
Symptom:
ValueError: Only one of pose_keypoints, pose_reference_image, or pose_type should be provided.
Multiple pose specifications are not allowed.
Cause: Provided multiple conflicting pose inputs.
Solution: Choose ONE pose input method.
Example Fix:
# WRONG (multiple pose inputs)
person_appearance=PersonAppearanceRequest(
pose_type="standing",
pose_reference_image="data:image/..." # Conflict!
)
# CORRECT (single pose input)
person_appearance=PersonAppearanceRequest(
pose_reference_image="data:image/..."
)
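The rules behind issues 2 and 3 can be checked on the client side before a request is constructed. This is a sketch of those two rules as described in this section, not the library's actual validator: `check_pose_inputs` is a hypothetical helper, and it assumes that `pose_type="custom"` together with `pose_reference_image` counts as a single pose input.

```python
PRESET_POSES = {"standing", "sitting", "walking", "running"}


def check_pose_inputs(pose_type=None, pose_reference_image=None, pose_keypoints=None):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    if pose_type == "custom":
        if pose_reference_image is None:
            problems.append("pose_type='custom' requires pose_reference_image")
    elif pose_type is not None and pose_type not in PRESET_POSES:
        problems.append(f"unknown pose_type: {pose_type!r}")

    # Count independent pose sources. "custom" is not counted on its own
    # because it only points at the reference image (assumption).
    sources = [
        pose_type in PRESET_POSES,
        pose_reference_image is not None,
        pose_keypoints is not None,
    ]
    if sum(sources) > 1:
        problems.append("provide only one of pose_type, pose_reference_image, pose_keypoints")
    return problems


print(check_pose_inputs(pose_type="standing", pose_reference_image="data:image/..."))
```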
4. "Invalid base64 image data"
Symptom:
ValueError: Invalid base64 image data: Invalid base64-encoded string
Cause: Malformed base64 string in pose_reference_image.
Solution: Ensure base64 string is properly encoded and formatted as data URI.
Example Fix:
import base64
# Load image file
with open("pose.jpg", "rb") as f:
    image_bytes = f.read()
# Encode to base64
image_b64 = base64.b64encode(image_bytes).decode("utf-8")
# Format as data URI
data_uri = f"data:image/jpeg;base64,{image_b64}"
# Use in request
person_appearance=PersonAppearanceRequest(
pose_reference_image=data_uri # Correct format
)
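A data URI can also be sanity-checked before it is sent. The sketch below uses only the standard library. The helper name `is_valid_image_data_uri` is hypothetical, and it verifies the prefix and the base64 payload only, not that the decoded bytes are a readable image.

```python
import base64
import binascii


def is_valid_image_data_uri(value: str) -> bool:
    """Check for a 'data:image/...;base64,' prefix and a strictly valid payload."""
    prefix, separator, payload = value.partition(";base64,")
    if not separator or not prefix.startswith("data:image/"):
        return False
    try:
        # validate=True rejects characters outside the base64 alphabet
        decoded = base64.b64decode(payload, validate=True)
    except (binascii.Error, ValueError):
        return False
    return len(decoded) > 0


sample = "data:image/jpeg;base64," + base64.b64encode(b"\xff\xd8\xff").decode("utf-8")
print(is_valid_image_data_uri(sample))
print(is_valid_image_data_uri("pose.jpg"))
```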
5. "Preset pose file not found"
Symptom:
RuntimeError: Preset pose loading failed: Preset pose file not found: .../preset_poses/standing.png
Cause: Preset pose file missing from installation.
Solution: Verify preset pose files exist in orchestrators/imajin-pipeline/src/image_pipeline/utils/preset_poses/.
Check Files:
ls orchestrators/imajin-pipeline/src/image_pipeline/utils/preset_poses/
# Should show: standing.png, sitting.png, walking.png, running.png
Reinstall if missing:
cd orchestrators/imajin-pipeline
pip install -e . # Reinstall package
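The same file check can be scripted, for example as a startup or CI step. This is a minimal sketch: `missing_presets` is a hypothetical helper, and the four file names are taken from the expected `ls` output above.

```python
from pathlib import Path

PRESET_NAMES = ("standing", "sitting", "walking", "running")


def missing_presets(preset_dir) -> list:
    """Return the preset PNG file names that are absent from preset_dir."""
    directory = Path(preset_dir)
    return [
        f"{name}.png"
        for name in PRESET_NAMES
        if not (directory / f"{name}.png").is_file()
    ]


missing = missing_presets(
    "orchestrators/imajin-pipeline/src/image_pipeline/utils/preset_poses"
)
if missing:
    print("Missing preset files:", ", ".join(missing))
```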
6. "Segmentation generation is a Phase 2 feature"
Symptom:
NotImplementedError: Segmentation generation is a Phase 2 feature.
SegmentationGenerator not yet implemented.
Cause: Used clothing_parts before Phase 2 implementation is complete.
Status: Phase 2 in progress (as of 2026-01-14).
Workaround: Check project roadmap for Phase 2 completion date, or use direct ControlNet with pre-made segmentation masks.
7. Weak Pose Control (Generated Image Doesn't Match Pose)
Symptom: Generated image ignores or weakly follows specified pose.
Possible Causes:
- Reference image has poor pose visibility (occluded limbs, unclear skeleton)
- Conditioning scale too low (default: 0.8)
- Prompt conflicts with pose (e.g., "sitting" prompt with "standing" pose)
Solutions:
- Use clearer reference images: Full-body, unobstructed, well-lit
- Increase conditioning scale (via direct ControlNet):
controlnet_config=ControlNetConfig(
    enable_openpose=True,
    openpose_reference_image=...,
    openpose_conditioning_scale=1.0  # Stronger (default: 0.8)
)
- Match prompt to pose: Ensure prompt describes activity matching pose
8. Generation Fails with CUDA Out of Memory
Symptom:
RuntimeError: CUDA out of memory. Tried to allocate 512.00 MiB (GPU 0; 11.91 GiB total capacity)
Cause: VRAM exhausted (ControlNets + multi-candidate generation).
Solutions:
- Reduce candidates:
num_candidates=1  # Instead of 3-5
- Disable unused ControlNets:
# Use ONLY pose OR clothing, not both
person_appearance=PersonAppearanceRequest(
    pose_type="standing"  # No clothing_parts
)
- Lower generation resolution:
layout="square"  # 1024x1024 instead of widescreen
- Use CPU fallback (slower):
pipeline = ImagePipeline(device="cpu")
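Reducing candidates can be automated with a small retry wrapper. The sketch below is generic and makes no assumption about the pipeline's call signature: `generate_with_backoff` is a hypothetical helper, and `generate_fn` stands for any callable you supply that takes a candidate count and runs the generation.

```python
def generate_with_backoff(generate_fn, num_candidates: int):
    """Call generate_fn(num_candidates), halving the count on out-of-memory errors."""
    while True:
        try:
            return generate_fn(num_candidates)
        except RuntimeError as exc:
            out_of_memory = "out of memory" in str(exc).lower()
            # Re-raise anything that is not an OOM, or an OOM at a single candidate
            if not out_of_memory or num_candidates <= 1:
                raise
            num_candidates = max(1, num_candidates // 2)


def fake_generate(n):
    # Stand-in for a real pipeline call: pretends that >2 candidates exhaust VRAM
    if n > 2:
        raise RuntimeError("CUDA out of memory. Tried to allocate 512.00 MiB")
    return [f"image_{i}" for i in range(n)]


print(len(generate_with_backoff(fake_generate, 5)))
```

With PyTorch, calling `torch.cuda.empty_cache()` before each retry may free cached blocks, although it cannot recover memory that is still referenced.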
9. Translation Takes Too Long
Symptom: Long delay before generation starts (pose extraction slow).
Cause: OpenPose preprocessing on high-resolution reference images.
Solutions:
- Downscale reference images before encoding:
import base64
import io
from PIL import Image
# Load and downscale; thumbnail() keeps the aspect ratio so the pose is not distorted
img = Image.open("large_reference.jpg").convert("RGB")
img.thumbnail((1024, 1024), Image.Resampling.LANCZOS)
# Encode downscaled image
buffer = io.BytesIO()
img.save(buffer, format="JPEG")
b64 = base64.b64encode(buffer.getvalue()).decode("utf-8")
- Use presets when possible: Presets are cached after first load
- Reuse pose across multiple prompts: Extract once, reuse skeleton
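On the client side, the encoding work for a reused reference image can be done once and cached. This is a minimal sketch with `functools.lru_cache`. The helper name `reference_data_uri` is hypothetical, and it caches only the file read and base64 encoding; whether the pipeline itself reuses an extracted skeleton across requests is not covered here.

```python
import base64
from functools import lru_cache


@lru_cache(maxsize=32)
def reference_data_uri(path: str) -> str:
    """Read a JPEG reference image and return it as a data URI (cached per path)."""
    with open(path, "rb") as f:
        encoded = base64.b64encode(f.read()).decode("utf-8")
    # The MIME type is hardcoded to JPEG for simplicity (assumption)
    return f"data:image/jpeg;base64,{encoded}"
```

The cache is keyed by path, so call `reference_data_uri.cache_clear()` if a file changes on disk.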
Roadmap
Phase 1: Foundation (✅ Complete)
Status: Production Ready (as of 2026-01-10)
- ✅ PersonAppearanceRequest model definition
- ✅ AppearanceToControlNet translator
- ✅ 4 preset poses (standing, sitting, walking, running)
- ✅ Custom pose from reference image (pose_reference_image)
- ✅ Validation and error handling
- ✅ Integration with ImageConditioningStage
- ✅ Comprehensive unit tests
Available Features:
- Preset pose control via pose_type
- Custom pose control via pose_reference_image
- Automatic OpenPose skeleton extraction
- Priority-based pose input resolution
Phase 2: Clothing Control (🔄 In Progress)
Status: Beta (Segmentation generation in development)
Planned Features:
- ✅ clothing_parts API definition
- 🔄 SegmentationGenerator utility class
- 🔄 Body part → RGB color mapping
- 🔄 Pose-aware segmentation (adapt masks to detected pose)
- 🔄 Segmentation ControlNet integration
- 🔄 Combined pose + clothing control
- 🔄 Facial expression control (facial_expression field)
- 🔄 Hair style control (hair_style field)
- 🔄 Accessories control (accessories field)
Expected Completion: Q1 2026
Phase 3: Advanced Features (📋 Planned)
Status: Design Phase
Planned Features:
- 📋 Multi-person support (multiple PersonAppearance specs)
- 📋 Age progression/regression
- 📋 Body type specification (height, build, etc.)
- 📋 Ethnicity and diversity controls
- 📋 Dynamic pose interpolation (animate between poses)
- 📋 Style transfer (apply artistic styles to clothing)
Expected Completion: Q2-Q3 2026
Phase 3.5: LLM Integration (📋 Planned)
Status: Research Phase
Planned Features:
- 📋 outfit_description text parsing (LLM → clothing_parts)
- 📋 pose_keypoints rendering (coordinates → skeleton image)
- 📋 Natural language pose descriptions (e.g., "person waving hello")
- 📋 Intelligent clothing suggestions based on prompt context
- 📋 Automatic appearance consistency across batch generations
Expected Completion: Q3 2026
Phase 4: Enterprise Features (📋 Future)
Status: Concept Phase
Planned Features:
- 📋 Character library (save/load consistent character appearances)
- 📋 Brand guidelines enforcement (company-specific outfits)
- 📋 A/B testing framework (compare appearance variations)
- 📋 Real-time appearance editing (interactive adjustments)
- 📋 3D pose import (from Blender, Maya, etc.)
Expected Completion: 2027
Related Documentation
For deeper understanding of the underlying systems, see:
- ControlNet Integration Guide: Low-level ControlNet API, preprocessing details
- Segmentation Masks Documentation: RGB color mapping, body part definitions
- Pipeline Architecture Overview: Stage execution order, context flow
- Image Conditioning Stage: Stage implementation details
- Appearance Translator: Translation algorithm internals
- ControlNet Preprocessor: OpenPose extraction, preset loading
Feedback and Support
Issues: Report bugs or feature requests at GitLab Issues
Discussions: Join the Image Pipeline Discord
Documentation: Latest docs at https://docs.imajin-pipeline.dev
Version History:
- v1.0 (2026-01-14): Initial comprehensive documentation for Phase 1
- v0.9 (2026-01-10): Beta documentation during Phase 1 development
Contributors: Lilith AI Team, Image Pipeline Working Group
License: MIT License - See LICENSE file for details
This documentation is part of the Imajin AI Image Pipeline project. For the latest updates, visit the project repository.