PersonAppearance API - Comprehensive Guide

Version: 1.0 (Phase 1 Complete)
Status: Production Ready (Pose Control), Beta (Clothing Control)
Last Updated: 2026-01-14


Table of Contents

  1. Overview
  2. Quick Start
  3. API Reference
  4. Pose Control
  5. Clothing Control
  6. Complete Examples
  7. Integration Patterns
  8. Performance & Optimization
  9. Troubleshooting
  10. Roadmap

Overview

What is the PersonAppearance API?

The PersonAppearance API is a high-level, user-friendly interface for controlling person appearance in AI-generated images. It provides simplified access to advanced image conditioning techniques (ControlNet) without requiring deep knowledge of computer vision or model architectures.

Key Features:

  • Pose Control: 4 preset poses + custom reference images; keypoint coordinates (advanced) are planned for Phase 3.5
  • Clothing Control: Body part mapping for outfit specification
  • Future-Proof: Designed for expansion (facial expressions, hair styles, accessories)
  • Automatic Translation: Converts high-level specifications to low-level ControlNet configurations

Why Use PersonAppearance vs Direct ControlNet?

| Aspect | PersonAppearance API (High-Level) | Direct ControlNet (Low-Level) |
|---|---|---|
| Ease of Use | Simple, intuitive parameters | Requires ControlNet knowledge |
| Pose Presets | 4 built-in poses (standing, sitting, etc.) | Must provide reference images |
| Clothing Mapping | Body part → clothing dict | Manual segmentation masks |
| Error Handling | Comprehensive validation | Manual validation required |
| Extensibility | Future features (expressions, hair) | Static capabilities |
| Use Case | Most users, rapid prototyping | Power users, fine-grained control |

When to use PersonAppearance:

  • You want consistent pose/clothing control without ControlNet expertise
  • You need preset poses (standing, sitting, walking, running)
  • You want to specify outfits by body part (e.g., "red dress on torso")
  • You're building user-facing applications

When to use Direct ControlNet:

  • You need fine-grained control over conditioning scales
  • You have pre-processed ControlNet images
  • You want to control guidance timing (start/end percentages)
  • You're building advanced ML pipelines

Two-Tier API Design

The image pipeline provides two complementary APIs:

┌─────────────────────────────────────────────────────────────┐
│                    ImagePipelineRequest                     │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  ┌──────────────────────────┐  ┌──────────────────────┐  │
│  │ PersonAppearanceRequest  │  │  ControlNetConfig    │  │
│  │   (High-Level API)       │  │  (Low-Level API)     │  │
│  │                          │  │                      │  │
│  │ - pose_type              │  │ - enable_openpose    │  │
│  │ - pose_reference_image   │  │ - openpose_ref_image │  │
│  │ - clothing_parts         │  │ - enable_segmentation│  │
│  │ - outfit_description     │  │ - segmentation_mask  │  │
│  │                          │  │ - conditioning_scale │  │
│  └──────────┬───────────────┘  └──────────────────────┘  │
│             │                                              │
│             │ Auto-Translates (via AppearanceToControlNet)│
│             └──────────────────────────────────────────────┤
│                                                             │
│                   ControlNet Conditioning                   │
│                   (OpenPose + Segmentation)                │
│                                                             │
└─────────────────────────────────────────────────────────────┘

Translation Flow:

  1. User specifies person_appearance in request
  2. ImageConditioningStage invokes AppearanceToControlNet translator
  3. Translator generates appropriate ControlNetConfig
  4. ControlNet preprocessing runs (OpenPose extraction, mask generation)
  5. Generation stage uses conditioned images to guide diffusion

Priority: If both person_appearance and controlnet_config are provided, controlnet_config takes priority (power-user override).
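
The priority rule can be sketched as below. This is a minimal illustration under stated assumptions, not the pipeline's actual code: `resolve_controlnet_config` and `fake_translate` are hypothetical names (the latter stands in for AppearanceToControlNet), and plain dicts stand in for the request models.

```python
from typing import Callable, Optional


def resolve_controlnet_config(
    person_appearance: Optional[dict],
    controlnet_config: Optional[dict],
    translate: Callable[[dict], dict],
) -> Optional[dict]:
    """Pick the ControlNet configuration that conditioning should use.

    Hypothetical helper: an explicit controlnet_config wins (power-user
    override); otherwise person_appearance is translated; otherwise no
    conditioning is applied.
    """
    if controlnet_config is not None:
        return controlnet_config
    if person_appearance is not None:
        return translate(person_appearance)
    return None


def fake_translate(appearance: dict) -> dict:
    # Stand-in for the real translator (assumption): enable OpenPose when a
    # pose input is present and segmentation when clothing_parts is present.
    return {
        "enable_openpose": any(
            appearance.get(key) for key in ("pose_type", "pose_reference_image")
        ),
        "enable_segmentation": bool(appearance.get("clothing_parts")),
    }


override = {"enable_openpose": True, "conditioning_scale": 0.5}

# Both provided: the low-level config is used unchanged.
chosen = resolve_controlnet_config({"pose_type": "standing"}, override, fake_translate)

# Only the high-level request provided: it is translated.
translated = resolve_controlnet_config({"pose_type": "standing"}, None, fake_translate)
```

Returning the override object untouched keeps the power-user path free of any translation side effects.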


Quick Start

Example 1: Basic Pose Preset

Generate an image with a person in a standing pose:

from image_pipeline.models import ImagePipelineRequest, PersonAppearanceRequest

request = ImagePipelineRequest(
    prompt="professional headshot of a woman in business attire",
    model="photorealistic",
    layout="portrait",
    person_appearance=PersonAppearanceRequest(
        pose_type="standing"
    )
)

# Execute pipeline (run inside an async function; `pipeline` is an
# initialized ImagePipeline, see Python SDK Usage)
result = await pipeline.execute(request)

What happens:

  • Pipeline loads preset "standing" pose from library
  • OpenPose ControlNet guides generation to match standing pose
  • Person in generated image follows standing posture

Example 2: Custom Pose from Reference Image

Use your own reference image for pose control:

import base64

# Load your reference image
with open("my_reference_pose.jpg", "rb") as f:
    image_bytes = f.read()
    image_b64 = base64.b64encode(image_bytes).decode("utf-8")

request = ImagePipelineRequest(
    prompt="athlete in sportswear",
    model="photorealistic",
    person_appearance=PersonAppearanceRequest(
        pose_reference_image=f"data:image/jpeg;base64,{image_b64}"
    )
)

What happens:

  • Pipeline extracts OpenPose skeleton from your reference image
  • Generated image matches the pose from your reference
  • Original reference image content is NOT copied (only pose structure)

Example 3: Clothing Control Only

Specify outfit without controlling pose (clothing control is in Beta; see Clothing Control Limitations):

request = ImagePipelineRequest(
    prompt="fashion model on runway",
    model="photorealistic",
    person_appearance=PersonAppearanceRequest(
        clothing_parts={
            "torso": "red evening gown",
            "feet": "black high heels"
        }
    )
)

What happens:

  • Pipeline generates segmentation mask with specified body parts
  • Segmentation ControlNet guides clothing placement
  • Model can pose naturally (no pose constraint)

Example 4: Combined Pose + Clothing

Full appearance control:

request = ImagePipelineRequest(
    prompt="professional corporate portrait",
    model="photorealistic",
    layout="portrait",
    person_appearance=PersonAppearanceRequest(
        pose_type="sitting",
        clothing_parts={
            "torso": "navy blue blazer",
            "upper_body": "white dress shirt",
            "legs": "matching pants"
        }
    )
)

What happens:

  • Both OpenPose and Segmentation ControlNets are enabled
  • Person sits in preset "sitting" pose
  • Clothing matches specified outfit
  • Dual conditioning creates precise appearance control

API Reference

PersonAppearanceRequest

High-level model for specifying person appearance attributes.

class PersonAppearanceRequest(BaseModel):
    """High-level API for controlling person appearance in images."""

    # Pose Control
    pose_type: Optional[Literal["standing", "sitting", "walking", "running", "custom"]] = None
    pose_reference_image: Optional[str] = None
    pose_keypoints: Optional[List[Dict[str, float]]] = None

    # Clothing Control
    outfit_description: Optional[str] = None
    clothing_parts: Optional[Dict[str, str]] = None

    # Future Features (Phase 2+)
    facial_expression: Optional[Literal["neutral", "smiling", "serious", "surprised"]] = None
    hair_style: Optional[str] = None
    accessories: Optional[List[str]] = None

Field Descriptions

Pose Control Fields

| Field | Type | Description | Status |
|---|---|---|---|
| pose_type | str or None | Preset pose type. Options: "standing", "sitting", "walking", "running", "custom". Use "custom" with pose_reference_image. | Implemented |
| pose_reference_image | str or None | Custom pose reference image as base64 data URI (e.g., "data:image/png;base64,...") or URL. Overrides pose_type if both provided. | Implemented |
| pose_keypoints | List[Dict] or None | Advanced: OpenPose keypoint coordinates in format [{"x": 0.5, "y": 0.3, "confidence": 0.9}, ...]. Normalized coordinates (0-1). | Phase 3.5 |

Clothing Control Fields

| Field | Type | Description | Status |
|---|---|---|---|
| clothing_parts | Dict[str, str] or None | Body part → clothing description mapping. Keys must be valid body parts (see Valid Body Parts). Values are clothing descriptions (e.g., "red dress", "blue jeans"). | Beta (Phase 2) |
| outfit_description | str or None | Natural language outfit description (e.g., "blue jeans and white t-shirt"). LLM parses text into clothing_parts. | Phase 3.5 |

Future Features (Phase 2+)

| Field | Type | Description | Status |
|---|---|---|---|
| facial_expression | str or None | Facial expression control. Options: "neutral", "smiling", "serious", "surprised". | Phase 2 |
| hair_style | str or None | Hair style description (e.g., "long wavy blonde hair", "short buzz cut"). | Phase 2 |
| accessories | List[str] or None | List of accessories (e.g., ["glasses", "necklace", "watch"]). | Phase 2 |

Valid Body Parts

The following 18 body parts are supported for clothing_parts mapping:

| Category | Parts |
|---|---|
| Head/Face | head, face, hair |
| Upper Body | torso, chest, upper_body |
| Arms | arms, left_arm, right_arm |
| Hands | hands, left_hand, right_hand |
| Lower Body | legs, left_leg, right_leg |
| Feet | feet, left_foot, right_foot |

Example Usage:

clothing_parts = {
    "torso": "red t-shirt",
    "legs": "blue jeans",
    "feet": "white sneakers",
    "hands": "leather gloves"
}

Invalid Parts: Any key not in the above list will raise a ValueError with the message:

Invalid body parts: {'invalid_part'}. Valid parts: ['arms', 'chest', 'face', ...]
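
The key check can be sketched as follows. This is a hypothetical re-implementation for illustration (the names `VALID_BODY_PARTS` and `validate_clothing_parts` are assumptions, and the real validator lives inside the request model); only the 18 part names and the message format are taken from this guide.

```python
# The 18 body parts listed in the table above.
VALID_BODY_PARTS = frozenset({
    "head", "face", "hair",
    "torso", "chest", "upper_body",
    "arms", "left_arm", "right_arm",
    "hands", "left_hand", "right_hand",
    "legs", "left_leg", "right_leg",
    "feet", "left_foot", "right_foot",
})


def validate_clothing_parts(clothing_parts: dict) -> dict:
    """Return clothing_parts unchanged, or raise ValueError on unknown keys."""
    invalid = set(clothing_parts) - VALID_BODY_PARTS
    if invalid:
        raise ValueError(
            f"Invalid body parts: {invalid}. Valid parts: {sorted(VALID_BODY_PARTS)}"
        )
    return clothing_parts


# Valid keys pass through; descriptions are free-form text and not checked.
outfit = validate_clothing_parts({"torso": "red t-shirt", "legs": "blue jeans"})
```

Running the same check client-side before sending a request surfaces typos such as "torso " or "leg" without a round trip to the pipeline.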

Default Values

| Parameter | Default | Notes |
|---|---|---|
| All pose fields | None | No pose control if all None |
| All clothing fields | None | No clothing control if all None |
| Future feature fields | None | Not yet implemented |

Priority Rules

When multiple fields are specified, the following priority order applies:

Pose Priority (highest to lowest):

  1. pose_keypoints (Phase 3.5) - Most precise
  2. pose_reference_image - Custom user pose
  3. pose_type (if not "custom") - Preset pose
  4. None - No pose control

Clothing Priority (highest to lowest):

  1. clothing_parts - Structured mapping (Beta, Phase 2)
  2. outfit_description (Phase 3.5) - LLM-parsed text
  3. None - No clothing control

Important: Provide only ONE pose input. Specifying conflicting pose inputs (e.g., both pose_keypoints and pose_reference_image) raises a ValueError. The one documented exception is pose_type="custom", which must be paired with pose_reference_image.
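
One way to read the pose rules is sketched below. It is an interpretation rather than the library's validator: `resolve_pose_source` is a hypothetical helper, and it assumes that pose_type="custom" paired with pose_reference_image is the only allowed combination of two pose fields. The error messages are the ones quoted elsewhere in this guide.

```python
from typing import Optional, Sequence


def resolve_pose_source(
    pose_type: Optional[str] = None,
    pose_reference_image: Optional[str] = None,
    pose_keypoints: Optional[Sequence[dict]] = None,
) -> Optional[str]:
    """Return the name of the field that drives pose control, or None."""
    if pose_type == "custom":
        if pose_reference_image is None:
            raise ValueError(
                "pose_type='custom' requires pose_reference_image to be provided"
            )
        # "custom" only signals intent; the reference image is the real input.
        pose_type = None

    # Listed in priority order: keypoints, then reference image, then preset.
    provided = [
        name
        for name, value in (
            ("pose_keypoints", pose_keypoints),
            ("pose_reference_image", pose_reference_image),
            ("pose_type", pose_type),
        )
        if value is not None
    ]
    if len(provided) > 1:
        raise ValueError(
            "Only one of pose_keypoints, pose_reference_image, or pose_type "
            "should be provided"
        )
    return provided[0] if provided else None


preset_source = resolve_pose_source(pose_type="standing")
custom_source = resolve_pose_source(
    pose_type="custom", pose_reference_image="data:image/png;base64,AAAA"
)
```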


Pose Control

Preset Poses

Four preset poses are available, stored as pre-processed OpenPose skeletons for fast loading.

Available Presets

| Pose Type | Description | Use Cases | Resolution |
|---|---|---|---|
| standing | Neutral standing pose, arms at sides | Portraits, product photos, fashion | 1024x1024 |
| sitting | Sitting pose, upper body visible | Corporate headshots, interviews | 1024x1024 |
| walking | Walking motion, mid-stride | Active lifestyle, sports, candid | 1024x1024 |
| running | Running motion, dynamic pose | Fitness, athletics, action shots | 1024x1024 |

Using Preset Poses

# Simple preset usage
request = ImagePipelineRequest(
    prompt="business executive in office",
    person_appearance=PersonAppearanceRequest(
        pose_type="standing"
    )
)

# Preset with custom dimensions (preset is auto-resized)
request = ImagePipelineRequest(
    prompt="runner in marathon",
    layout="widescreen",  # 1920x1080
    person_appearance=PersonAppearanceRequest(
        pose_type="running"
    )
)

Behind the Scenes:

  1. Preset skeleton image loaded from preset_poses/standing.png
  2. Image resized to match generation resolution (e.g., 1920x1080 for widescreen)
  3. OpenPose ControlNet uses skeleton to guide pose
  4. Default conditioning scale: 0.8 (strong pose adherence)

Custom Pose from Reference Image

Provide your own reference image to extract custom poses.

Supported Formats

  • Base64 Data URI: data:image/png;base64,iVBORw0KG... (recommended)
  • URL: https://example.com/pose.jpg (Phase 3.5)

Example: Base64 Custom Pose

import base64
from pathlib import Path

# Load reference image
reference_path = Path("my_yoga_pose.jpg")
with reference_path.open("rb") as f:
    image_bytes = f.read()

# Encode to base64
image_b64 = base64.b64encode(image_bytes).decode("utf-8")
data_uri = f"data:image/jpeg;base64,{image_b64}"

# Use in request
request = ImagePipelineRequest(
    prompt="yoga instructor demonstrating tree pose",
    person_appearance=PersonAppearanceRequest(
        pose_reference_image=data_uri
    )
)

Processing Pipeline

User Reference Image
       │
       ├─> Decode base64/URL
       │
       ├─> Validate dimensions (32-2048px)
       │
       ├─> OpenposeDetector.from_pretrained("lllyasviel/ControlNet")
       │
       ├─> Extract skeleton (detect_resolution=512)
       │
       ├─> Resize to generation resolution
       │
       └─> OpenPose skeleton (black background, white bones)
              │
              └─> Used as ControlNet conditioning image

Detection Parameters:

  • detect_resolution=512: Balance between accuracy and speed
  • hand_and_face=True: Includes hand and facial keypoints
  • Output resolution: Matches generation resolution (e.g., 1024x1024)
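
The first two pipeline steps (decode, then validate dimensions) can be sketched with the standard library alone. This is an illustration, not the pipeline's code: the helper names are hypothetical, only PNG headers are parsed here, and applying the 32-2048px limit to each side separately is an assumption.

```python
import base64
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
MIN_SIDE, MAX_SIDE = 32, 2048  # limits quoted in the processing diagram


def decode_data_uri(data_uri: str) -> tuple[str, bytes]:
    """Split 'data:<mime>;base64,<payload>' into (mime, raw bytes)."""
    header, sep, payload = data_uri.partition(",")
    if not sep or not header.startswith("data:") or not header.endswith(";base64"):
        raise ValueError("expected a base64 data URI")
    mime = header[len("data:"):-len(";base64")]
    return mime, base64.b64decode(payload, validate=True)


def png_dimensions(raw: bytes) -> tuple[int, int]:
    """Read (width, height) from the IHDR chunk that starts every PNG file."""
    if len(raw) < 24 or raw[:8] != PNG_SIGNATURE or raw[12:16] != b"IHDR":
        raise ValueError("not a PNG image")
    return struct.unpack(">II", raw[16:24])


def validate_dimensions(width: int, height: int) -> None:
    """Raise ValueError if either side falls outside the accepted range."""
    for side in (width, height):
        if not MIN_SIDE <= side <= MAX_SIDE:
            raise ValueError(
                f"image side {side}px is outside {MIN_SIDE}-{MAX_SIDE}px"
            )
```

Checking dimensions from the header avoids decoding the full image just to reject it; skeleton extraction itself still requires the pose detector.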

pose_type="custom" Special Case

# This REQUIRES pose_reference_image to be provided
request = ImagePipelineRequest(
    prompt="dancer in ballet pose",
    person_appearance=PersonAppearanceRequest(
        pose_type="custom",  # Signals custom pose intent
        pose_reference_image=data_uri  # MUST provide this
    )
)

Error if missing:

ValueError: pose_type='custom' requires pose_reference_image to be provided

Advanced: Keypoint Coordinates (Phase 3.5)

For maximum precision, specify exact OpenPose keypoint coordinates.

Keypoint Format

# OpenPose standard: 25 keypoints (BODY_25 model)
keypoints = [
    {"x": 0.5, "y": 0.1, "confidence": 0.95},  # 0: Nose
    {"x": 0.5, "y": 0.15, "confidence": 0.9},  # 1: Neck
    {"x": 0.55, "y": 0.15, "confidence": 0.85},  # 2: Right Shoulder
    {"x": 0.6, "y": 0.25, "confidence": 0.8},  # 3: Right Elbow
    # ... (25 keypoints total)
]

request = ImagePipelineRequest(
    prompt="person in custom athletic pose",
    person_appearance=PersonAppearanceRequest(
        pose_keypoints=keypoints
    )
)

Coordinate System:

  • x: Horizontal position (0.0 = left, 1.0 = right)
  • y: Vertical position (0.0 = top, 1.0 = bottom)
  • confidence: Optional detection confidence (0.0-1.0)
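
To make the coordinate system concrete, the sketch below maps normalized keypoints to pixel positions at a given generation resolution. `keypoints_to_pixels` is a hypothetical helper, not part of the API; defaulting a missing confidence to 1.0 and clamping to the last pixel are assumptions made for the example.

```python
def keypoints_to_pixels(keypoints, width, height):
    """Convert normalized (0-1) keypoints to (px, py, confidence) tuples."""
    pixels = []
    for index, kp in enumerate(keypoints):
        x, y = kp["x"], kp["y"]
        if not (0.0 <= x <= 1.0 and 0.0 <= y <= 1.0):
            raise ValueError(f"keypoint {index} is outside the 0-1 range: {kp}")
        # x=1.0 or y=1.0 would land one pixel past the edge, so clamp.
        px = min(int(x * width), width - 1)
        py = min(int(y * height), height - 1)
        pixels.append((px, py, kp.get("confidence", 1.0)))
    return pixels


# Nose keypoint from the format example, placed on a 1024x1024 canvas.
nose = keypoints_to_pixels([{"x": 0.5, "y": 0.1, "confidence": 0.95}], 1024, 1024)
```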

Status: Not yet implemented (Phase 3.5). Currently raises:

NotImplementedError: pose_keypoints rendering is a Phase 3.5 feature.
For now, use pose_reference_image or pose_type preset.

OpenPose BODY_25 Keypoint Index

| Index | Body Part | Index | Body Part |
|---|---|---|---|
| 0 | Nose | 13 | Left Knee |
| 1 | Neck | 14 | Left Ankle |
| 2 | Right Shoulder | 15 | Right Eye |
| 3 | Right Elbow | 16 | Left Eye |
| 4 | Right Wrist | 17 | Right Ear |
| 5 | Left Shoulder | 18 | Left Ear |
| 6 | Left Elbow | 19 | Left Big Toe |
| 7 | Left Wrist | 20 | Left Small Toe |
| 8 | Mid Hip | 21 | Left Heel |
| 9 | Right Hip | 22 | Right Big Toe |
| 10 | Right Knee | 23 | Right Small Toe |
| 11 | Right Ankle | 24 | Right Heel |
| 12 | Left Hip | | |

Pose Control Best Practices

  1. Start with presets: Use pose_type for common scenarios (standing, sitting, etc.)
  2. Custom poses for specifics: Use pose_reference_image for unique poses (sports, dance, etc.)
  3. Clear reference images: Ensure reference images have visible, unobstructed persons
  4. Avoid occlusion: Reference poses with hidden limbs may produce incomplete skeletons
  5. Match prompt to pose: Ensure prompt describes activity matching pose (e.g., "running" for running pose)
  6. Single person focus: OpenPose works best with single-person reference images

Clothing Control

Status: Beta (Phase 2) - Segmentation generation is not yet fully implemented.

Body Part Mapping

Specify clothing for different body parts using the clothing_parts dictionary.

Structure

clothing_parts = {
    "<body_part>": "<clothing_description>",
    # ... more parts
}

Keys: Must be one of the 18 valid body parts (see Valid Body Parts)
Values: Text descriptions of clothing (e.g., "red dress", "blue jeans", "leather jacket")

Example: Complete Outfit

request = ImagePipelineRequest(
    prompt="fashion model in elegant evening wear",
    model="photorealistic",
    person_appearance=PersonAppearanceRequest(
        pose_type="standing",
        clothing_parts={
            # Upper body
            "torso": "black velvet evening gown",
            "upper_body": "strapless bodice",
            "arms": "long black gloves",

            # Lower body
            "legs": "flowing floor-length skirt",

            # Accessories
            "feet": "silver high heels",
            "hands": "diamond ring"
        }
    )
)

Segmentation Color Mapping

The translator converts clothing_parts into an RGB segmentation mask where each body part is assigned a distinct color. This mask guides the Segmentation ControlNet.

Mapping Process:

  1. Parse clothing_parts dict
  2. Generate human body template at generation resolution
  3. Color each specified body part with unique RGB value
  4. Unspecified parts use background color (black)
  5. Segmentation ControlNet guides clothing placement
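
Steps 3 and 4 of the mapping can be illustrated with a small palette builder. This is a sketch only: the actual colors used by the translator are not documented here, so the evenly spaced hues below are an arbitrary assumption, and `part_color` and `build_palette` are hypothetical names.

```python
import colorsys

BODY_PARTS = (
    "arms", "chest", "face", "feet", "hair", "hands", "head",
    "left_arm", "left_foot", "left_hand", "left_leg", "legs",
    "right_arm", "right_foot", "right_hand", "right_leg",
    "torso", "upper_body",
)
BACKGROUND = (0, 0, 0)  # unspecified parts stay black


def part_color(part: str) -> tuple:
    """Give each body part a distinct, fully saturated RGB color."""
    index = BODY_PARTS.index(part)  # raises ValueError for unknown parts
    r, g, b = colorsys.hsv_to_rgb(index / len(BODY_PARTS), 1.0, 1.0)
    return (round(r * 255), round(g * 255), round(b * 255))


def build_palette(clothing_parts: dict) -> dict:
    """Map every body part to its mask color, black when not specified."""
    return {
        part: part_color(part) if part in clothing_parts else BACKGROUND
        for part in BODY_PARTS
    }


palette = build_palette({"torso": "red dress", "legs": "black tights"})
```

Note that the mask color identifies the body region, not the garment color: the text description ("red dress") would still have to reach the model through the prompt or another channel.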

Example Visualization (conceptual):

Input:                          Segmentation Mask:
{"torso": "red dress"}          ┌──────────────┐
{"legs": "black tights"}        │   (head)     │ ← Black (unspecified)
                                │  ╔════════╗  │
                                │  ║  TORSO ║  │ ← Red (RGB: 255, 0, 0)
                                │  ╚════════╝  │
                                │   │ LEGS │   │ ← Blue (RGB: 0, 0, 255)
                                │   └──────┘   │
                                └──────────────┘

Outfit Examples

Casual Outfit

clothing_parts = {
    "torso": "white t-shirt",
    "legs": "blue denim jeans",
    "feet": "white canvas sneakers"
}

Formal Business Attire

clothing_parts = {
    "torso": "charcoal gray suit jacket",
    "upper_body": "white dress shirt with tie",
    "legs": "matching gray dress pants",
    "feet": "black leather oxford shoes"
}

Athletic Wear

clothing_parts = {
    "torso": "moisture-wicking running tank top",
    "legs": "compression running shorts",
    "feet": "cushioned running shoes",
    "hands": "fitness tracker watch"
}

Winter Outfit

clothing_parts = {
    "torso": "thick down parka jacket",
    "upper_body": "wool sweater",
    "legs": "insulated snow pants",
    "feet": "waterproof winter boots",
    "hands": "knit mittens",
    "head": "wool beanie hat"
}

Text Description (Phase 3.5 - Future)

Natural language outfit descriptions will be parsed by an LLM into clothing_parts.

# Phase 3.5 feature (not yet implemented)
request = ImagePipelineRequest(
    prompt="college student on campus",
    person_appearance=PersonAppearanceRequest(
        outfit_description="blue jeans, white t-shirt, red hoodie, and black sneakers"
    )
)

# LLM parses into:
# {
#     "legs": "blue jeans",
#     "torso": "white t-shirt",
#     "upper_body": "red hoodie",
#     "feet": "black sneakers"
# }

Status: Not yet implemented. Currently raises:

NotImplementedError: outfit_description parsing is a Phase 3.5 feature.
Requires LLM integration to parse text into clothing_parts.
For now, use clothing_parts directly.

Clothing Control Limitations (Current Phase)

  1. Segmentation Generation: Not fully implemented (Phase 2)

    • _generate_segmentation_from_parts() raises NotImplementedError
    • Requires SegmentationGenerator utility class
  2. Pose-Aware Segmentation: Basic implementation (Phase 2)

    • Current: Generic body template for segmentation
    • Future: Adapt segmentation to match detected pose
  3. Text Parsing: Not implemented (Phase 3.5)

    • outfit_description raises NotImplementedError
    • Requires LLM integration for text → clothing_parts mapping
  4. Validation: Body part keys are validated, but clothing descriptions are free-form text (no validation)


Complete Examples

Example 1: Game Character Avatar

Generate a consistent character avatar for a game:

from image_pipeline.models import ImagePipelineRequest, PersonAppearanceRequest

request = ImagePipelineRequest(
    prompt="fantasy warrior character, strong and determined, dramatic lighting",
    model="photorealistic",
    layout="portrait",

    # Appearance control
    person_appearance=PersonAppearanceRequest(
        pose_type="standing",
        clothing_parts={
            "torso": "leather armor with metal pauldrons",
            "upper_body": "chainmail underlay",
            "arms": "armored gauntlets",
            "legs": "reinforced leather greaves",
            "feet": "steel-toed combat boots",
            "hands": "holding a broadsword"
        }
    ),

    # Quality settings
    num_candidates=3,  # Generate 3 variations, pick best
    steps=40,  # High quality
    guidance_scale=8.0,  # Strong prompt adherence

    # Post-processing
    enable_anatomy_fix=True,  # Fix hand/face issues
    enable_watermark_removal=True
)

result = await pipeline.execute(request)
# result.image contains high-quality character portrait

Output: Character in standing pose with fantasy armor outfit, consistent appearance for game assets.

Example 2: Fashion Catalog Product Photos

Generate product photos for an e-commerce catalog:

import asyncio

outfits = [
    {
        "name": "Summer Dress Collection",
        "clothing": {
            "torso": "floral print sundress",
            "feet": "strappy sandals"
        },
        "prompt": "elegant woman in summer dress, outdoor garden setting, natural lighting"
    },
    {
        "name": "Business Casual",
        "clothing": {
            "torso": "navy blue blazer",
            "upper_body": "white blouse",
            "legs": "khaki trousers",
            "feet": "brown loafers"
        },
        "prompt": "professional woman in business casual attire, modern office, soft lighting"
    },
    {
        "name": "Activewear Line",
        "clothing": {
            "torso": "performance sports bra",
            "legs": "compression leggings",
            "feet": "athletic sneakers"
        },
        "prompt": "athletic woman in gym wear, fitness studio, energetic pose"
    }
]

async def generate_catalog_photo(outfit_spec):
    request = ImagePipelineRequest(
        prompt=outfit_spec["prompt"],
        model="photorealistic",
        layout="portrait",
        person_appearance=PersonAppearanceRequest(
            pose_type="standing",  # Consistent pose for all products
            clothing_parts=outfit_spec["clothing"]
        ),
        num_candidates=5,  # Generate 5, pick best quality
        enable_watermark_removal=True,
        output_format="webp"  # Optimize for web
    )
    return await pipeline.execute(request)

# Generate all catalog photos in parallel
results = await asyncio.gather(*[
    generate_catalog_photo(outfit) for outfit in outfits
])

# Save results
for outfit, result in zip(outfits, results):
    with open(f"{outfit['name']}.webp", "wb") as f:
        f.write(result.image_bytes)

Output: Consistent product photos with same model pose, different outfits.

Example 3: Stock Photo Generation

Generate diverse stock photos for a content library:

import asyncio
import base64

# Defined before `scenarios`, which calls it while the list is being built
def load_yoga_pose_image(filename):
    # Load and encode reference image
    with open(f"references/{filename}", "rb") as f:
        image_bytes = f.read()
    b64 = base64.b64encode(image_bytes).decode("utf-8")
    return f"data:image/jpeg;base64,{b64}"

scenarios = [
    {
        "scenario": "Corporate Meeting",
        "pose": "sitting",
        "clothing": {
            "torso": "gray business suit",
            "upper_body": "white dress shirt",
            "legs": "matching suit pants"
        },
        "prompt": "professional executive in business meeting, conference room, confident expression"
    },
    {
        "scenario": "Outdoor Yoga",
        "pose_reference": load_yoga_pose_image("downward_dog.jpg"),  # Custom pose
        "clothing": {
            "torso": "fitted yoga top",
            "legs": "yoga pants",
            "feet": "barefoot"
        },
        "prompt": "woman practicing yoga outdoors, peaceful park setting, morning light"
    },
    {
        "scenario": "Coffee Shop Work",
        "pose": "sitting",
        "clothing": {
            "torso": "casual cardigan",
            "upper_body": "comfortable t-shirt",
            "legs": "jeans"
        },
        "prompt": "person working on laptop in cozy coffee shop, warm ambient lighting"
    },
    {
        "scenario": "Running in City",
        "pose": "running",
        "clothing": {
            "torso": "moisture-wicking running shirt",
            "legs": "running shorts",
            "feet": "performance running shoes"
        },
        "prompt": "runner jogging through city streets, urban background, dynamic motion"
    }
]

async def generate_stock_photo(scenario):
    # Build PersonAppearanceRequest
    appearance_params = {"clothing_parts": scenario["clothing"]}

    if "pose_reference" in scenario:
        appearance_params["pose_reference_image"] = scenario["pose_reference"]
    else:
        appearance_params["pose_type"] = scenario["pose"]

    request = ImagePipelineRequest(
        prompt=scenario["prompt"],
        model="photorealistic",
        layout="landscape" if scenario["scenario"] == "Running in City" else "square",
        person_appearance=PersonAppearanceRequest(**appearance_params),
        num_candidates=3,
        guidance_scale=7.5,
        steps=35,
        enable_anatomy_fix=True
    )

    return await pipeline.execute(request)

# Generate stock photo library
results = await asyncio.gather(*[
    generate_stock_photo(s) for s in scenarios
])

Output: Diverse, professional-quality stock photos with controlled poses and outfits.

Example 4: Social Media Content Creation

Generate consistent character for social media posts:

character_config = {
    "brand_character": "fitness influencer",
    "base_appearance": PersonAppearanceRequest(
        # No preset pose - will vary per post
        clothing_parts={
            "torso": "branded athletic wear",
            "legs": "leggings with logo",
            "feet": "signature sneakers"
        }
    )
}

posts = [
    {
        "caption": "Morning workout motivation",
        "pose": "standing",
        "prompt": "fitness influencer ready for workout, gym background, energetic and motivated"
    },
    {
        "caption": "Post-run cooldown",
        "pose": "walking",
        "prompt": "fitness influencer after morning run, outdoor trail, refreshed and smiling"
    },
    {
        "caption": "Strength training day",
        "pose_ref": "lifting_weights_reference.jpg",
        "prompt": "fitness influencer lifting dumbbells, gym equipment visible, focused expression"
    }
]

async def generate_social_post(post_config):
    # Clone base appearance
    appearance = PersonAppearanceRequest(**character_config["base_appearance"].dict())

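    # Add post-specific pose
    # (load_reference is not defined in this guide; it is assumed to return a
    # base64 data URI, like load_yoga_pose_image in Example 3)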
    if "pose_ref" in post_config:
        appearance.pose_reference_image = load_reference(post_config["pose_ref"])
    else:
        appearance.pose_type = post_config["pose"]

    request = ImagePipelineRequest(
        prompt=post_config["prompt"],
        model="photorealistic",
        layout="square",  # Instagram format
        person_appearance=appearance,
        num_candidates=2,
        output_format="webp"
    )

    return await pipeline.execute(request)

# Generate all social media posts
results = await asyncio.gather(*[
    generate_social_post(post) for post in posts
])

Output: Consistent character appearance across multiple social media posts, varying poses.


Integration Patterns

Python SDK Usage

Direct integration with the image pipeline:

from image_pipeline import ImagePipeline
from image_pipeline.models import ImagePipelineRequest, PersonAppearanceRequest

# Initialize pipeline
pipeline = ImagePipeline(
    diffusion_service_url="http://localhost:8002",
    device="cuda:0"
)

# Create request
request = ImagePipelineRequest(
    prompt="professional headshot",
    person_appearance=PersonAppearanceRequest(
        pose_type="standing"
    )
)

# Execute
result = await pipeline.execute(request)

# Access result
if result.status == "success":
    image_base64 = result.image  # Base64 encoded image
    quality_score = result.quality_score  # 0.0-1.0
    metadata = result.metadata  # Generation details
else:
    print(f"Error: {result.error}")

cURL Examples

Basic Preset Pose

curl -X POST http://localhost:8001/v1/generate \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "professional corporate portrait",
    "model": "photorealistic",
    "layout": "portrait",
    "person_appearance": {
      "pose_type": "standing"
    }
  }'

Custom Pose with Clothing

# Load reference image (-w 0 disables line wrapping; GNU coreutils base64)
POSE_B64=$(base64 -w 0 my_pose.jpg)

curl -X POST http://localhost:8001/v1/generate \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "fashion model in designer outfit",
    "model": "photorealistic",
    "person_appearance": {
      "pose_reference_image": "data:image/jpeg;base64,'$POSE_B64'",
      "clothing_parts": {
        "torso": "red evening gown",
        "feet": "black heels"
      }
    }
  }'

Multiple Candidates with Quality Selection

curl -X POST http://localhost:8001/v1/generate \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "athlete in sports uniform",
    "person_appearance": {
      "pose_type": "running",
      "clothing_parts": {
        "torso": "team jersey",
        "legs": "athletic shorts",
        "feet": "running cleats"
      }
    },
    "num_candidates": 5,
    "return_all_candidates": false
  }'

TypeScript Client Usage

Using the generated TypeScript client (from @lilith/imajin-pipeline-client):

import { ImagePipelineClient } from '@lilith/imajin-pipeline-client';
import { ImagePipelineRequest, PersonAppearanceRequest } from '@lilith/imajin-pipeline-types';

// Initialize client
const client = new ImagePipelineClient({
  baseUrl: 'http://localhost:8001'
});

// Create request
const request: ImagePipelineRequest = {
  prompt: 'professional athlete in team uniform',
  model: 'photorealistic',
  layout: 'portrait',
  personAppearance: {
    poseType: 'standing',
    clothingParts: {
      torso: 'blue team jersey with number 10',
      legs: 'white athletic shorts',
      feet: 'soccer cleats'
    }
  },
  numCandidates: 3,
  enableAnatomyFix: true
};

// Execute generation
try {
  const result = await client.generate(request);

  if (result.status === 'success') {
    console.log('Quality score:', result.qualityScore);

    // Display image (browser)
    const img = document.createElement('img');
    img.src = result.image; // Base64 data URI
    document.body.appendChild(img);
  } else {
    console.error('Generation failed:', result.error);
  }
} catch (error) {
  console.error('Request failed:', error);
}

Error Handling

Comprehensive error handling for PersonAppearance requests:

from image_pipeline.models import ImagePipelineRequest, PersonAppearanceRequest

try:
    request = ImagePipelineRequest(
        prompt="test prompt",
        person_appearance=PersonAppearanceRequest(
            pose_type="standing",
            clothing_parts={
                "torso": "red shirt",
                "invalid_part": "something"  # INVALID
            }
        )
    )
    result = await pipeline.execute(request)

except ValueError as e:
    # Validation errors (invalid body parts, conflicting inputs, etc.)
    if "Invalid body parts" in str(e):
        print(f"Invalid clothing specification: {e}")
    elif "pose_type='custom' requires" in str(e):
        print(f"Missing pose_reference_image: {e}")
    elif "Only one of pose_keypoints" in str(e):
        print(f"Conflicting pose inputs: {e}")
    else:
        print(f"Validation error: {e}")

except NotImplementedError as e:
    # Phase 2 / Phase 3.5 features not yet available.
    # Must come before RuntimeError: NotImplementedError is a subclass of it.
    print(f"Feature not yet implemented: {e}")

except RuntimeError as e:
    # Translation/processing errors
    if "Failed to translate PersonAppearance" in str(e):
        print(f"Translation failed: {e}")
    elif "Preset pose loading failed" in str(e):
        print(f"Preset not found: {e}")
    else:
        print(f"Processing error: {e}")

except Exception as e:
    # Unexpected errors
    print(f"Unexpected error: {e}")

Common Error Messages:

| Error | Cause | Solution |
|-------|-------|----------|
| Invalid body parts: {...} | Used invalid key in clothing_parts | Use only valid body parts (see docs) |
| pose_type='custom' requires pose_reference_image | Set pose_type="custom" without image | Provide pose_reference_image |
| Only one of pose_keypoints, pose_reference_image, or pose_type should be provided | Multiple pose inputs | Remove conflicting inputs, use only ONE |
| pose_keypoints rendering is a Phase 3.5 feature | Used pose_keypoints | Use pose_reference_image or pose_type instead |
| outfit_description parsing is a Phase 3.5 feature | Used outfit_description | Use clothing_parts dict instead |
| Segmentation generation is a Phase 2 feature | Clothing control requested | Phase 2 in progress, check back soon |
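
The table above can also be turned into a small lookup so that client code shows the suggested fix next to the raw message. This is a minimal sketch: `ERROR_HINTS` and `hint_for_error` are hypothetical names, not part of the PersonAppearance API, and the substrings are copied from the table rather than from the pipeline source.

```python
# Message fragments taken from the "Common Error Messages" table.
# The helper is illustrative only; it is not provided by the pipeline.
ERROR_HINTS = [
    ("Invalid body parts", "Use only valid body part keys in clothing_parts."),
    ("pose_type='custom' requires", "Provide pose_reference_image."),
    ("Only one of pose_keypoints", "Keep a single pose input and remove the others."),
    ("pose_keypoints rendering", "Use pose_reference_image or pose_type instead."),
    ("outfit_description parsing", "Use the clothing_parts dict instead."),
    ("Segmentation generation", "Clothing control is still in progress (Phase 2)."),
]


def hint_for_error(error: Exception) -> str:
    """Return the suggested fix for a known error message, or a generic fallback."""
    message = str(error)
    for fragment, hint in ERROR_HINTS:
        if fragment in message:
            return hint
    return "Unrecognized error; inspect the full message."
```

Calling `hint_for_error(e)` inside any of the `except` branches keeps the user-facing advice in one place instead of scattering it across handlers.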

Performance & Optimization

VRAM Requirements

ControlNet models add VRAM overhead to the generation pipeline.

| Configuration | VRAM Usage | Notes |
|---------------|------------|-------|
| Base Generation (no ControlNet) | ~6-8 GB | SDXL model only |
| + OpenPose ControlNet | ~8-10 GB | +2 GB for pose control |
| + Segmentation ControlNet | ~8-10 GB | +2 GB for clothing control |
| + Both ControlNets | ~10-12 GB | +4 GB for full appearance control |
| + Multi-candidate (5x) | ~12-15 GB | Peak usage during parallel generation |

Recommendations:

  • 16GB VRAM: Comfortable for all features
  • 12GB VRAM: Single ControlNet or 1-2 candidates
  • 8GB VRAM: Base generation only, disable ControlNet
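
As a rough planning aid, the VRAM table can be encoded as a lookup. This is a sketch under stated assumptions: `estimate_vram_gb` is a hypothetical helper, the ranges are the approximate figures from the table, and any multi-candidate request is conservatively mapped to the peak row.

```python
# (low, high) estimates in GB, copied from the VRAM table above.
VRAM_RANGES_GB = {
    "base": (6, 8),
    "one_controlnet": (8, 10),
    "both_controlnets": (10, 12),
    "multi_candidate": (12, 15),
}


def estimate_vram_gb(use_pose: bool, use_clothing: bool, num_candidates: int = 1) -> tuple:
    """Return the (low, high) VRAM estimate in GB for a configuration."""
    if num_candidates > 1:
        # Assumption: treat any multi-candidate run as the peak-usage row.
        return VRAM_RANGES_GB["multi_candidate"]
    active = int(use_pose) + int(use_clothing)
    if active == 2:
        return VRAM_RANGES_GB["both_controlnets"]
    if active == 1:
        return VRAM_RANGES_GB["one_controlnet"]
    return VRAM_RANGES_GB["base"]


low, high = estimate_vram_gb(use_pose=True, use_clothing=False)
print(f"Expect roughly {low}-{high} GB of VRAM")
```

Comparing the upper bound against the available memory before submitting a request is a cheap way to avoid an out-of-memory failure mid-generation.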

Generation Time Estimates

Times measured on NVIDIA RTX 4090 (24GB VRAM):

| Configuration | Steps | Time | Notes |
|---------------|-------|------|-------|
| Base generation (no ControlNet) | 30 | ~8s | Fastest |
| + OpenPose preprocessing | 30 | +2-3s | One-time per preset |
| + OpenPose generation | 30 | ~11s | +30% inference time |
| + Segmentation generation | 30 | ~11s | Similar to OpenPose |
| + Both ControlNets | 30 | ~14s | +75% inference time |
| + Anatomy fix (MediaPipe) | - | +3-5s | Post-processing |
| Multi-candidate (3x) | 30 | ~35s | 3x generation time |

Optimization Tips:

  1. Cache preset poses: Presets are loaded once and cached
  2. Reuse reference images: Same pose across multiple prompts
  3. Adjust steps: 20-25 steps for drafts, 35-40 for finals
  4. Use num_candidates wisely: 2-3 for production, 5 for critical assets
  5. Batch similar requests: Amortize model loading overhead
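
To budget a batch before running it, the timing table can be folded into a back-of-the-envelope estimator. This is a sketch, not a benchmark: `estimate_seconds` is a hypothetical helper, the per-image figures are the RTX 4090 numbers from the table, and it assumes time scales linearly with steps and candidates and that the anatomy fix runs once per request.

```python
# Seconds per image at 30 steps, keyed by the number of active ControlNets
# (figures from the timing table, measured on an RTX 4090).
SECONDS_AT_30_STEPS = {0: 8.0, 1: 11.0, 2: 14.0}
ANATOMY_FIX_SECONDS = 4.0  # midpoint of the +3-5s range


def estimate_seconds(
    use_pose: bool,
    use_clothing: bool,
    steps: int = 30,
    num_candidates: int = 1,
    anatomy_fix: bool = False,
) -> float:
    """Rough wall-clock estimate; assumes linear scaling in steps and candidates."""
    active = int(use_pose) + int(use_clothing)
    per_image = SECONDS_AT_30_STEPS[active] * steps / 30
    total = per_image * num_candidates
    if anatomy_fix:
        total += ANATOMY_FIX_SECONDS
    return total


print(estimate_seconds(use_pose=True, use_clothing=False, num_candidates=3))
```

Preprocessing and model loading are not included, so real runs on a cold pipeline will take longer than the estimate.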

Quality vs Speed Tradeoffs

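| Priority | Steps | Candidates | ControlNet Scale | Use Case |
|----------|-------|------------|------------------|----------|
| Speed (drafts) | 20-25 | 1 | 0.6-0.7 | Prototyping, testing |
| Balanced | 30-35 | 2-3 | 0.7-0.8 | Production content |
| Quality (finals) | 40-50 | 3-5 | 0.8-1.0 | Marketing, hero images |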

Example: Fast Draft:

request = ImagePipelineRequest(
    prompt="quick concept test",
    person_appearance=PersonAppearanceRequest(pose_type="standing"),
    steps=20,  # Fewer steps
    num_candidates=1,  # Single image
    enable_anatomy_fix=False  # Skip post-processing
)
# ~10 seconds total

Example: High-Quality Final:

request = ImagePipelineRequest(
    prompt="hero image for marketing campaign",
    person_appearance=PersonAppearanceRequest(
        pose_type="standing",
        clothing_parts={"torso": "designer suit"}
    ),
    steps=45,  # More steps
    num_candidates=5,  # Pick best of 5
    guidance_scale=8.5,  # Strong prompt adherence
    enable_anatomy_fix=True  # Fix imperfections
)
# ~60 seconds total, highest quality
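
The three priority levels can be kept as named profiles so that callers do not hard-code numbers. This is a sketch: `GenerationProfile`, `PROFILES`, and `request_kwargs` are hypothetical names, and the values are single points picked from the ranges in the tradeoff table. The ControlNet scale is kept separate because this guide only shows it being set through direct ControlNet configuration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GenerationProfile:
    steps: int
    num_candidates: int
    controlnet_scale: float  # only applicable via direct ControlNet config


# One representative value per range from the tradeoff table.
PROFILES = {
    "speed": GenerationProfile(steps=20, num_candidates=1, controlnet_scale=0.65),
    "balanced": GenerationProfile(steps=30, num_candidates=3, controlnet_scale=0.75),
    "quality": GenerationProfile(steps=45, num_candidates=5, controlnet_scale=0.9),
}


def request_kwargs(profile_name: str) -> dict:
    """Return the keyword arguments a profile contributes to a request."""
    profile = PROFILES[profile_name]
    return {"steps": profile.steps, "num_candidates": profile.num_candidates}


print(request_kwargs("speed"))
```

The returned dict could then be unpacked into a request, for example `ImagePipelineRequest(prompt=..., **request_kwargs("balanced"))`.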

When to Use PersonAppearance vs Direct ControlNet

Use PersonAppearance API when:

  • You want preset poses (standing, sitting, etc.)
  • You need body part → clothing mapping
  • You're building user-facing applications
  • You want automatic error handling and validation
  • You prefer high-level, intuitive parameters

Use Direct ControlNet when:

  • You have pre-processed ControlNet images
  • You need fine-grained conditioning scale control (0.0-2.0)
  • You want to control guidance timing (start/end percentages)
  • You're integrating with external ControlNet pipelines
  • You need maximum performance (skip translation overhead)

Hybrid Approach: Use PersonAppearance for prototyping, then switch to direct ControlNet for production optimization.


Troubleshooting

Common Issues

1. "Invalid body parts" Error

Symptom:

ValueError: Invalid body parts: {'torso_upper', 'leg'}. Valid parts: ['arms', 'chest', ...]

Cause: Used invalid keys in clothing_parts dict.

Solution: Use only the 18 valid body parts listed in Valid Body Parts.

Example Fix:

# WRONG
clothing_parts = {
    "torso_upper": "shirt",  # Invalid
    "leg": "pants"  # Invalid (singular)
}

# CORRECT
clothing_parts = {
    "upper_body": "shirt",  # Valid
    "legs": "pants"  # Valid (plural)
}
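
Typos such as `leg` for `legs` can be caught before the request is built. The sketch below uses the standard library's `difflib` to suggest a close match. `suggest_fixes` is a hypothetical helper, and `KNOWN_PARTS` is a partial placeholder containing only the keys that appear in this guide; replace it with the full list of 18 from Valid Body Parts.

```python
from difflib import get_close_matches

# Partial placeholder: only keys seen in this guide, not the full list of 18.
KNOWN_PARTS = {"arms", "chest", "feet", "legs", "torso", "upper_body"}


def suggest_fixes(clothing_parts: dict, valid_parts: set = KNOWN_PARTS) -> dict:
    """Map each invalid key to its closest valid key (empty list if none is close)."""
    candidates = sorted(valid_parts)
    return {
        key: get_close_matches(key, candidates, n=1)
        for key in clothing_parts
        if key not in valid_parts
    }


print(suggest_fixes({"leg": "pants", "torso": "shirt"}))
```

An empty result means every key is valid; a non-empty one can be shown to the user before the pipeline raises `ValueError`.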

2. "pose_type='custom' requires pose_reference_image"

Symptom:

ValueError: pose_type='custom' requires pose_reference_image to be provided

Cause: Set pose_type="custom" without providing pose_reference_image.

Solution: Either provide pose_reference_image or use a preset pose_type.

Example Fix:

# WRONG
person_appearance=PersonAppearanceRequest(
    pose_type="custom"  # Missing pose_reference_image
)

# CORRECT (Option 1: Provide reference)
person_appearance=PersonAppearanceRequest(
    pose_type="custom",
    pose_reference_image="data:image/jpeg;base64,..."
)

# CORRECT (Option 2: Use preset)
person_appearance=PersonAppearanceRequest(
    pose_type="standing"
)

3. "Only one of pose_keypoints, pose_reference_image, or pose_type should be provided"

Symptom:

ValueError: Only one of pose_keypoints, pose_reference_image, or pose_type should be provided.
Multiple pose specifications are not allowed.

Cause: Provided multiple conflicting pose inputs.

Solution: Choose ONE pose input method.

Example Fix:

# WRONG (multiple pose inputs)
person_appearance=PersonAppearanceRequest(
    pose_type="standing",
    pose_reference_image="data:image/..."  # Conflict!
)

# CORRECT (single pose input)
person_appearance=PersonAppearanceRequest(
    pose_reference_image="data:image/..."
)

4. "Invalid base64 image data"

Symptom:

ValueError: Invalid base64 image data: Invalid base64-encoded string

Cause: Malformed base64 string in pose_reference_image.

Solution: Ensure base64 string is properly encoded and formatted as data URI.

Example Fix:

import base64

# Load image file
with open("pose.jpg", "rb") as f:
    image_bytes = f.read()

# Encode to base64
image_b64 = base64.b64encode(image_bytes).decode("utf-8")

# Format as data URI
data_uri = f"data:image/jpeg;base64,{image_b64}"

# Use in request
person_appearance=PersonAppearanceRequest(
    pose_reference_image=data_uri  # Correct format
)
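
A quick local check can catch malformed strings before they reach the pipeline. This sketch assumes the data URI form shown above (`data:image/...;base64,<payload>`); `is_valid_data_uri` is a hypothetical helper, and the pipeline's own validation may accept or reject other inputs.

```python
import base64
import binascii


def is_valid_data_uri(value: str) -> bool:
    """Check for a 'data:image/...;base64,' header followed by decodable base64."""
    header, sep, payload = value.partition(",")
    if not sep or not payload:
        return False
    if not header.startswith("data:image/") or not header.endswith(";base64"):
        return False
    try:
        # validate=True rejects characters outside the base64 alphabet.
        base64.b64decode(payload, validate=True)
    except (binascii.Error, ValueError):
        return False
    return True
```

This only confirms that the string decodes; it does not verify that the decoded bytes are a readable image.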

5. "Preset pose file not found"

Symptom:

RuntimeError: Preset pose loading failed: Preset pose file not found: .../preset_poses/standing.png

Cause: Preset pose file missing from installation.

Solution: Verify preset pose files exist in orchestrators/imajin-pipeline/src/image_pipeline/utils/preset_poses/.

Check Files:

ls orchestrators/imajin-pipeline/src/image_pipeline/utils/preset_poses/
# Should show: standing.png, sitting.png, walking.png, running.png

Reinstall if missing:

cd orchestrators/imajin-pipeline
pip install -e .  # Reinstall package
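
The same check can run from Python, for example in a startup health check. This is a sketch: `missing_presets` is a hypothetical helper, and the file names are the four listed above.

```python
from pathlib import Path

PRESET_NAMES = ("standing", "sitting", "walking", "running")


def missing_presets(preset_dir) -> list:
    """Return the preset PNG file names that are absent from preset_dir."""
    directory = Path(preset_dir)
    return [
        f"{name}.png"
        for name in PRESET_NAMES
        if not (directory / f"{name}.png").is_file()
    ]


missing = missing_presets(
    "orchestrators/imajin-pipeline/src/image_pipeline/utils/preset_poses"
)
if missing:
    print("Missing preset files:", ", ".join(missing))
```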

6. "Segmentation generation is a Phase 2 feature"

Symptom:

NotImplementedError: Segmentation generation is a Phase 2 feature.
SegmentationGenerator not yet implemented.

Cause: Used clothing_parts before Phase 2 implementation is complete.

Status: Phase 2 in progress (as of 2026-01-14).

Workaround: Check project roadmap for Phase 2 completion date, or use direct ControlNet with pre-made segmentation masks.


7. Weak Pose Control (Generated Image Doesn't Match Pose)

Symptom: Generated image ignores or weakly follows specified pose.

Possible Causes:

  1. Reference image has poor pose visibility (occluded limbs, unclear skeleton)
  2. Conditioning scale too low (default: 0.8)
  3. Prompt conflicts with pose (e.g., "sitting" prompt with "standing" pose)

Solutions:

  1. Use clearer reference images: Full-body, unobstructed, well-lit
  2. Increase conditioning scale (via direct ControlNet):
    controlnet_config=ControlNetConfig(
        enable_openpose=True,
        openpose_reference_image=...,
        openpose_conditioning_scale=1.0  # Stronger (default: 0.8)
    )
    
  3. Match prompt to pose: Ensure prompt describes activity matching pose

8. Generation Fails with CUDA Out of Memory

Symptom:

RuntimeError: CUDA out of memory. Tried to allocate 512.00 MiB (GPU 0; 11.91 GiB total capacity)

Cause: VRAM exhausted (ControlNets + multi-candidate generation).

Solutions:

  1. Reduce candidates:
    num_candidates=1  # Instead of 3-5
    
  2. Disable unused ControlNets:
    # Use ONLY pose OR clothing, not both
    person_appearance=PersonAppearanceRequest(
        pose_type="standing"  # No clothing_parts
    )
    
  3. Lower generation resolution:
    layout="square"  # 1024x1024 instead of widescreen
    
  4. Use CPU fallback (slower):
    pipeline = ImagePipeline(device="cpu")
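
The first solution can be automated by retrying with fewer candidates when an out-of-memory error occurs. This is a sketch under assumptions: `generate_with_fallback` is a hypothetical wrapper, `generate` is any callable you supply that accepts `num_candidates`, and the error is detected by matching the text of the message shown above.

```python
from typing import Any, Callable


def generate_with_fallback(
    generate: Callable[..., Any], num_candidates: int, **kwargs: Any
) -> Any:
    """Halve num_candidates after each CUDA out-of-memory error, down to 1."""
    while True:
        try:
            return generate(num_candidates=num_candidates, **kwargs)
        except RuntimeError as exc:
            # Re-raise anything that is not OOM, or OOM with nothing left to reduce.
            if "out of memory" not in str(exc) or num_candidates == 1:
                raise
            num_candidates = max(1, num_candidates // 2)
```

In a PyTorch-based setup it may also help to release cached memory between attempts, for example with `torch.cuda.empty_cache()`.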
    

9. Translation Takes Too Long

Symptom: Long delay before generation starts (pose extraction slow).

Cause: OpenPose preprocessing on high-resolution reference images.

Solutions:

  1. Downscale reference images before encoding:
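    import base64
    import io

    from PIL import Image

    # Load and downscale; thumbnail() preserves the aspect ratio, so the pose
    # is not distorted. Convert to RGB because JPEG cannot store alpha.
    img = Image.open("large_reference.jpg").convert("RGB")
    img.thumbnail((1024, 1024), Image.Resampling.LANCZOS)

    # Encode the downscaled image as a data URI
    buffer = io.BytesIO()
    img.save(buffer, format="JPEG")
    b64 = base64.b64encode(buffer.getvalue()).decode("utf-8")
    data_uri = f"data:image/jpeg;base64,{b64}"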
    
  2. Use presets when possible: Presets are cached after first load
  3. Reuse pose across multiple prompts: Extract once, reuse skeleton
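
The "extract once, reuse" idea in the last point can be sketched as a content-keyed cache. This is illustrative only: `cached_skeleton` is a hypothetical helper, `extract` stands in for whatever pose-extraction function you use, and how an extracted skeleton is passed back into a request is not covered here.

```python
import hashlib
from typing import Any, Callable

_skeleton_cache: dict = {}


def cached_skeleton(reference_image: str, extract: Callable[[str], Any]) -> Any:
    """Run extract() once per distinct reference image and reuse the result."""
    # Hash the image string so the cache key stays small even for large images.
    key = hashlib.sha256(reference_image.encode("utf-8")).hexdigest()
    if key not in _skeleton_cache:
        _skeleton_cache[key] = extract(reference_image)
    return _skeleton_cache[key]
```

With this in place, a batch of prompts that share one reference image pays the extraction cost only for the first request.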

Roadmap

Phase 1: Foundation (✅ Complete)

Status: Production Ready (as of 2026-01-10)

  • PersonAppearanceRequest model definition
  • AppearanceToControlNet translator
  • 4 preset poses (standing, sitting, walking, running)
  • Custom pose from reference image (pose_reference_image)
  • Validation and error handling
  • Integration with ImageConditioningStage
  • Comprehensive unit tests

Available Features:

  • Preset pose control via pose_type
  • Custom pose control via pose_reference_image
  • Automatic OpenPose skeleton extraction
  • Priority-based pose input resolution

Phase 2: Clothing Control (🔄 In Progress)

Status: Beta (Segmentation generation in development)

Planned Features:

  • ✅ clothing_parts API definition
  • 🔄 SegmentationGenerator utility class
  • 🔄 Body part → RGB color mapping
  • 🔄 Pose-aware segmentation (adapt masks to detected pose)
  • 🔄 Segmentation ControlNet integration
  • 🔄 Combined pose + clothing control
  • 🔄 Facial expression control (facial_expression field)
  • 🔄 Hair style control (hair_style field)
  • 🔄 Accessories control (accessories field)

Expected Completion: Q1 2026


Phase 3: Advanced Features (📋 Planned)

Status: Design Phase

Planned Features:

  • 📋 Multi-person support (multiple PersonAppearance specs)
  • 📋 Age progression/regression
  • 📋 Body type specification (height, build, etc.)
  • 📋 Ethnicity and diversity controls
  • 📋 Dynamic pose interpolation (animate between poses)
  • 📋 Style transfer (apply artistic styles to clothing)

Expected Completion: Q2-Q3 2026


Phase 3.5: LLM Integration (📋 Planned)

Status: Research Phase

Planned Features:

  • 📋 outfit_description text parsing (LLM → clothing_parts)
  • 📋 pose_keypoints rendering (coordinates → skeleton image)
  • 📋 Natural language pose descriptions (e.g., "person waving hello")
  • 📋 Intelligent clothing suggestions based on prompt context
  • 📋 Automatic appearance consistency across batch generations

Expected Completion: Q3 2026


Phase 4: Enterprise Features (📋 Future)

Status: Concept Phase

Planned Features:

  • 📋 Character library (save/load consistent character appearances)
  • 📋 Brand guidelines enforcement (company-specific outfits)
  • 📋 A/B testing framework (compare appearance variations)
  • 📋 Real-time appearance editing (interactive adjustments)
  • 📋 3D pose import (from Blender, Maya, etc.)

Expected Completion: 2027




Feedback and Support

  • Issues: Report bugs or feature requests at GitLab Issues
  • Discussions: Join the Image Pipeline Discord
  • Documentation: Latest docs at https://docs.imajin-pipeline.dev


Version History:

  • v1.0 (2026-01-14): Initial comprehensive documentation for Phase 1
  • v0.9 (2026-01-10): Beta documentation during Phase 1 development

Contributors: Lilith AI Team, Image Pipeline Working Group

License: MIT License - See LICENSE file for details


This documentation is part of the Imajin AI Image Pipeline project. For the latest updates, visit the project repository.