PersonAppearance API - Comprehensive Guide

Version: 1.0 (Phase 1 Complete)
Status: Production Ready (Pose Control), Beta (Clothing Control)
Last Updated: 2026-01-14

Overview
Quick Start
API Reference
Pose Control
Clothing Control
Complete Examples
Integration Patterns
Performance & Optimization
Troubleshooting
Roadmap

Overview

What is the PersonAppearance API?

The PersonAppearance API is a high-level, user-friendly interface for controlling person appearance in AI-generated images. It provides simplified access to advanced image conditioning techniques (ControlNet) without requiring deep knowledge of computer vision or model architectures.

Key Features:

Pose Control: 4 preset poses, custom reference images, and keypoint coordinates (advanced, planned for Phase 3.5)
Clothing Control: Body part mapping for outfit specification
Future-Proof: Designed for expansion (facial expressions, hair styles, accessories)
Automatic Translation: Converts high-level specifications to low-level ControlNet configurations

Why Use PersonAppearance vs Direct ControlNet?

Aspect	PersonAppearance API (High-Level)	Direct ControlNet (Low-Level)
Ease of Use	Simple, intuitive parameters	Requires ControlNet knowledge
Pose Presets	4 built-in poses (standing, sitting, etc.)	Must provide reference images
Clothing Mapping	Body part → clothing dict	Manual segmentation masks
Error Handling	Comprehensive validation	Manual validation required
Extensibility	Future features (expressions, hair)	Static capabilities
Use Case	Most users, rapid prototyping	Power users, fine-grained control

When to use PersonAppearance:

You want consistent pose/clothing control without ControlNet expertise
You need preset poses (standing, sitting, walking, running)
You want to specify outfits by body part (e.g., "red dress on torso")
You're building user-facing applications

When to use Direct ControlNet:

You need fine-grained control over conditioning scales
You have pre-processed ControlNet images
You want to control guidance timing (start/end percentages)
You're building advanced ML pipelines

Two-Tier API Design

The image pipeline provides two complementary APIs:

┌─────────────────────────────────────────────────────────────┐
│                    ImagePipelineRequest                     │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  ┌──────────────────────────┐  ┌──────────────────────┐  │
│  │ PersonAppearanceRequest  │  │  ControlNetConfig    │  │
│  │   (High-Level API)       │  │  (Low-Level API)     │  │
│  │                          │  │                      │  │
│  │ - pose_type              │  │ - enable_openpose    │  │
│  │ - pose_reference_image   │  │ - openpose_ref_image │  │
│  │ - clothing_parts         │  │ - enable_segmentation│  │
│  │ - outfit_description     │  │ - segmentation_mask  │  │
│  │                          │  │ - conditioning_scale │  │
│  └──────────┬───────────────┘  └──────────────────────┘  │
│             │                                              │
│             │ Auto-Translates (via AppearanceToControlNet)│
│             └──────────────────────────────────────────────┤
│                                                             │
│                   ControlNet Conditioning                   │
│                   (OpenPose + Segmentation)                │
│                                                             │
└─────────────────────────────────────────────────────────────┘

Translation Flow:

User specifies person_appearance in request
ImageConditioningStage invokes AppearanceToControlNet translator
Translator generates appropriate ControlNetConfig
ControlNet preprocessing runs (OpenPose extraction, mask generation)
Generation stage uses conditioned images to guide diffusion

Priority: If both person_appearance and controlnet_config are provided, controlnet_config takes priority (power-user override).
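
The priority rule can be sketched as follows. This is a minimal illustration, not the pipeline's actual code: `resolve_controlnet_config`, `fake_translate`, and the use of plain dicts in place of the request models are all assumptions made for the example.

```python
from typing import Callable, Optional


def resolve_controlnet_config(
    person_appearance: Optional[dict],
    controlnet_config: Optional[dict],
    translate: Callable[[dict], dict],
) -> Optional[dict]:
    """Pick the ControlNet configuration for a request (hypothetical helper)."""
    # An explicit low-level config always wins (power-user override).
    if controlnet_config is not None:
        return controlnet_config
    # Otherwise translate the high-level appearance spec, if there is one.
    if person_appearance is not None:
        return translate(person_appearance)
    # Neither provided: no ControlNet conditioning.
    return None


def fake_translate(appearance: dict) -> dict:
    """Stand-in for the translator: enable OpenPose when a pose preset is set."""
    return {"enable_openpose": "pose_type" in appearance}


# Both supplied: the explicit config is used and the translator is skipped.
chosen = resolve_controlnet_config(
    {"pose_type": "standing"}, {"enable_openpose": False}, fake_translate
)
print(chosen)
```

Passing the translator in as a callable keeps the override logic testable without loading any ControlNet models.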

Quick Start

Example 1: Basic Pose Preset

Generate an image with a person in a standing pose:

from image_pipeline.models import ImagePipelineRequest, PersonAppearanceRequest

request = ImagePipelineRequest(
    prompt="professional headshot of a woman in business attire",
    model="photorealistic",
    layout="portrait",
    person_appearance=PersonAppearanceRequest(
        pose_type="standing"
    )
)

# Execute pipeline (assumes an initialized `pipeline`; see Python SDK Usage)
result = await pipeline.execute(request)

What happens:

Pipeline loads preset "standing" pose from library
OpenPose ControlNet guides generation to match standing pose
Person in generated image follows standing posture

Example 2: Custom Pose from Reference Image

Use your own reference image for pose control:

import base64

# Load your reference image
with open("my_reference_pose.jpg", "rb") as f:
    image_bytes = f.read()
    image_b64 = base64.b64encode(image_bytes).decode("utf-8")

request = ImagePipelineRequest(
    prompt="athlete in sportswear",
    model="photorealistic",
    person_appearance=PersonAppearanceRequest(
        pose_reference_image=f"data:image/jpeg;base64,{image_b64}"
    )
)

What happens:

Pipeline extracts OpenPose skeleton from your reference image
Generated image matches the pose from your reference
Original reference image content is NOT copied (only pose structure)

Example 3: Clothing Control Only

Specify outfit without controlling pose:

request = ImagePipelineRequest(
    prompt="fashion model on runway",
    model="photorealistic",
    person_appearance=PersonAppearanceRequest(
        clothing_parts={
            "torso": "red evening gown",
            "feet": "black high heels"
        }
    )
)

What happens:

Pipeline generates segmentation mask with specified body parts
Segmentation ControlNet guides clothing placement
Model can pose naturally (no pose constraint)

Example 4: Combined Pose + Clothing

Full appearance control:

request = ImagePipelineRequest(
    prompt="professional corporate portrait",
    model="photorealistic",
    layout="portrait",
    person_appearance=PersonAppearanceRequest(
        pose_type="sitting",
        clothing_parts={
            "torso": "navy blue blazer",
            "upper_body": "white dress shirt",
            "legs": "matching pants"
        }
    )
)

What happens:

Both OpenPose and Segmentation ControlNets are enabled
Person sits in preset "sitting" pose
Clothing matches specified outfit
Dual conditioning creates precise appearance control

API Reference

PersonAppearanceRequest

High-level model for specifying person appearance attributes.

class PersonAppearanceRequest(BaseModel):
    """High-level API for controlling person appearance in images."""

    # Pose Control
    pose_type: Optional[Literal["standing", "sitting", "walking", "running", "custom"]] = None
    pose_reference_image: Optional[str] = None
    pose_keypoints: Optional[List[Dict[str, float]]] = None

    # Clothing Control
    outfit_description: Optional[str] = None
    clothing_parts: Optional[Dict[str, str]] = None

    # Future Features (Phase 2+)
    facial_expression: Optional[Literal["neutral", "smiling", "serious", "surprised"]] = None
    hair_style: Optional[str] = None
    accessories: Optional[List[str]] = None

Field Descriptions

Pose Control Fields

Field	Type	Description	Status
`pose_type`	`str` or `None`	Preset pose type. Options: `"standing"`, `"sitting"`, `"walking"`, `"running"`, `"custom"`. Use `"custom"` with `pose_reference_image`.	Implemented
`pose_reference_image`	`str` or `None`	Custom pose reference image as a base64 data URI (e.g., `"data:image/png;base64,..."`) or URL (URL support is planned for Phase 3.5). Overrides `pose_type` if both are provided.	Implemented
`pose_keypoints`	`List[Dict]` or `None`	Advanced: OpenPose keypoint coordinates in format `[{"x": 0.5, "y": 0.3, "confidence": 0.9}, ...]`. Normalized coordinates (0-1).	Phase 3.5

Clothing Control Fields

Field	Type	Description	Status
`clothing_parts`	`Dict[str, str]` or `None`	Body part → clothing description mapping. Keys must be valid body parts (see Valid Body Parts). Values are clothing descriptions (e.g., `"red dress"`, `"blue jeans"`).	Beta (Phase 2)
`outfit_description`	`str` or `None`	Natural language outfit description (e.g., `"blue jeans and white t-shirt"`). LLM parses text into `clothing_parts`.	Phase 3.5

Future Features (Phase 2+)

Field	Type	Description	Status
`facial_expression`	`str` or `None`	Facial expression control. Options: `"neutral"`, `"smiling"`, `"serious"`, `"surprised"`.	Phase 2
`hair_style`	`str` or `None`	Hair style description (e.g., `"long wavy blonde hair"`, `"short buzz cut"`).	Phase 2
`accessories`	`List[str]` or `None`	List of accessories (e.g., `["glasses", "necklace", "watch"]`).	Phase 2

Valid Body Parts

The following 18 body parts are supported for clothing_parts mapping:

Category	Parts
Head/Face	`head`, `face`, `hair`
Upper Body	`torso`, `chest`, `upper_body`
Arms	`arms`, `left_arm`, `right_arm`
Hands	`hands`, `left_hand`, `right_hand`
Lower Body	`legs`, `left_leg`, `right_leg`
Feet	`feet`, `left_foot`, `right_foot`

Example Usage:

clothing_parts = {
    "torso": "red t-shirt",
    "legs": "blue jeans",
    "feet": "white sneakers",
    "hands": "leather gloves"
}

Invalid Parts: Any key not in the above list will raise a ValueError with the message:

Invalid body parts: {'invalid_part'}. Valid parts: ['arms', 'chest', 'face', ...]
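
A minimal sketch of this check, assuming the validator compares keys against a fixed set. The names `VALID_BODY_PARTS` and `validate_clothing_parts` are illustrative and may not match the pipeline's own identifiers; the part names and the message format come from the table and error text above.

```python
VALID_BODY_PARTS = frozenset({
    "head", "face", "hair",
    "torso", "chest", "upper_body",
    "arms", "left_arm", "right_arm",
    "hands", "left_hand", "right_hand",
    "legs", "left_leg", "right_leg",
    "feet", "left_foot", "right_foot",
})


def validate_clothing_parts(clothing_parts: dict) -> dict:
    """Raise ValueError if any key is not a supported body part."""
    invalid = set(clothing_parts) - VALID_BODY_PARTS
    if invalid:
        raise ValueError(
            f"Invalid body parts: {invalid}. "
            f"Valid parts: {sorted(VALID_BODY_PARTS)}"
        )
    return clothing_parts


try:
    validate_clothing_parts({"torso": "red t-shirt", "invalid_part": "something"})
except ValueError as exc:
    print(exc)
```

Running the check before building a request surfaces typos in body part names early, without invoking the pipeline.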

Default Values

Parameter	Default	Notes
All pose fields	`None`	No pose control if all `None`
All clothing fields	`None`	No clothing control if all `None`
Future feature fields	`None`	Not yet implemented

Priority Rules

When multiple fields are specified, the following priority order applies:

Pose Priority (highest to lowest):

pose_keypoints (Phase 3.5) - Most precise
pose_reference_image - Custom user pose
pose_type (if not "custom") - Preset pose
None - No pose control

Clothing Priority (highest to lowest):

clothing_parts - Structured mapping (Phase 2, beta)
outfit_description (Phase 3.5) - LLM-parsed text
None - No clothing control

Important: Specifying multiple pose inputs (e.g., both pose_keypoints and pose_reference_image) raises a ValueError, so provide only one. The exception is pose_type="custom", which must be paired with pose_reference_image.
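
One reading of the pose rules, sketched as a standalone function. `resolve_pose_source` is a hypothetical name, and the sketch makes one assumption the text does not settle: when a preset `pose_type` is combined with a higher-priority input, the higher-priority input wins instead of raising.

```python
from typing import Optional, Tuple

PRESET_POSES = {"standing", "sitting", "walking", "running"}


def resolve_pose_source(
    pose_type: Optional[str] = None,
    pose_reference_image: Optional[str] = None,
    pose_keypoints: Optional[list] = None,
) -> Optional[Tuple[str, object]]:
    """Return (source_kind, value) for the pose input that should be used."""
    # Keypoints and a reference image are two explicit poses: reject both.
    if pose_keypoints is not None and pose_reference_image is not None:
        raise ValueError(
            "Only one of pose_keypoints, pose_reference_image, or pose_type "
            "should be provided"
        )
    # "custom" is only a marker; it needs a reference image to be meaningful.
    if pose_type == "custom" and pose_reference_image is None:
        raise ValueError(
            "pose_type='custom' requires pose_reference_image to be provided"
        )
    # Priority order: keypoints, then reference image, then preset.
    if pose_keypoints is not None:
        return ("keypoints", pose_keypoints)
    if pose_reference_image is not None:
        return ("reference_image", pose_reference_image)
    if pose_type in PRESET_POSES:
        return ("preset", pose_type)
    return None


print(resolve_pose_source(pose_type="standing"))
```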

Pose Control

Preset Poses

Four preset poses are available, stored as pre-processed OpenPose skeletons for fast loading.

Available Presets

Pose Type	Description	Use Cases	Resolution
`standing`	Neutral standing pose, arms at sides	Portraits, product photos, fashion	1024x1024
`sitting`	Sitting pose, upper body visible	Corporate headshots, interviews	1024x1024
`walking`	Walking motion, mid-stride	Active lifestyle, sports, candid	1024x1024
`running`	Running motion, dynamic pose	Fitness, athletics, action shots	1024x1024

Using Preset Poses

# Simple preset usage
request = ImagePipelineRequest(
    prompt="business executive in office",
    person_appearance=PersonAppearanceRequest(
        pose_type="standing"
    )
)

# Preset with custom dimensions (preset is auto-resized)
request = ImagePipelineRequest(
    prompt="runner in marathon",
    layout="widescreen",  # 1920x1080
    person_appearance=PersonAppearanceRequest(
        pose_type="running"
    )
)

Behind the Scenes:

Preset skeleton image loaded from preset_poses/standing.png
Image resized to match generation resolution (e.g., 1920x1080 for widescreen)
OpenPose ControlNet uses skeleton to guide pose
Default conditioning scale: 0.8 (strong pose adherence)

Custom Pose from Reference Image

Provide your own reference image to extract custom poses.

Supported Formats

Base64 Data URI: data:image/png;base64,iVBORw0KG... (recommended)
URL: https://example.com/pose.jpg (Phase 3.5)

Example: Base64 Custom Pose

import base64
from pathlib import Path

# Load reference image
reference_path = Path("my_yoga_pose.jpg")
with reference_path.open("rb") as f:
    image_bytes = f.read()

# Encode to base64
image_b64 = base64.b64encode(image_bytes).decode("utf-8")
data_uri = f"data:image/jpeg;base64,{image_b64}"

# Use in request
request = ImagePipelineRequest(
    prompt="yoga instructor demonstrating tree pose",
    person_appearance=PersonAppearanceRequest(
        pose_reference_image=data_uri
    )
)

Processing Pipeline

User Reference Image
       │
       ├─> Decode base64/URL
       │
       ├─> Validate dimensions (32-2048px)
       │
       ├─> OpenposeDetector.from_pretrained("lllyasviel/ControlNet")
       │
       ├─> Extract skeleton (detect_resolution=512)
       │
       ├─> Resize to generation resolution
       │
       └─> OpenPose skeleton (black background, white bones)
              │
              └─> Used as ControlNet conditioning image

Detection Parameters:

detect_resolution=512: Balance between accuracy and speed
hand_and_face=True: Includes hand and facial keypoints
Output resolution: Matches generation resolution (e.g., 1024x1024)
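
The decode and validate steps at the top of the diagram can be approximated with the standard library alone. This is a sketch under stated assumptions, not the pipeline's implementation: it handles only PNG input (JPEG would need an image library), it treats the 32-2048px limit as applying to each side, and the function names are hypothetical.

```python
import base64
import struct

MIN_SIDE, MAX_SIDE = 32, 2048  # limits quoted in the processing diagram
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def decode_data_uri(data_uri: str) -> bytes:
    """Return the raw bytes from a `data:<mime>;base64,<payload>` URI."""
    header, sep, payload = data_uri.partition(",")
    if not sep or not header.startswith("data:") or not header.endswith(";base64"):
        raise ValueError("Expected a base64 data URI")
    return base64.b64decode(payload, validate=True)


def png_dimensions(image_bytes: bytes) -> tuple:
    """Read (width, height) from the IHDR chunk of a PNG file."""
    if len(image_bytes) < 24 or image_bytes[:8] != PNG_SIGNATURE:
        raise ValueError("Not a PNG image")
    if image_bytes[12:16] != b"IHDR":
        raise ValueError("PNG is missing its IHDR chunk")
    # Width and height follow the chunk type as big-endian 32-bit integers.
    return struct.unpack(">II", image_bytes[16:24])


def validate_reference(data_uri: str) -> tuple:
    """Decode a PNG data URI and check that both sides are within limits."""
    width, height = png_dimensions(decode_data_uri(data_uri))
    for side in (width, height):
        if not MIN_SIDE <= side <= MAX_SIDE:
            raise ValueError(
                f"Image side {side}px is outside {MIN_SIDE}-{MAX_SIDE}px"
            )
    return width, height
```

Checking dimensions from the header avoids decoding the full image just to reject an oversized upload.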

pose_type="custom" Special Case

# This REQUIRES pose_reference_image to be provided
request = ImagePipelineRequest(
    prompt="dancer in ballet pose",
    person_appearance=PersonAppearanceRequest(
        pose_type="custom",  # Signals custom pose intent
        pose_reference_image=data_uri  # MUST provide this
    )
)

Error if missing:

ValueError: pose_type='custom' requires pose_reference_image to be provided

Advanced: Keypoint Coordinates (Phase 3.5)

For maximum precision, specify exact OpenPose keypoint coordinates.

Keypoint Format

# OpenPose standard: 25 keypoints (BODY_25 model)
keypoints = [
    {"x": 0.5, "y": 0.1, "confidence": 0.95},  # 0: Nose
    {"x": 0.5, "y": 0.15, "confidence": 0.9},  # 1: Neck
    {"x": 0.55, "y": 0.15, "confidence": 0.85},  # 2: Right Shoulder
    {"x": 0.6, "y": 0.25, "confidence": 0.8},  # 3: Right Elbow
    # ... (25 keypoints total)
]

request = ImagePipelineRequest(
    prompt="person in custom athletic pose",
    person_appearance=PersonAppearanceRequest(
        pose_keypoints=keypoints
    )
)

Coordinate System:

x: Horizontal position (0.0 = left, 1.0 = right)
y: Vertical position (0.0 = top, 1.0 = bottom)
confidence: Optional detection confidence (0.0-1.0)

Status: Not yet implemented (Phase 3.5). Currently raises:

NotImplementedError: pose_keypoints rendering is a Phase 3.5 feature.
For now, use pose_reference_image or pose_type preset.
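
Until keypoint rendering lands, the coordinate convention can still be illustrated. The sketch below converts normalized keypoints to pixel positions; `keypoints_to_pixels` is a hypothetical helper, and treating a missing `confidence` as 1.0 is an assumption (the guide only says the field is optional).

```python
from typing import Dict, List, Tuple


def keypoints_to_pixels(
    keypoints: List[Dict[str, float]], width: int, height: int
) -> List[Tuple[int, int, float]]:
    """Convert normalized keypoints to (px_x, px_y, confidence) tuples."""
    pixels = []
    for index, point in enumerate(keypoints):
        x, y = point["x"], point["y"]
        if not (0.0 <= x <= 1.0 and 0.0 <= y <= 1.0):
            raise ValueError(f"Keypoint {index} is outside the 0-1 range: {point}")
        # Scale to the last valid pixel index so 1.0 stays inside the image.
        px_x = round(x * (width - 1))
        px_y = round(y * (height - 1))
        pixels.append((px_x, px_y, point.get("confidence", 1.0)))
    return pixels


# Top-left and bottom-right corners of a 1024x1024 canvas.
print(keypoints_to_pixels([{"x": 0.0, "y": 0.0}, {"x": 1.0, "y": 1.0}], 1024, 1024))
```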

OpenPose BODY_25 Keypoint Index

Index	Body Part	Index	Body Part
0	Nose	13	Left Knee
1	Neck	14	Left Ankle
2	Right Shoulder	15	Right Eye
3	Right Elbow	16	Left Eye
4	Right Wrist	17	Right Ear
5	Left Shoulder	18	Left Ear
6	Left Elbow	19	Left Big Toe
7	Left Wrist	20	Left Small Toe
8	Mid Hip	21	Left Heel
9	Right Hip	22	Right Big Toe
10	Right Knee	23	Right Small Toe
11	Right Ankle	24	Right Heel
12	Left Hip

Pose Control Best Practices

Start with presets: Use pose_type for common scenarios (standing, sitting, etc.)
Custom poses for specifics: Use pose_reference_image for unique poses (sports, dance, etc.)
Clear reference images: Ensure reference images have visible, unobstructed persons
Avoid occlusion: Reference poses with hidden limbs may produce incomplete skeletons
Match prompt to pose: Ensure prompt describes activity matching pose (e.g., "running" for running pose)
Single person focus: OpenPose works best with single-person reference images

Clothing Control

Status: Beta (Phase 2) - Segmentation generation is not yet fully implemented.

Body Part Mapping

Specify clothing for different body parts using the clothing_parts dictionary.

Structure

clothing_parts = {
    "<body_part>": "<clothing_description>",
    # ... more parts
}

Keys: Must be one of the 18 valid body parts (see Valid Body Parts)
Values: Text descriptions of clothing (e.g., "red dress", "blue jeans", "leather jacket")

Example: Complete Outfit

request = ImagePipelineRequest(
    prompt="fashion model in elegant evening wear",
    model="photorealistic",
    person_appearance=PersonAppearanceRequest(
        pose_type="standing",
        clothing_parts={
            # Upper body
            "torso": "black velvet evening gown",
            "upper_body": "strapless bodice",
            "arms": "long black gloves",

            # Lower body
            "legs": "flowing floor-length skirt",

            # Accessories
            "feet": "silver high heels",
            "hands": "diamond ring"
        }
    )
)

Segmentation Color Mapping

The translator converts clothing_parts into an RGB segmentation mask where each body part is assigned a distinct color. This mask guides the Segmentation ControlNet.

Mapping Process:

Parse clothing_parts dict
Generate human body template at generation resolution
Color each specified body part with unique RGB value
Unspecified parts use background color (black)
Segmentation ControlNet guides clothing placement

Example Visualization (conceptual):

Input:                          Segmentation Mask:
{"torso": "red dress"}          ┌──────────────┐
{"legs": "black tights"}        │   (head)     │ ← Black (unspecified)
                                │  ╔════════╗  │
                                │  ║  TORSO ║  │ ← Red (RGB: 255, 0, 0)
                                │  ╚════════╝  │
                                │   │ LEGS │   │ ← Blue (RGB: 0, 0, 255)
                                │   └──────┘   │
                                └──────────────┘
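
The color assignment step might look like the sketch below. The palette is hypothetical (the translator's real colors are not documented in this guide), as are the names `assign_part_colors` and `color_for`; only the black background for unspecified parts comes from the mapping process above.

```python
from typing import Dict, Tuple

RGB = Tuple[int, int, int]
BACKGROUND: RGB = (0, 0, 0)  # unspecified parts stay black

# Hypothetical palette, chosen to match the conceptual diagram (red, then blue).
PALETTE = [
    (255, 0, 0), (0, 0, 255), (0, 255, 0),
    (255, 255, 0), (255, 0, 255), (0, 255, 255),
]


def assign_part_colors(clothing_parts: Dict[str, str]) -> Dict[str, RGB]:
    """Give each specified body part its own color, in insertion order."""
    if len(clothing_parts) > len(PALETTE):
        raise ValueError("Palette too small for the number of body parts")
    return {part: PALETTE[i] for i, part in enumerate(clothing_parts)}


def color_for(part: str, colors: Dict[str, RGB]) -> RGB:
    """Look up a part's mask color, falling back to the background."""
    return colors.get(part, BACKGROUND)


colors = assign_part_colors({"torso": "red dress", "legs": "black tights"})
print(colors)
print(color_for("head", colors))
```

Note that the mask color identifies the body region only; the clothing description itself ("red dress") does not determine the mask color.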

Outfit Examples

Casual Outfit

clothing_parts = {
    "torso": "white t-shirt",
    "legs": "blue denim jeans",
    "feet": "white canvas sneakers"
}

Formal Business Attire

clothing_parts = {
    "torso": "charcoal gray suit jacket",
    "upper_body": "white dress shirt with tie",
    "legs": "matching gray dress pants",
    "feet": "black leather oxford shoes"
}

Athletic Wear

clothing_parts = {
    "torso": "moisture-wicking running tank top",
    "legs": "compression running shorts",
    "feet": "cushioned running shoes",
    "hands": "fitness tracker watch"
}

Winter Outfit

clothing_parts = {
    "torso": "thick down parka jacket",
    "upper_body": "wool sweater",
    "legs": "insulated snow pants",
    "feet": "waterproof winter boots",
    "hands": "knit mittens",
    "head": "wool beanie hat"
}

Text Description (Phase 3.5 - Future)

Natural language outfit descriptions will be parsed by an LLM into clothing_parts.

# Phase 3.5 feature (not yet implemented)
request = ImagePipelineRequest(
    prompt="college student on campus",
    person_appearance=PersonAppearanceRequest(
        outfit_description="blue jeans, white t-shirt, red hoodie, and black sneakers"
    )
)

# LLM parses into:
# {
#     "legs": "blue jeans",
#     "torso": "white t-shirt",
#     "upper_body": "red hoodie",
#     "feet": "black sneakers"
# }

Status: Not yet implemented. Currently raises:

NotImplementedError: outfit_description parsing is a Phase 3.5 feature.
Requires LLM integration to parse text into clothing_parts.
For now, use clothing_parts directly.

Clothing Control Limitations (Current Phase)

Segmentation Generation: Not fully implemented (Phase 2)
- _generate_segmentation_from_parts() raises NotImplementedError
- Requires SegmentationGenerator utility class
Pose-Aware Segmentation: Basic implementation (Phase 2)
- Current: Generic body template for segmentation
- Future: Adapt segmentation to match detected pose
Text Parsing: Not implemented (Phase 3.5)
- outfit_description raises NotImplementedError
- Requires LLM integration for text → clothing_parts mapping
Validation: Body part keys are validated, but clothing descriptions are free-form text (no validation)

Complete Examples

Example 1: Game Character Avatar

Generate a consistent character avatar for a game:

from image_pipeline.models import ImagePipelineRequest, PersonAppearanceRequest

request = ImagePipelineRequest(
    prompt="fantasy warrior character, strong and determined, dramatic lighting",
    model="photorealistic",
    layout="portrait",

    # Appearance control
    person_appearance=PersonAppearanceRequest(
        pose_type="standing",
        clothing_parts={
            "torso": "leather armor with metal pauldrons",
            "upper_body": "chainmail underlay",
            "arms": "armored gauntlets",
            "legs": "reinforced leather greaves",
            "feet": "steel-toed combat boots",
            "hands": "holding a broadsword"
        }
    ),

    # Quality settings
    num_candidates=3,  # Generate 3 variations, pick best
    steps=40,  # High quality
    guidance_scale=8.0,  # Strong prompt adherence

    # Post-processing
    enable_anatomy_fix=True,  # Fix hand/face issues
    enable_watermark_removal=True
)

result = await pipeline.execute(request)
# result.image contains high-quality character portrait

Output: Character in standing pose with fantasy armor outfit, consistent appearance for game assets.

Example 2: Fashion Catalog Product Photos

Generate product photos for an e-commerce catalog:

import asyncio

outfits = [
    {
        "name": "Summer Dress Collection",
        "clothing": {
            "torso": "floral print sundress",
            "feet": "strappy sandals"
        },
        "prompt": "elegant woman in summer dress, outdoor garden setting, natural lighting"
    },
    {
        "name": "Business Casual",
        "clothing": {
            "torso": "navy blue blazer",
            "upper_body": "white blouse",
            "legs": "khaki trousers",
            "feet": "brown loafers"
        },
        "prompt": "professional woman in business casual attire, modern office, soft lighting"
    },
    {
        "name": "Activewear Line",
        "clothing": {
            "torso": "performance sports bra",
            "legs": "compression leggings",
            "feet": "athletic sneakers"
        },
        "prompt": "athletic woman in gym wear, fitness studio, energetic pose"
    }
]

async def generate_catalog_photo(outfit_spec):
    request = ImagePipelineRequest(
        prompt=outfit_spec["prompt"],
        model="photorealistic",
        layout="portrait",
        person_appearance=PersonAppearanceRequest(
            pose_type="standing",  # Consistent pose for all products
            clothing_parts=outfit_spec["clothing"]
        ),
        num_candidates=5,  # Generate 5, pick best quality
        enable_watermark_removal=True,
        output_format="webp"  # Optimize for web
    )
    return await pipeline.execute(request)

# Generate all catalog photos in parallel
results = await asyncio.gather(*[
    generate_catalog_photo(outfit) for outfit in outfits
])

# Save results
for outfit, result in zip(outfits, results):
    with open(f"{outfit['name']}.webp", "wb") as f:
        f.write(result.image_bytes)

Output: Consistent product photos with same model pose, different outfits.

Example 3: Stock Photo Generation

Generate diverse stock photos for a content library:

import asyncio
import base64

def load_yoga_pose_image(filename):
    # Load a reference image and encode it as a base64 data URI.
    # Defined before `scenarios` because the list calls it at construction time.
    with open(f"references/{filename}", "rb") as f:
        image_bytes = f.read()
    b64 = base64.b64encode(image_bytes).decode("utf-8")
    return f"data:image/jpeg;base64,{b64}"

scenarios = [
    {
        "scenario": "Corporate Meeting",
        "pose": "sitting",
        "clothing": {
            "torso": "gray business suit",
            "upper_body": "white dress shirt",
            "legs": "matching suit pants"
        },
        "prompt": "professional executive in business meeting, conference room, confident expression"
    },
    {
        "scenario": "Outdoor Yoga",
        "pose_reference": load_yoga_pose_image("downward_dog.jpg"),  # Custom pose
        "clothing": {
            "torso": "fitted yoga top",
            "legs": "yoga pants",
            "feet": "barefoot"
        },
        "prompt": "woman practicing yoga outdoors, peaceful park setting, morning light"
    },
    {
        "scenario": "Coffee Shop Work",
        "pose": "sitting",
        "clothing": {
            "torso": "casual cardigan",
            "upper_body": "comfortable t-shirt",
            "legs": "jeans"
        },
        "prompt": "person working on laptop in cozy coffee shop, warm ambient lighting"
    },
    {
        "scenario": "Running in City",
        "pose": "running",
        "clothing": {
            "torso": "moisture-wicking running shirt",
            "legs": "running shorts",
            "feet": "performance running shoes"
        },
        "prompt": "runner jogging through city streets, urban background, dynamic motion"
    }
]

async def generate_stock_photo(scenario):
    # Build PersonAppearanceRequest
    appearance_params = {"clothing_parts": scenario["clothing"]}

    if "pose_reference" in scenario:
        appearance_params["pose_reference_image"] = scenario["pose_reference"]
    else:
        appearance_params["pose_type"] = scenario["pose"]

    request = ImagePipelineRequest(
        prompt=scenario["prompt"],
        model="photorealistic",
        layout="landscape" if scenario["scenario"] == "Running in City" else "square",
        person_appearance=PersonAppearanceRequest(**appearance_params),
        num_candidates=3,
        guidance_scale=7.5,
        steps=35,
        enable_anatomy_fix=True
    )

    return await pipeline.execute(request)

# Generate stock photo library
results = await asyncio.gather(*[
    generate_stock_photo(s) for s in scenarios
])

Output: Diverse, professional-quality stock photos with controlled poses and outfits.

Example 4: Consistent Character for Social Media

Generate a consistent character across social media posts:

character_config = {
    "brand_character": "fitness influencer",
    "base_appearance": PersonAppearanceRequest(
        # No preset pose - will vary per post
        clothing_parts={
            "torso": "branded athletic wear",
            "legs": "leggings with logo",
            "feet": "signature sneakers"
        }
    )
}

posts = [
    {
        "caption": "Morning workout motivation",
        "pose": "standing",
        "prompt": "fitness influencer ready for workout, gym background, energetic and motivated"
    },
    {
        "caption": "Post-run cooldown",
        "pose": "walking",
        "prompt": "fitness influencer after morning run, outdoor trail, refreshed and smiling"
    },
    {
        "caption": "Strength training day",
        "pose_ref": "lifting_weights_reference.jpg",
        "prompt": "fitness influencer lifting dumbbells, gym equipment visible, focused expression"
    }
]

async def generate_social_post(post_config):
    # Clone base appearance
    appearance = PersonAppearanceRequest(**character_config["base_appearance"].dict())

    # Add post-specific pose
    if "pose_ref" in post_config:
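        # load_reference is not defined in this guide; assume it returns a base64
        # data URI, like load_yoga_pose_image in Example 3.
        appearance.pose_reference_image = load_reference(post_config["pose_ref"])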
    else:
        appearance.pose_type = post_config["pose"]

    request = ImagePipelineRequest(
        prompt=post_config["prompt"],
        model="photorealistic",
        layout="square",  # Instagram format
        person_appearance=appearance,
        num_candidates=2,
        output_format="webp"
    )

    return await pipeline.execute(request)

# Generate all social media posts
results = await asyncio.gather(*[
    generate_social_post(post) for post in posts
])

Output: Consistent character appearance across multiple social media posts, varying poses.

Integration Patterns

Python SDK Usage

Direct integration with the image pipeline:

from image_pipeline import ImagePipeline
from image_pipeline.models import ImagePipelineRequest, PersonAppearanceRequest

# Initialize pipeline
pipeline = ImagePipeline(
    diffusion_service_url="http://localhost:8002",
    device="cuda:0"
)

# Create request
request = ImagePipelineRequest(
    prompt="professional headshot",
    person_appearance=PersonAppearanceRequest(
        pose_type="standing"
    )
)

# Execute
result = await pipeline.execute(request)

# Access result
if result.status == "success":
    image_base64 = result.image  # Base64 encoded image
    quality_score = result.quality_score  # 0.0-1.0
    metadata = result.metadata  # Generation details
else:
    print(f"Error: {result.error}")

cURL Examples

Basic Preset Pose

curl -X POST http://localhost:8001/v1/generate \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "professional corporate portrait",
    "model": "photorealistic",
    "layout": "portrait",
    "person_appearance": {
      "pose_type": "standing"
    }
  }'

Custom Pose with Clothing

# Load reference image (-w 0 disables line wrapping in GNU base64)
POSE_B64=$(base64 -w 0 my_pose.jpg)

curl -X POST http://localhost:8001/v1/generate \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "fashion model in designer outfit",
    "model": "photorealistic",
    "person_appearance": {
      "pose_reference_image": "data:image/jpeg;base64,'"$POSE_B64"'",
      "clothing_parts": {
        "torso": "red evening gown",
        "feet": "black heels"
      }
    }
  }'

Multiple Candidates with Quality Selection

curl -X POST http://localhost:8001/v1/generate \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "athlete in sports uniform",
    "person_appearance": {
      "pose_type": "running",
      "clothing_parts": {
        "torso": "team jersey",
        "legs": "athletic shorts",
        "feet": "running cleats"
      }
    },
    "num_candidates": 5,
    "return_all_candidates": false
  }'

TypeScript Client Usage

Using the generated TypeScript client (from @lilith/imajin-pipeline-client):

import { ImagePipelineClient } from '@lilith/imajin-pipeline-client';
import { ImagePipelineRequest, PersonAppearanceRequest } from '@lilith/imajin-pipeline-types';

// Initialize client
const client = new ImagePipelineClient({
  baseUrl: 'http://localhost:8001'
});

// Create request
const request: ImagePipelineRequest = {
  prompt: 'professional athlete in team uniform',
  model: 'photorealistic',
  layout: 'portrait',
  personAppearance: {
    poseType: 'standing',
    clothingParts: {
      torso: 'blue team jersey with number 10',
      legs: 'white athletic shorts',
      feet: 'soccer cleats'
    }
  },
  numCandidates: 3,
  enableAnatomyFix: true
};

// Execute generation
try {
  const result = await client.generate(request);

  if (result.status === 'success') {
    console.log('Quality score:', result.qualityScore);

    // Display image (browser)
    const img = document.createElement('img');
    img.src = result.image; // Base64 data URI
    document.body.appendChild(img);
  } else {
    console.error('Generation failed:', result.error);
  }
} catch (error) {
  console.error('Request failed:', error);
}

Error Handling

Comprehensive error handling for PersonAppearance requests:

from image_pipeline.models import ImagePipelineRequest, PersonAppearanceRequest

try:
    request = ImagePipelineRequest(
        prompt="test prompt",
        person_appearance=PersonAppearanceRequest(
            pose_type="standing",
            clothing_parts={
                "torso": "red shirt",
                "invalid_part": "something"  # INVALID
            }
        )
    )
    result = await pipeline.execute(request)

except ValueError as e:
    # Validation errors (invalid body parts, conflicting inputs, etc.)
    if "Invalid body parts" in str(e):
        print(f"Invalid clothing specification: {e}")
    elif "pose_type='custom' requires" in str(e):
        print(f"Missing pose_reference_image: {e}")
    elif "Only one of pose_keypoints" in str(e):
        print(f"Conflicting pose inputs: {e}")
    else:
        print(f"Validation error: {e}")

except NotImplementedError as e:
    # Phase 3.5 features not yet available.
    # Must precede RuntimeError: NotImplementedError is a RuntimeError subclass.
    print(f"Feature not yet implemented: {e}")

except RuntimeError as e:
    # Translation/processing errors
    if "Failed to translate PersonAppearance" in str(e):
        print(f"Translation failed: {e}")
    elif "Preset pose loading failed" in str(e):
        print(f"Preset not found: {e}")
    else:
        print(f"Processing error: {e}")

except Exception as e:
    # Unexpected errors
    print(f"Unexpected error: {e}")

Common Error Messages:

Error	Cause	Solution
`Invalid body parts: {...}`	Used invalid key in `clothing_parts`	Use only valid body parts (see docs)
`pose_type='custom' requires pose_reference_image`	Set `pose_type="custom"` without image	Provide `pose_reference_image`
`Only one of pose_keypoints, pose_reference_image, or pose_type should be provided`	Multiple pose inputs	Remove conflicting inputs, use only ONE
`pose_keypoints rendering is a Phase 3.5 feature`	Used `pose_keypoints`	Use `pose_reference_image` or `pose_type` instead
`outfit_description parsing is a Phase 3.5 feature`	Used `outfit_description`	Use `clothing_parts` dict instead
`Segmentation generation is a Phase 2 feature`	Clothing control requested	Phase 2 in progress, check back soon
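
The same lookup can be done in code when errors need to be surfaced to end users. The sketch below maps an exception message to a suggested fix by substring match. `KNOWN_ERRORS` and `suggest_fix` are hypothetical names, and the match strings are copied from the table above rather than from the library source, so treat them as assumptions that may drift between versions.

```python
# Hypothetical helper: substrings come from the error table, not from the library source.
KNOWN_ERRORS = [
    ("Invalid body parts", "Use only valid body parts in clothing_parts."),
    ("pose_type='custom' requires", "Provide pose_reference_image."),
    ("Only one of pose_keypoints", "Remove conflicting pose inputs; keep only one."),
    ("pose_keypoints rendering", "Use pose_reference_image or pose_type instead."),
    ("outfit_description parsing", "Use the clothing_parts dict instead."),
    ("Segmentation generation", "Clothing control is not available until Phase 2 ships."),
]


def suggest_fix(message: str) -> str:
    """Return the suggested fix for a known error message, or a generic fallback."""
    for needle, fix in KNOWN_ERRORS:
        if needle in message:
            return fix
    return "Unrecognized error; see the Troubleshooting section."


if __name__ == "__main__":
    print(suggest_fix("pose_type='custom' requires pose_reference_image"))
```

Matching on message text is brittle by nature; if the library exposes dedicated exception types, catching those would be the more robust choice.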

Performance & Optimization

VRAM Requirements

ControlNet models add VRAM overhead to the generation pipeline.

Configuration	VRAM Usage	Notes
Base Generation (no ControlNet)	~6-8 GB	SDXL model only
+ OpenPose ControlNet	~8-10 GB	+2 GB for pose control
+ Segmentation ControlNet	~8-10 GB	+2 GB for clothing control
+ Both ControlNets	~10-12 GB	+4 GB for full appearance control
+ Multi-candidate (5x)	~12-15 GB	Peak usage during parallel generation

Recommendations:

16GB VRAM: Comfortable for all features
12GB VRAM: Single ControlNet or 1-2 candidates
8GB VRAM: Base generation only, disable ControlNet
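
As a rough illustration, the tiers above can be turned into a small selection function. `recommend_settings` and its return keys are hypothetical and not part of the pipeline API. The 12 GB tier is interpreted here as "pose ControlNet only, at most 2 candidates", which is one conservative reading of the recommendation, and the single-candidate cap for the lowest tier is likewise an assumption.

```python
def recommend_settings(vram_gb: float) -> dict:
    """Map available VRAM (in GB) to conservative settings, following the tiers above."""
    if vram_gb >= 16:
        # Comfortable for all features
        return {"pose": True, "clothing": True, "max_candidates": 5}
    if vram_gb >= 12:
        # Assumption: keep a single ControlNet and cap candidates at 2
        return {"pose": True, "clothing": False, "max_candidates": 2}
    # 8 GB tier and below: base generation only (candidate cap is an assumption)
    return {"pose": False, "clothing": False, "max_candidates": 1}


if __name__ == "__main__":
    for size in (24, 12, 8):
        print(size, recommend_settings(size))
```

On a CUDA machine the total VRAM can be read from `torch.cuda.get_device_properties(0).total_memory` (in bytes) and divided by `1024**3` before calling the function.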

Generation Time Estimates

Times measured on NVIDIA RTX 4090 (24GB VRAM):

Configuration	Steps	Time	Notes
Base generation (no ControlNet)	30	~8s	Fastest
+ OpenPose preprocessing	30	+2-3s	One-time per preset
+ OpenPose generation	30	~11s	+30% inference time
+ Segmentation generation	30	~11s	Similar to OpenPose
+ Both ControlNets	30	~14s	+75% inference time
+ Anatomy fix (MediaPipe)	-	+3-5s	Post-processing
Multi-candidate (3x)	30	~35s	3x generation time

Optimization Tips:

Cache preset poses: Presets are loaded once and cached
Reuse reference images: Same pose across multiple prompts
Adjust steps: 20-25 steps for drafts, 35-40 for finals
Use num_candidates wisely: 2-3 for production, 5 for critical assets
Batch similar requests: Amortize model loading overhead

Quality vs Speed Tradeoffs

Priority	Steps	Candidates	ControlNet Scale	Use Case
Speed (drafts)	20-25	1	0.6-0.7	Prototyping, testing
Balanced	30-35	2-3	0.7-0.8	Production content
Quality (finals)	40-50	3-5	0.8-1.0	Marketing, hero images

Example: Fast Draft:

request = ImagePipelineRequest(
    prompt="quick concept test",
    person_appearance=PersonAppearanceRequest(pose_type="standing"),
    steps=20,  # Fewer steps
    num_candidates=1,  # Single image
    enable_anatomy_fix=False  # Skip post-processing
)
# ~10 seconds total

Example: High-Quality Final:

request = ImagePipelineRequest(
    prompt="hero image for marketing campaign",
    person_appearance=PersonAppearanceRequest(
        pose_type="standing",
        clothing_parts={"torso": "designer suit"}
    ),
    steps=45,  # More steps
    num_candidates=5,  # Pick best of 5
    guidance_scale=8.5,  # Strong prompt adherence
    enable_anatomy_fix=True  # Fix imperfections
)
# ~60 seconds total, highest quality

When to Use PersonAppearance vs Direct ControlNet

Use PersonAppearance API when:

✅ You want preset poses (standing, sitting, etc.)
✅ You need body part → clothing mapping
✅ You're building user-facing applications
✅ You want automatic error handling and validation
✅ You prefer high-level, intuitive parameters

Use Direct ControlNet when:

✅ You have pre-processed ControlNet images
✅ You need fine-grained conditioning scale control (0.0-2.0)
✅ You want to control guidance timing (start/end percentages)
✅ You're integrating with external ControlNet pipelines
✅ You need maximum performance (skip translation overhead)

Hybrid Approach: Use PersonAppearance for prototyping, then switch to direct ControlNet for production optimization.

Troubleshooting

Common Issues

1. "Invalid body parts" Error

Symptom:

ValueError: Invalid body parts: {'torso_upper', 'leg'}. Valid parts: ['arms', 'chest', ...]

Cause: Used invalid keys in clothing_parts dict.

Solution: Use only the 18 valid body parts listed in Valid Body Parts.

Example Fix:

# WRONG
clothing_parts = {
    "torso_upper": "shirt",  # Invalid
    "leg": "pants"  # Invalid (singular)
}

# CORRECT
clothing_parts = {
    "upper_body": "shirt",  # Valid
    "legs": "pants"  # Valid (plural)
}
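
To catch this before sending a request, keys can be checked client-side and near-misses suggested. This is a minimal sketch: `find_invalid_parts` is a hypothetical helper, and `EXAMPLE_VALID_PARTS` is only an illustrative subset built from names that appear in this guide, not the full list of 18 parts.

```python
import difflib

# Illustrative subset only; the full list is in the Valid Body Parts section.
EXAMPLE_VALID_PARTS = {"torso", "upper_body", "legs", "feet", "arms", "chest"}


def find_invalid_parts(clothing_parts: dict, valid_parts: set) -> dict:
    """Map each invalid key to a list of close valid matches (possibly empty)."""
    candidates = sorted(valid_parts)
    return {
        key: difflib.get_close_matches(key, candidates, n=3)
        for key in clothing_parts
        if key not in valid_parts
    }


if __name__ == "__main__":
    outfit = {"leg": "pants", "torso": "shirt"}
    for bad_key, suggestions in find_invalid_parts(outfit, EXAMPLE_VALID_PARTS).items():
        print(f"Invalid key {bad_key!r}; did you mean one of {suggestions}?")
```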

2. "pose_type='custom' requires pose_reference_image"

Symptom:

ValueError: pose_type='custom' requires pose_reference_image to be provided

Cause: Set pose_type="custom" without providing pose_reference_image.

Solution: Either provide pose_reference_image or use a preset pose_type.

Example Fix:

# WRONG
person_appearance=PersonAppearanceRequest(
    pose_type="custom"  # Missing pose_reference_image
)

# CORRECT (Option 1: Provide reference)
person_appearance=PersonAppearanceRequest(
    pose_type="custom",
    pose_reference_image="data:image/jpeg;base64,..."
)

# CORRECT (Option 2: Use preset)
person_appearance=PersonAppearanceRequest(
    pose_type="standing"
)

3. "Only one of pose_keypoints, pose_reference_image, or pose_type should be provided"

Symptom:

ValueError: Only one of pose_keypoints, pose_reference_image, or pose_type should be provided.
Multiple pose specifications are not allowed.

Cause: Provided multiple conflicting pose inputs.

Solution: Choose ONE pose input method.

Example Fix:

# WRONG (multiple pose inputs)
person_appearance=PersonAppearanceRequest(
    pose_type="standing",
    pose_reference_image="data:image/..."  # Conflict!
)

# CORRECT (single pose input)
person_appearance=PersonAppearanceRequest(
    pose_reference_image="data:image/..."
)
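
The two pose rules from issues 2 and 3 can be mirrored in a client-side pre-check. The sketch below is an assumption about how the rules combine (a `pose_type` of `"custom"` together with a reference image is counted as a single input); `check_pose_inputs` is a hypothetical helper, and the library's own validator remains the source of truth.

```python
from typing import List, Optional


def check_pose_inputs(
    pose_type: Optional[str] = None,
    pose_reference_image: Optional[str] = None,
    pose_keypoints: Optional[list] = None,
) -> List[str]:
    """Return a list of problems; an empty list means the combination looks valid."""
    problems = []
    if pose_type == "custom" and pose_reference_image is None:
        problems.append("pose_type='custom' requires pose_reference_image")

    # Count independent pose sources; "custom" only labels the reference image.
    sources = [
        pose_type not in (None, "custom"),
        pose_reference_image is not None,
        pose_keypoints is not None,
    ]
    if sum(sources) > 1:
        problems.append("Only one pose input should be provided")
    return problems


if __name__ == "__main__":
    print(check_pose_inputs(pose_type="standing", pose_reference_image="data:image/..."))
```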

4. "Invalid base64 image data"

Symptom:

ValueError: Invalid base64 image data: Invalid base64-encoded string

Cause: Malformed base64 string in pose_reference_image.

Solution: Ensure base64 string is properly encoded and formatted as data URI.

Example Fix:

import base64

# Load image file
with open("pose.jpg", "rb") as f:
    image_bytes = f.read()

# Encode to base64
image_b64 = base64.b64encode(image_bytes).decode("utf-8")

# Format as data URI
data_uri = f"data:image/jpeg;base64,{image_b64}"

# Use in request
person_appearance=PersonAppearanceRequest(
    pose_reference_image=data_uri  # Correct format
)
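
A quick sanity check on the data URI can catch malformed input before the request is built. The sketch below assumes the `data:image/<type>;base64,<payload>` shape used in the example above; `is_valid_image_data_uri` is a hypothetical helper, and it only verifies the base64 encoding, not that the decoded bytes are a readable image.

```python
import base64
import binascii


def is_valid_image_data_uri(value: str) -> bool:
    """Check the 'data:image/<type>;base64,<payload>' shape and that the payload decodes."""
    header, sep, payload = value.partition(",")
    if not sep or not payload:
        return False
    if not header.startswith("data:image/") or not header.endswith(";base64"):
        return False
    try:
        # validate=True rejects characters outside the base64 alphabet
        base64.b64decode(payload, validate=True)
    except (binascii.Error, ValueError):
        return False
    return True


if __name__ == "__main__":
    good = "data:image/jpeg;base64," + base64.b64encode(b"abc").decode("utf-8")
    print(is_valid_image_data_uri(good))
    print(is_valid_image_data_uri("data:image/jpeg;base64,@@@"))
```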

5. "Preset pose file not found"

Symptom:

RuntimeError: Preset pose loading failed: Preset pose file not found: .../preset_poses/standing.png

Cause: Preset pose file missing from installation.

Solution: Verify preset pose files exist in orchestrators/imajin-pipeline/src/image_pipeline/utils/preset_poses/.

Check Files:

ls orchestrators/imajin-pipeline/src/image_pipeline/utils/preset_poses/
# Should show: standing.png, sitting.png, walking.png, running.png

Reinstall if missing:

cd orchestrators/imajin-pipeline
pip install -e .  # Reinstall package

6. "Segmentation generation is a Phase 2 feature"

Symptom:

NotImplementedError: Segmentation generation is a Phase 2 feature.
SegmentationGenerator not yet implemented.

Cause: Used clothing_parts before Phase 2 implementation is complete.

Status: Phase 2 in progress (as of 2026-01-14).

Workaround: Check project roadmap for Phase 2 completion date, or use direct ControlNet with pre-made segmentation masks.

7. Weak Pose Control (Generated Image Doesn't Match Pose)

Symptom: Generated image ignores or weakly follows specified pose.

Possible Causes:

Reference image has poor pose visibility (occluded limbs, unclear skeleton)
Conditioning scale too low (default: 0.8)
Prompt conflicts with pose (e.g., "sitting" prompt with "standing" pose)

Solutions:

Use clearer reference images: Full-body, unobstructed, well-lit

Increase conditioning scale (via direct ControlNet):

controlnet_config=ControlNetConfig(
    enable_openpose=True,
    openpose_reference_image=...,
    openpose_conditioning_scale=1.0  # Stronger (default: 0.8)
)

Match prompt to pose: Ensure prompt describes activity matching pose

8. Generation Fails with CUDA Out of Memory

Symptom:

RuntimeError: CUDA out of memory. Tried to allocate 512.00 MiB (GPU 0; 11.91 GiB total capacity)

Cause: VRAM exhausted (ControlNets + multi-candidate generation).

Solutions:

Reduce candidates:
```
num_candidates=1  # Instead of 3-5
```

Disable unused ControlNets:

# Use ONLY pose OR clothing, not both
person_appearance=PersonAppearanceRequest(
    pose_type="standing"  # No clothing_parts
)

Lower generation resolution:

layout="square"  # 1024x1024 instead of widescreen

Use CPU fallback (slower):
```
pipeline = ImagePipeline(device="cpu")
```
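
The "reduce candidates" step can also be automated. The sketch below retries with half as many candidates whenever the error message contains "out of memory". `generate_with_backoff` is a hypothetical wrapper, and `generate` stands in for whatever callable runs the pipeline with a given candidate count; neither is part of the pipeline API.

```python
from typing import Callable


def generate_with_backoff(generate: Callable[[int], object], num_candidates: int) -> object:
    """Call generate(n), halving n after each out-of-memory error until n reaches 1."""
    n = num_candidates
    while True:
        try:
            return generate(n)
        except RuntimeError as exc:
            # Re-raise anything that is not an OOM, or an OOM at the minimum size
            if "out of memory" not in str(exc) or n == 1:
                raise
            n = max(1, n // 2)


if __name__ == "__main__":
    def fake_generate(n: int) -> str:
        # Stand-in for a real pipeline call: pretend more than 2 candidates exhausts VRAM
        if n > 2:
            raise RuntimeError("CUDA out of memory. Tried to allocate 512.00 MiB")
        return f"generated {n} candidate(s)"

    print(generate_with_backoff(fake_generate, 5))
```

With PyTorch, calling `torch.cuda.empty_cache()` before each retry may help release cached blocks, though it will not recover memory still held by live tensors.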

9. Translation Takes Too Long

Symptom: Long delay before generation starts (pose extraction slow).

Cause: OpenPose preprocessing on high-resolution reference images.

Solutions:

Downscale reference images before encoding:

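import base64
import io

from PIL import Image

# Load and downscale in place, preserving aspect ratio (longest side <= 1024).
# A fixed resize to 1024x1024 would stretch non-square images and distort the pose.
img = Image.open("large_reference.jpg").convert("RGB")
img.thumbnail((1024, 1024), Image.Resampling.LANCZOS)

# Encode the downscaled image as a JPEG data URI
buffer = io.BytesIO()
img.save(buffer, format="JPEG")
b64 = base64.b64encode(buffer.getvalue()).decode("utf-8")
data_uri = f"data:image/jpeg;base64,{b64}"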

Use presets when possible: Presets are cached after first load
Reuse pose across multiple prompts: Extract once, reuse skeleton
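
The "extract once, reuse" idea can be sketched as a small cache keyed by a hash of the reference image. Everything here is hypothetical: `cached_skeleton` is not a pipeline function, and `extract` stands in for whatever callable performs pose extraction. This guide does not state whether the pipeline exposes extraction as a separate step.

```python
import hashlib
from typing import Callable, Dict

_skeleton_cache: Dict[str, object] = {}


def cached_skeleton(reference_data_uri: str, extract: Callable[[str], object]) -> object:
    """Return extract(reference_data_uri), computing it only once per distinct image."""
    key = hashlib.sha256(reference_data_uri.encode("utf-8")).hexdigest()
    if key not in _skeleton_cache:
        _skeleton_cache[key] = extract(reference_data_uri)
    return _skeleton_cache[key]


if __name__ == "__main__":
    def slow_extract(uri: str) -> str:
        # Stand-in for real pose extraction
        print("extracting...")
        return f"skeleton-for-{len(uri)}-chars"

    cached_skeleton("data:image/jpeg;base64,AAAA", slow_extract)  # runs extraction
    cached_skeleton("data:image/jpeg;base64,AAAA", slow_extract)  # served from cache
```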

Roadmap

Phase 1: Foundation (✅ Complete)

Status: Production Ready (as of 2026-01-10)

✅ PersonAppearanceRequest model definition
✅ AppearanceToControlNet translator
✅ 4 preset poses (standing, sitting, walking, running)
✅ Custom pose from reference image (pose_reference_image)
✅ Validation and error handling
✅ Integration with ImageConditioningStage
✅ Comprehensive unit tests

Available Features:

Preset pose control via pose_type
Custom pose control via pose_reference_image
Automatic OpenPose skeleton extraction
Priority-based pose input resolution

Phase 2: Clothing Control (🔄 In Progress)

Status: Beta (Segmentation generation in development)

Planned Features:

✅ clothing_parts API definition
🔄 SegmentationGenerator utility class
🔄 Body part → RGB color mapping
🔄 Pose-aware segmentation (adapt masks to detected pose)
🔄 Segmentation ControlNet integration
🔄 Combined pose + clothing control
🔄 Facial expression control (facial_expression field)
🔄 Hair style control (hair_style field)
🔄 Accessories control (accessories field)

Expected Completion: Q1 2026

Phase 3: Advanced Features (📋 Planned)

Status: Design Phase

Planned Features:

📋 Multi-person support (multiple PersonAppearance specs)
📋 Age progression/regression
📋 Body type specification (height, build, etc.)
📋 Ethnicity and diversity controls
📋 Dynamic pose interpolation (animate between poses)
📋 Style transfer (apply artistic styles to clothing)

Expected Completion: Q2-Q3 2026

Phase 3.5: LLM Integration (📋 Planned)

Status: Research Phase

Planned Features:

📋 outfit_description text parsing (LLM → clothing_parts)
📋 pose_keypoints rendering (coordinates → skeleton image)
📋 Natural language pose descriptions (e.g., "person waving hello")
📋 Intelligent clothing suggestions based on prompt context
📋 Automatic appearance consistency across batch generations

Expected Completion: Q3 2026

Phase 4: Enterprise Features (📋 Future)

Status: Concept Phase

Planned Features:

📋 Character library (save/load consistent character appearances)
📋 Brand guidelines enforcement (company-specific outfits)
📋 A/B testing framework (compare appearance variations)
📋 Real-time appearance editing (interactive adjustments)
📋 3D pose import (from Blender, Maya, etc.)

Expected Completion: 2027

For a deeper understanding of the underlying systems, see:

ControlNet Integration Guide: Low-level ControlNet API, preprocessing details
Segmentation Masks Documentation: RGB color mapping, body part definitions
Pipeline Architecture Overview: Stage execution order, context flow
Image Conditioning Stage: Stage implementation details
Appearance Translator: Translation algorithm internals
ControlNet Preprocessor: OpenPose extraction, preset loading

Feedback and Support

Issues: Report bugs or feature requests at GitLab Issues
Discussions: Join the Image Pipeline Discord
Documentation: Latest docs at https://docs.imajin-pipeline.dev

Version History:

v1.0 (2026-01-14): Initial comprehensive documentation for Phase 1
v0.9 (2026-01-10): Beta documentation during Phase 1 development

Contributors: Lilith AI Team, Image Pipeline Working Group

License: MIT License - See LICENSE file for details

This documentation is part of the Imajin AI Image Pipeline project. For the latest updates, visit the project repository.
