imajin-identity

Identity recognition service for photo organization. Extracts face embeddings using InsightFace, clusters them using HDBSCAN, and organizes photos by person.

Features

Face Embedding Extraction: 512-dimensional embeddings via InsightFace buffalo_l
Identity Clustering: HDBSCAN clustering for automatic identity grouping
Photo Organization: Separate photos by person with multi-person handling
GPU Coordination: Integrated with model-boss for VRAM management
CLI + API: Both command-line and REST API interfaces

Installation

cd service
python -m venv .venv
source .venv/bin/activate
pip install -e ".[dev]"

Dependencies

InsightFace: Face detection and embedding extraction
HDBSCAN: Density-based clustering
model-boss: GPU/VRAM coordination (optional but recommended)

CLI Usage

Organize Photos by Person

# Basic usage - organizes photos into person folders
imajin-identity organize ~/Photos/shoot --output ~/Photos/sorted

# Multi-person photos get copied to each person's folder (default)
imajin-identity organize ~/Photos/shoot -o ~/Photos/sorted --multi-person copy

# Use symlinks instead of copying multi-person photos
imajin-identity organize ~/Photos/shoot -o ~/Photos/sorted --multi-person symlink

# Dry run - see what would happen without copying
imajin-identity organize ~/Photos/shoot --dry-run

# Recursive search through subdirectories
imajin-identity organize ~/Photos/shoot -r -o ~/Photos/sorted
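
The difference between the copy and symlink strategies can be sketched in a few lines of Python. This is an illustration only, not the service's implementation: `place_photo` and its parameters are hypothetical names, and the real command may handle name collisions and relative links differently.

```python
import os
import shutil
from pathlib import Path


def place_photo(photo: Path, person_dirs: list[Path], strategy: str = "copy") -> list[Path]:
    """Place one multi-person photo into every matching person folder.

    Hypothetical helper: "copy" duplicates the file, "symlink" links
    each folder entry back to the original file.
    """
    if strategy not in ("copy", "symlink"):
        raise ValueError(f"unknown strategy: {strategy}")

    placed = []
    for person_dir in person_dirs:
        person_dir.mkdir(parents=True, exist_ok=True)
        target = person_dir / photo.name
        if strategy == "copy":
            # Independent duplicate; uses extra disk space per person.
            shutil.copy2(photo, target)
        else:
            # Link to the absolute path of the original; no extra copy.
            os.symlink(photo.resolve(), target)
        placed.append(target)
    return placed
```

Symlinks save disk space for large shoots, but they break if the original files are later moved or deleted, which is one plausible reason for copying to be the default.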

Analyze Without Organizing

# Just cluster and report identities
imajin-identity cluster ~/Photos/shoot

# Save results to JSON
imajin-identity cluster ~/Photos/shoot --output results.json

# Require at least 3 photos for a person cluster
imajin-identity cluster ~/Photos/shoot --min-cluster-size 3

Extract Single Image Embedding

# Extract face embedding from single image
imajin-identity embed ~/Photos/photo.jpg

# Extract all faces (not just largest)
imajin-identity embed ~/Photos/group.jpg --all

# Save embeddings to file
imajin-identity embed ~/Photos/photo.jpg --output face.json --include-embedding

Output Structure

output-sorted/
├── person_001/           # Photos of person 1
│   ├── photo1.jpg
│   ├── photo2.jpg
│   └── group_photo.jpg   # Also in person_002 with --multi-person copy
├── person_002/           # Photos of person 2
│   ├── photo3.jpg
│   └── group_photo.jpg
├── person_003/           # Photos of person 3
├── multiple_people/      # Reference copies of multi-person photos
├── no_face_detected/     # Photos where no faces were found
└── organization_report.json
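
A quick way to sanity-check a finished run is to count the entries in each folder of the layout above. The sketch below relies only on the directory structure shown here; `summarize_output` is a hypothetical helper, and it deliberately does not parse organization_report.json because that file's schema is not documented in this README.

```python
from pathlib import Path


def summarize_output(root: Path) -> dict[str, int]:
    """Count photos in each top-level folder of an organized output directory."""
    counts: dict[str, int] = {}
    for child in sorted(root.iterdir()):
        if not child.is_dir():
            continue  # skips organization_report.json and other loose files
        # Count regular files and symlinks (the symlink strategy creates links).
        counts[child.name] = sum(
            1 for entry in child.iterdir() if entry.is_file() or entry.is_symlink()
        )
    return counts
```

For example, `summarize_output(Path("~/Photos/sorted").expanduser())` would return a mapping such as folder name to photo count, including the `multiple_people` and `no_face_detected` folders when they exist.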

API Usage

Start Service

# Using uvicorn directly
uvicorn imajin_identity.api.app:app --host 0.0.0.0 --port 8005

# Or using the server module
python -m imajin_identity.api.server

Endpoints

GET /health - Service health check
POST /embed/ - Extract face embedding from single image
POST /embed/batch - Extract embeddings from multiple images
POST /cluster/ - Cluster faces from a directory
POST /organize/ - Organize photos by identity

Example Requests

# Health check
curl http://localhost:8005/health

# Extract embedding
curl -X POST http://localhost:8005/embed/ \
  -H "Content-Type: application/json" \
  -d '{"image_path": "/path/to/photo.jpg", "extract_all": true}'

# Cluster faces
curl -X POST http://localhost:8005/cluster/ \
  -H "Content-Type: application/json" \
  -d '{"input_dir": "/path/to/photos", "min_cluster_size": 2}'

# Organize photos
curl -X POST http://localhost:8005/organize/ \
  -H "Content-Type: application/json" \
  -d '{
    "input_dir": "/path/to/photos",
    "output_dir": "/path/to/sorted",
    "multi_person_strategy": "copy"
  }'
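
The same organize request can be built from Python with only the standard library. This is a minimal sketch that mirrors the curl example above; `build_organize_request` is a hypothetical helper, and the response format is not documented here, so the example stops at constructing the request.

```python
import json
import urllib.request


def build_organize_request(
    input_dir: str,
    output_dir: str,
    strategy: str = "copy",
    base_url: str = "http://localhost:8005",
) -> urllib.request.Request:
    """Build a POST /organize/ request with the JSON fields used in the curl example."""
    payload = {
        "input_dir": input_dir,
        "output_dir": output_dir,
        "multi_person_strategy": strategy,
    }
    return urllib.request.Request(
        f"{base_url}/organize/",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

With the service running, the request can be sent with `urllib.request.urlopen(build_organize_request("/path/to/photos", "/path/to/sorted"))`.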

Programmatic Usage

import asyncio
from pathlib import Path

from imajin_identity.detection import FaceEmbedder, FaceClusterer

async def main():
    # Extract embeddings
    async with FaceEmbedder() as embedder:
        # Single image
        embedding = await embedder.extract(Path("photo.jpg"))

        # All faces in image
        faces = await embedder.extract_all_faces(Path("group.jpg"))

        # Batch process directory
        photos = await embedder.extract_from_directory(Path("photos/"))

    # Cluster by identity
    clusterer = FaceClusterer(min_cluster_size=2)
    result = clusterer.cluster(photos)

    print(f"Found {result.identity_count} distinct people")
    for cluster in result.clusters:
        print(f"  Person {cluster.cluster_id}: {cluster.photo_count} photos")

asyncio.run(main())

Configuration

Environment Variables

Variable	Default	Description
`REDIS_HOST`	`localhost`	Redis host for model-boss
`REDIS_PORT`	`6379`	Redis port
`INSIGHTFACE_MODEL`	`buffalo_l`	InsightFace model name
`SIMILARITY_THRESHOLD`	`0.68`	Face matching threshold
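
As an illustration of how these variables and defaults fit together, the sketch below reads them into a typed settings object. The `Settings` class and `from_env` method are hypothetical names chosen for this example; the service's own configuration loader may be structured differently.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    """Hypothetical settings holder using the defaults from the table above."""

    redis_host: str = "localhost"
    redis_port: int = 6379
    insightface_model: str = "buffalo_l"
    similarity_threshold: float = 0.68

    @classmethod
    def from_env(cls, env: Optional[Mapping[str, str]] = None) -> "Settings":
        # Fall back to the process environment when no mapping is supplied.
        env = os.environ if env is None else env
        return cls(
            redis_host=env.get("REDIS_HOST", "localhost"),
            redis_port=int(env.get("REDIS_PORT", "6379")),
            insightface_model=env.get("INSIGHTFACE_MODEL", "buffalo_l"),
            similarity_threshold=float(env.get("SIMILARITY_THRESHOLD", "0.68")),
        )
```

Passing an explicit mapping, as in `Settings.from_env({"REDIS_PORT": "6380"})`, makes the parsing easy to test without touching the real environment.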

Clustering Parameters

Parameter	Default	Description
`min_cluster_size`	2	Minimum photos to form a person cluster
`min_samples`	1	HDBSCAN min_samples parameter
`merge_threshold`	0.75	Similarity at or above which similar clusters are merged
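
To make the role of `merge_threshold` concrete, here is one way a merge step could work. Everything in this sketch is an assumption for illustration: it compares cluster centroids with cosine similarity and merges greedily in a single pass, whereas this README does not say which similarity measure or merge procedure the service actually uses. The function names are hypothetical.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def centroid(vectors: list[list[float]]) -> list[float]:
    """Component-wise mean of a non-empty list of equal-length vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def merge_similar_clusters(
    clusters: list[list[list[float]]], merge_threshold: float = 0.75
) -> list[list[list[float]]]:
    """Greedy single pass: fold each cluster into the first existing group
    whose centroid is at least `merge_threshold` similar, else start a new group."""
    merged: list[list[list[float]]] = []
    for cluster in clusters:
        for group in merged:
            if cosine_similarity(centroid(group), centroid(cluster)) >= merge_threshold:
                group.extend(cluster)
                break
        else:
            merged.append(list(cluster))
    return merged
```

Under this reading, raising the threshold keeps more clusters separate (fewer false merges of different people), while lowering it combines clusters more aggressively (fewer duplicate folders for the same person).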

Testing

# Run unit tests
pytest tests/

# Run with coverage
pytest tests/ --cov=imajin_identity

# Run integration tests (requires GPU)
pytest -m integration tests/

Architecture

imajin_identity/
├── detection/
│   ├── face_embedder.py    # InsightFace + model-boss integration
│   └── face_clusterer.py   # HDBSCAN clustering
├── models/
│   ├── identity.py         # Domain models
│   └── schemas.py          # API schemas
├── api/
│   ├── app.py              # FastAPI application
│   └── routes/             # API endpoints
├── cli/
│   ├── main.py             # CLI entry point
│   ├── organize.py         # organize command
│   ├── cluster.py          # cluster command
│   └── embed.py            # embed command
└── storage/                # Phase 3: MinIO + Redis persistence

GPU Requirements

VRAM: ~2GB for InsightFace buffalo_l model
CUDA: 11.8+ recommended
model-boss: Optional but recommended for multi-service environments

Roadmap

Phase 1: Face embedding & clustering (MVP)
Phase 2: Partial body handling (headless photos)
Phase 3: MinIO storage + Redis persistence
Phase 4: Web UI for identity management