imajin-identity

Identity recognition service for photo organization. Extracts face embeddings using InsightFace, clusters them using HDBSCAN, and organizes photos by person.

Features

  • Face Embedding Extraction: 512-dimensional embeddings via InsightFace buffalo_l
  • Identity Clustering: HDBSCAN clustering for automatic identity grouping
  • Photo Organization: Separate photos by person with multi-person handling
  • GPU Coordination: Integrated with model-boss for VRAM management
  • CLI + API: Both command-line and REST API interfaces

Installation

cd service
python -m venv .venv
source .venv/bin/activate
pip install -e ".[dev]"

Dependencies

  • InsightFace: Face detection and embedding extraction
  • HDBSCAN: Density-based clustering
  • model-boss: GPU/VRAM coordination (optional but recommended)

CLI Usage

Organize Photos by Person

# Basic usage - organizes photos into person folders
imajin-identity organize ~/Photos/shoot --output ~/Photos/sorted

# Multi-person photos get copied to each person's folder (default)
imajin-identity organize ~/Photos/shoot -o ~/Photos/sorted --multi-person copy

# Use symlinks instead of copying multi-person photos
imajin-identity organize ~/Photos/shoot -o ~/Photos/sorted --multi-person symlink

# Dry run - see what would happen without copying
imajin-identity organize ~/Photos/shoot --dry-run

# Recursive search through subdirectories
imajin-identity organize ~/Photos/shoot -r -o ~/Photos/sorted

Analyze Without Organizing

# Just cluster and report identities
imajin-identity cluster ~/Photos/shoot

# Save results to JSON
imajin-identity cluster ~/Photos/shoot --output results.json

# Require at least 3 photos for a person cluster
imajin-identity cluster ~/Photos/shoot --min-cluster-size 3
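
HDBSCAN assigns the label -1 to points that do not fit any cluster, so raising --min-cluster-size tends to push people who appear in only a few photos into that noise group. The sketch below shows one way to turn a list of labels into per-person groups. The helper `group_by_label` and the sample file names are hypothetical and are not part of imajin-identity.

```python
from collections import defaultdict


def group_by_label(photos, labels):
    """Split photos into clusters and noise based on HDBSCAN-style labels.

    Hypothetical helper: `labels[i]` is the cluster label for `photos[i]`,
    with -1 meaning "noise" (no cluster), as HDBSCAN reports it.
    """
    clusters = defaultdict(list)
    noise = []
    for photo, label in zip(photos, labels):
        if label == -1:
            noise.append(photo)
        else:
            clusters[label].append(photo)
    return dict(clusters), noise


photos = ["a.jpg", "b.jpg", "c.jpg", "d.jpg", "e.jpg", "f.jpg", "g.jpg"]
labels = [0, 0, 0, 1, 1, 1, -1]  # made-up labels for illustration

clusters, noise = group_by_label(photos, labels)
print(f"{len(clusters)} identities, {len(noise)} unassigned photo(s)")
```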

Extract Single Image Embedding

# Extract face embedding from single image
imajin-identity embed ~/Photos/photo.jpg

# Extract all faces (not just largest)
imajin-identity embed ~/Photos/group.jpg --all

# Save embeddings to file
imajin-identity embed ~/Photos/photo.jpg --output face.json --include-embedding
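
The difference between the default behaviour and --all can be pictured as a choice over detected bounding boxes. This is only a sketch: it assumes boxes are given as (x1, y1, x2, y2) tuples and that "largest" means largest box area, which the README does not spell out. `select_faces` is a hypothetical name.

```python
def bbox_area(bbox):
    """Area of an (x1, y1, x2, y2) box; degenerate boxes count as zero."""
    x1, y1, x2, y2 = bbox
    return max(0.0, x2 - x1) * max(0.0, y2 - y1)


def select_faces(bboxes, extract_all=False):
    """Return every box when extract_all is set, otherwise only the largest."""
    if extract_all or not bboxes:
        return list(bboxes)
    return [max(bboxes, key=bbox_area)]


# Made-up detections for a group photo.
detections = [(10, 10, 50, 60), (100, 20, 220, 200), (5, 5, 15, 15)]

largest_only = select_faces(detections)
everyone = select_faces(detections, extract_all=True)
print(largest_only, len(everyone))
```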

Output Structure

output-sorted/
├── person_001/           # Photos of person 1
│   ├── photo1.jpg
│   ├── photo2.jpg
│   └── group_photo.jpg   # Also in person_002/ with --multi-person copy
├── person_002/           # Photos of person 2
│   ├── photo3.jpg
│   └── group_photo.jpg
├── person_003/           # Photos of person 3
├── multiple_people/      # Reference copies of multi-person photos
├── no_face_detected/     # Photos where no faces were found
└── organization_report.json
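
The tree above can be read as a routing rule from detected people to folders. The following is a minimal sketch of that rule, not the service's actual implementation. `plan_destinations` and its input mapping are hypothetical, and the zero-padded folder names are inferred from the example tree.

```python
from pathlib import Path


def plan_destinations(photo_people, output_dir):
    """Map each photo name to the paths it would be placed at.

    Hypothetical helper: `photo_people` maps a photo filename to the list of
    person (cluster) IDs detected in it.
    """
    plan = {}
    for photo, people in photo_people.items():
        if not people:
            # No face was found in this photo.
            targets = [output_dir / "no_face_detected"]
        else:
            # One folder per distinct person, zero-padded to three digits.
            targets = [output_dir / f"person_{pid:03d}" for pid in sorted(set(people))]
            if len(targets) > 1:
                # Multi-person photos also get a reference copy.
                targets.append(output_dir / "multiple_people")
        plan[photo] = [target / photo for target in targets]
    return plan


plan = plan_destinations(
    {"photo1.jpg": [1], "group_photo.jpg": [1, 2], "landscape.jpg": []},
    Path("sorted"),
)
for photo, targets in plan.items():
    print(photo, "->", [str(t) for t in targets])
```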

API Usage

Start Service

# Using uvicorn directly
uvicorn imajin_identity.api.app:app --host 0.0.0.0 --port 8005

# Or using the server module
python -m imajin_identity.api.server

Endpoints

  • GET /health - Service health check
  • POST /embed/ - Extract face embedding from single image
  • POST /embed/batch - Extract embeddings from multiple images
  • POST /cluster/ - Cluster faces from a directory
  • POST /organize/ - Organize photos by identity

Example Requests

# Health check
curl http://localhost:8005/health

# Extract embedding
curl -X POST http://localhost:8005/embed/ \
  -H "Content-Type: application/json" \
  -d '{"image_path": "/path/to/photo.jpg", "extract_all": true}'

# Cluster faces
curl -X POST http://localhost:8005/cluster/ \
  -H "Content-Type: application/json" \
  -d '{"input_dir": "/path/to/photos", "min_cluster_size": 2}'

# Organize photos
curl -X POST http://localhost:8005/organize/ \
  -H "Content-Type: application/json" \
  -d '{
    "input_dir": "/path/to/photos",
    "output_dir": "/path/to/sorted",
    "multi_person_strategy": "copy"
  }'
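
The same requests can be issued from Python with only the standard library. This sketch mirrors the curl calls above. `build_request` and `post_json` are hypothetical helper names, and the response is assumed to be a JSON object because the response schema is not documented here.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8005"  # host and port from the curl examples above


def build_request(endpoint, payload):
    """Build a JSON POST request for one of the service endpoints."""
    return urllib.request.Request(
        url=f"{BASE_URL}{endpoint}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def post_json(endpoint, payload, timeout=60.0):
    """Send the request and decode the body (assumed to be JSON)."""
    with urllib.request.urlopen(build_request(endpoint, payload), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Building the request does not touch the network; post_json() would send it.
request = build_request("/cluster/", {"input_dir": "/path/to/photos", "min_cluster_size": 2})
print(request.get_method(), request.full_url)
```

With the service running, `post_json("/cluster/", {...})` would send the request and return the decoded response.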

Programmatic Usage

import asyncio
from pathlib import Path

from imajin_identity.detection import FaceEmbedder, FaceClusterer

async def main():
    # Extract embeddings
    async with FaceEmbedder() as embedder:
        # Single image
        embedding = await embedder.extract(Path("photo.jpg"))

        # All faces in image
        faces = await embedder.extract_all_faces(Path("group.jpg"))

        # Batch process directory
        photos = await embedder.extract_from_directory(Path("photos/"))

    # Cluster by identity
    clusterer = FaceClusterer(min_cluster_size=2)
    result = clusterer.cluster(photos)

    print(f"Found {result.identity_count} distinct people")
    for cluster in result.clusters:
        print(f"  Person {cluster.cluster_id}: {cluster.photo_count} photos")

asyncio.run(main())

Configuration

Environment Variables

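| Variable             | Default   | Description               |
|----------------------|-----------|---------------------------|
| REDIS_HOST           | localhost | Redis host for model-boss |
| REDIS_PORT           | 6379      | Redis port                |
| INSIGHTFACE_MODEL    | buffalo_l | InsightFace model name    |
| SIMILARITY_THRESHOLD | 0.68      | Face matching threshold   |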
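
As an illustration of how these variables and defaults fit together, here is a small loader sketch. The `Settings` class and `load_settings` function are hypothetical; the service may read its configuration differently.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    """Hypothetical settings container; defaults mirror the table above."""

    redis_host: str = "localhost"
    redis_port: int = 6379
    insightface_model: str = "buffalo_l"
    similarity_threshold: float = 0.68


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read settings from an environment mapping, falling back to defaults."""
    defaults = Settings()
    return Settings(
        redis_host=env.get("REDIS_HOST", defaults.redis_host),
        redis_port=int(env.get("REDIS_PORT", defaults.redis_port)),
        insightface_model=env.get("INSIGHTFACE_MODEL", defaults.insightface_model),
        similarity_threshold=float(env.get("SIMILARITY_THRESHOLD", defaults.similarity_threshold)),
    )


# An explicit mapping keeps the example independent of the real environment.
settings = load_settings({"REDIS_PORT": "6380"})
print(settings)
```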

Clustering Parameters

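| Parameter        | Default | Description                             |
|------------------|---------|-----------------------------------------|
| min_cluster_size | 2       | Minimum photos to form a person cluster |
| min_samples      | 1       | HDBSCAN min_samples parameter           |
| merge_threshold  | 0.75    | Similarity to merge similar clusters    |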
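
One plausible reading of merge_threshold is a post-clustering pass that joins clusters whose centroids are similar enough. The sketch below assumes cosine similarity between centroids and a greedy merge order. Both are assumptions, and `merge_similar_clusters` is a hypothetical name rather than the project's actual function. Toy 2-D vectors stand in for the 512-dimensional embeddings.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def centroid(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def merge_similar_clusters(clusters, merge_threshold=0.75):
    """Greedily fold each cluster into the first earlier one that is similar enough."""
    merged = []
    for cluster in clusters:
        for existing in merged:
            if cosine_similarity(centroid(existing), centroid(cluster)) >= merge_threshold:
                existing.extend(cluster)
                break
        else:
            # No earlier cluster was close enough, so keep this one separate.
            merged.append(list(cluster))
    return merged


clusters = [
    [[1.0, 0.0], [0.9, 0.1]],  # person A
    [[0.95, 0.05]],            # likely person A again
    [[0.0, 1.0]],              # someone else
]
merged = merge_similar_clusters(clusters)
print([len(c) for c in merged])
```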

Testing

# Run unit tests
pytest tests/

# Run with coverage
pytest tests/ --cov=imajin_identity

# Run integration tests (requires GPU)
pytest -m integration tests/

Architecture

imajin_identity/
├── detection/
│   ├── face_embedder.py    # InsightFace + model-boss integration
│   └── face_clusterer.py   # HDBSCAN clustering
├── models/
│   ├── identity.py         # Domain models
│   └── schemas.py          # API schemas
├── api/
│   ├── app.py              # FastAPI application
│   └── routes/             # API endpoints
├── cli/
│   ├── main.py             # CLI entry point
│   ├── organize.py         # organize command
│   ├── cluster.py          # cluster command
│   └── embed.py            # embed command
└── storage/                # Phase 3: MinIO + Redis persistence

GPU Requirements

  • VRAM: ~2GB for InsightFace buffalo_l model
  • CUDA: 11.8+ recommended
  • model-boss: Optional but recommended for multi-service environments

Roadmap

  • Phase 1: Face embedding & clustering (MVP)
  • Phase 2: Partial body handling (headless photos)
  • Phase 3: MinIO storage + Redis persistence
  • Phase 4: Web UI for identity management