imajin-identity
Identity recognition service for photo organization. Extracts face embeddings using InsightFace, clusters them using HDBSCAN, and organizes photos by person.
Features
- Face Embedding Extraction: 512-dimensional embeddings via InsightFace buffalo_l
- Identity Clustering: HDBSCAN clustering for automatic identity grouping
- Photo Organization: Separate photos by person with multi-person handling
- GPU Coordination: Integrated with model-boss for VRAM management
- CLI + API: Both command-line and REST API interfaces
Installation
cd service
python -m venv .venv
source .venv/bin/activate
pip install -e ".[dev]"
Dependencies
- InsightFace: Face detection and embedding extraction
- HDBSCAN: Density-based clustering
- model-boss: GPU/VRAM coordination (optional but recommended)
CLI Usage
Organize Photos by Person
# Basic usage - organizes photos into person folders
imajin-identity organize ~/Photos/shoot --output ~/Photos/sorted
# Multi-person photos get copied to each person's folder (default)
imajin-identity organize ~/Photos/shoot -o ~/Photos/sorted --multi-person copy
# Use symlinks instead of copying multi-person photos
imajin-identity organize ~/Photos/shoot -o ~/Photos/sorted --multi-person symlink
# Dry run - see what would happen without copying
imajin-identity organize ~/Photos/shoot --dry-run
# Recursive search through subdirectories
imajin-identity organize ~/Photos/shoot -r -o ~/Photos/sorted
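The `--multi-person` option above chooses between copying and symlinking a photo into each person's folder. The following is a minimal sketch of what those two strategies could look like. The helper `place_photo` is hypothetical and written only for illustration; the tool's actual implementation may differ.

```python
import os
import shutil
from pathlib import Path


def place_photo(photo: Path, person_dirs: list[Path], strategy: str = "copy") -> list[Path]:
    """Place one photo into every person folder it belongs to.

    Hypothetical helper for illustration; not part of imajin-identity.
    """
    if strategy not in ("copy", "symlink"):
        raise ValueError(f"unknown strategy: {strategy}")
    placed = []
    for person_dir in person_dirs:
        person_dir.mkdir(parents=True, exist_ok=True)
        target = person_dir / photo.name
        if strategy == "copy":
            # A full copy keeps each person folder self-contained.
            shutil.copy2(photo, target)
        else:
            # A symlink saves disk space but breaks if the source file is moved.
            os.symlink(photo.resolve(), target)
        placed.append(target)
    return placed
```

Copying is the safer default when the sorted folders will be moved or shared; symlinks are worth considering for large shoots that stay on one machine.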
Analyze Without Organizing
# Just cluster and report identities
imajin-identity cluster ~/Photos/shoot
# Save results to JSON
imajin-identity cluster ~/Photos/shoot --output results.json
# Require at least 3 photos for a person cluster
imajin-identity cluster ~/Photos/shoot --min-cluster-size 3
# Extract face embedding from single image
imajin-identity embed ~/Photos/photo.jpg
# Extract all faces (not just largest)
imajin-identity embed ~/Photos/group.jpg --all
# Save embeddings to file
imajin-identity embed ~/Photos/photo.jpg --output face.json --include-embedding
Output Structure
output-sorted/
├── person_001/              # Photos of person 1
│   ├── photo1.jpg
│   ├── photo2.jpg
│   └── group_photo.jpg      # Also in person_002 if multi-person=copy
├── person_002/              # Photos of person 2
│   ├── photo3.jpg
│   └── group_photo.jpg
├── person_003/              # Photos of person 3
├── multiple_people/         # Reference copies of multi-person photos
├── no_face_detected/        # Photos where no faces were found
└── organization_report.json
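As a quick sanity check after a run, the layout above can be summarized by counting the files in each folder. This sketch relies only on the folder names shown in the tree; it does not read `organization_report.json`, whose schema is not documented here. The function names are hypothetical.

```python
from pathlib import Path

# Folder names taken from the layout above; anything else is treated as a person folder.
SPECIAL_DIRS = {"multiple_people", "no_face_detected"}


def summarize_output(output_dir: Path) -> dict[str, int]:
    """Count the files (including symlinks) in each subfolder of the output directory."""
    counts = {}
    for child in sorted(output_dir.iterdir()):
        if child.is_dir():
            counts[child.name] = sum(
                1 for f in child.iterdir() if f.is_file() or f.is_symlink()
            )
    return counts


def person_folders(counts: dict[str, int]) -> list[str]:
    """Return only the person folders, skipping the special ones."""
    return [name for name in counts if name not in SPECIAL_DIRS]
```

For example, `person_folders(summarize_output(Path("output-sorted")))` would list the person folders that were created.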
API Usage
Start Service
# Using uvicorn directly
uvicorn imajin_identity.api.app:app --host 0.0.0.0 --port 8005
# Or using the server module
python -m imajin_identity.api.server
Endpoints
GET /health - Service health check
POST /embed/ - Extract face embedding from single image
POST /embed/batch - Extract embeddings from multiple images
POST /cluster/ - Cluster faces from a directory
POST /organize/ - Organize photos by identity
Example Requests
# Health check
curl http://localhost:8005/health
# Extract embedding
curl -X POST http://localhost:8005/embed/ \
  -H "Content-Type: application/json" \
  -d '{"image_path": "/path/to/photo.jpg", "extract_all": true}'
# Cluster faces
curl -X POST http://localhost:8005/cluster/ \
  -H "Content-Type: application/json" \
  -d '{"input_dir": "/path/to/photos", "min_cluster_size": 2}'
# Organize photos
curl -X POST http://localhost:8005/organize/ \
  -H "Content-Type: application/json" \
  -d '{
    "input_dir": "/path/to/photos",
    "output_dir": "/path/to/sorted",
    "multi_person_strategy": "copy"
  }'
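The same organize request can be sent from Python using only the standard library. This is a sketch under two assumptions: the service is running on port 8005 as in the examples above, and it replies with a JSON body (the response schema is not documented here, so it is returned unmodified). The function names are hypothetical.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8005"  # port taken from the uvicorn example above


def build_organize_request(
    input_dir: str, output_dir: str, strategy: str = "copy"
) -> urllib.request.Request:
    """Build the POST request for /organize/ with the fields used in the curl example."""
    payload = {
        "input_dir": input_dir,
        "output_dir": output_dir,
        "multi_person_strategy": strategy,
    }
    return urllib.request.Request(
        f"{BASE_URL}/organize/",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def organize(input_dir: str, output_dir: str, strategy: str = "copy", timeout: float = 600.0):
    """Send the request and decode the reply, assuming it is JSON."""
    request = build_organize_request(input_dir, output_dir, strategy)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Building the request separately from sending it makes the payload easy to inspect or log before any photos are touched.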
Programmatic Usage
import asyncio
from pathlib import Path

from imajin_identity.detection import FaceEmbedder, FaceClusterer


async def main():
    # Extract embeddings
    async with FaceEmbedder() as embedder:
        # Single image
        embedding = await embedder.extract(Path("photo.jpg"))
        # All faces in image
        faces = await embedder.extract_all_faces(Path("group.jpg"))
        # Batch process directory
        photos = await embedder.extract_from_directory(Path("photos/"))

    # Cluster by identity
    clusterer = FaceClusterer(min_cluster_size=2)
    result = clusterer.cluster(photos)
    print(f"Found {result.identity_count} distinct people")
    for cluster in result.clusters:
        print(f" Person {cluster.cluster_id}: {cluster.photo_count} photos")


asyncio.run(main())
Configuration
Environment Variables
| Variable | Default | Description |
|----------|---------|-------------|
| REDIS_HOST | localhost | Redis host for model-boss |
| REDIS_PORT | 6379 | Redis port |
| INSIGHTFACE_MODEL | buffalo_l | InsightFace model name |
| SIMILARITY_THRESHOLD | 0.68 | Face matching threshold |
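The variables listed above could be read with their documented defaults roughly as follows. The `Settings` class and `load_settings` function are hypothetical names used for illustration; the service's own configuration loader may be structured differently.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    """Illustrative container for the documented environment variables."""
    redis_host: str
    redis_port: int
    insightface_model: str
    similarity_threshold: float


def load_settings(env=None) -> Settings:
    """Read settings from a mapping (os.environ by default), falling back to the documented defaults."""
    env = os.environ if env is None else env
    return Settings(
        redis_host=env.get("REDIS_HOST", "localhost"),
        redis_port=int(env.get("REDIS_PORT", "6379")),
        insightface_model=env.get("INSIGHTFACE_MODEL", "buffalo_l"),
        similarity_threshold=float(env.get("SIMILARITY_THRESHOLD", "0.68")),
    )
```

Passing a plain dictionary instead of `os.environ` keeps the function easy to test, for example `load_settings({"REDIS_PORT": "6380"})`.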
Clustering Parameters
| Parameter | Default | Description |
|-----------|---------|-------------|
| min_cluster_size | 2 | Minimum photos to form a person cluster |
| min_samples | 1 | HDBSCAN min_samples parameter |
| merge_threshold | 0.75 | Similarity to merge similar clusters |
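To give a feel for what `merge_threshold` controls, here is one plausible reading: clusters whose centroids have a cosine similarity at or above the threshold are merged. This is an assumption made for illustration. The README does not state which similarity measure or merge procedure the service uses, and every function below is hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def centroid(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def merge_similar_clusters(clusters, merge_threshold=0.75):
    """Greedily merge clusters until no pair of centroids reaches the threshold."""
    merged = [list(c) for c in clusters]
    changed = True
    while changed:
        changed = False
        for i in range(len(merged)):
            for j in range(i + 1, len(merged)):
                sim = cosine_similarity(centroid(merged[i]), centroid(merged[j]))
                if sim >= merge_threshold:
                    # Fold cluster j into cluster i, then rescan from the start.
                    merged[i].extend(merged[j])
                    del merged[j]
                    changed = True
                    break
            if changed:
                break
    return merged
```

Under this reading, raising the threshold keeps more clusters separate (fewer merges), while lowering it merges more aggressively and risks combining different people.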
Testing
# Run unit tests
pytest tests/
# Run with coverage
pytest tests/ --cov=imajin_identity
# Run integration tests (requires GPU)
pytest -m integration tests/
Architecture
imajin_identity/
├── detection/
│   ├── face_embedder.py   # InsightFace + model-boss integration
│   └── face_clusterer.py  # HDBSCAN clustering
├── models/
│   ├── identity.py        # Domain models
│   └── schemas.py         # API schemas
├── api/
│   ├── app.py             # FastAPI application
│   └── routes/            # API endpoints
├── cli/
│   ├── main.py            # CLI entry point
│   ├── organize.py        # organize command
│   ├── cluster.py         # cluster command
│   └── embed.py           # embed command
└── storage/               # Phase 3: MinIO + Redis persistence
GPU Requirements
- VRAM: ~2GB for InsightFace buffalo_l model
- CUDA: 11.8+ recommended
- model-boss: Optional but recommended for multi-service environments
Roadmap