imajin/docs/operations/gpu-coordination.md

GPU Coordination

Managing GPU resources across multiple services with GPUBoss and Redis.

Overview

The @imajin platform runs multiple GPU-intensive services:

  • imajin-prompt: LLM inference (DeepSeek R1 70B)
  • imajin-diffusion: Diffusion model inference

GPUBoss coordinates VRAM allocation between these services so that they do not exhaust GPU memory and fail with out-of-memory (OOM) errors.

Architecture

sequenceDiagram
    participant Service as Service
    participant Boss as GPUBoss
    participant Redis as Redis
    participant GPU as GPU VRAM

    Service->>Boss: Request VRAM lease (8GB)
    Boss->>Redis: Check available VRAM
    Redis-->>Boss: 16GB available on cuda:0
    Boss->>Redis: Register lease (8GB, cuda:0)
    Boss-->>Service: Lease granted (cuda:0)

    Service->>GPU: Load model
    Note over Service,GPU: Model inference

    Service->>Boss: Release lease
    Boss->>Redis: Clear lease
    Boss-->>Service: Lease released
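
The exchange in the diagram comes down to simple bookkeeping: a lease is granted only when the requested VRAM fits in what is still free on a device. The following is a minimal in-memory sketch of that idea. `LeaseTable` and its methods are hypothetical names, and a plain dict stands in for Redis, so this is not GPUBoss's actual implementation.

```python
import itertools
from typing import Dict, Optional, Tuple


class LeaseTable:
    """Hypothetical in-memory stand-in for the lease state kept in Redis."""

    def __init__(self, capacity_gb: Dict[str, float]) -> None:
        self.capacity_gb = dict(capacity_gb)             # device -> total VRAM (GB)
        self.leases: Dict[int, Tuple[str, float]] = {}   # lease id -> (device, GB)
        self._ids = itertools.count(1)

    def free_gb(self, device: str) -> float:
        used = sum(gb for dev, gb in self.leases.values() if dev == device)
        return self.capacity_gb[device] - used

    def acquire(self, vram_gb: float) -> Optional[Tuple[int, str]]:
        """Return (lease_id, device) for the first device with room, else None."""
        for device in self.capacity_gb:
            if self.free_gb(device) >= vram_gb:
                lease_id = next(self._ids)
                self.leases[lease_id] = (device, vram_gb)
                return lease_id, device
        return None

    def release(self, lease_id: int) -> None:
        self.leases.pop(lease_id, None)  # releasing twice is harmless


table = LeaseTable({"cuda:0": 16})
granted = table.acquire(8)   # fits: 16 GB free
second = table.acquire(8)    # fits: 8 GB free
denied = table.acquire(8)    # no room left, so no lease
table.release(granted[0])    # frees 8 GB again
```

A real coordinator would need the check and the registration to happen atomically across processes, which is presumably why the state lives in Redis rather than in each service.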

Configuration

Redis Setup

# Docker
docker run -d -p 6379:6379 --name redis redis

# System service
sudo systemctl start redis

Service Configuration

# config.yaml
gpu:
  enabled: true
  redis_url: redis://localhost:6379
  priority: "normal"  # low, normal, high
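
A service may want to validate these settings at startup rather than fail on the first lease request. The sketch below assumes the YAML has already been parsed into a dict. `GpuConfig`, `parse_gpu_config`, and the default values are hypothetical and not part of the platform's API.

```python
from dataclasses import dataclass
from typing import Any, Mapping

VALID_PRIORITIES = ("low", "normal", "high")


@dataclass(frozen=True)
class GpuConfig:
    enabled: bool
    redis_url: str
    priority: str


def parse_gpu_config(raw: Mapping[str, Any]) -> GpuConfig:
    """Validate the `gpu` mapping after config.yaml has been parsed."""
    gpu = raw.get("gpu", {})
    priority = gpu.get("priority", "normal")  # default is an assumption
    if priority not in VALID_PRIORITIES:
        raise ValueError(f"priority must be one of {VALID_PRIORITIES}, got {priority!r}")
    redis_url = gpu.get("redis_url", "redis://localhost:6379")
    if not redis_url.startswith(("redis://", "rediss://")):
        raise ValueError(f"unexpected redis_url scheme: {redis_url!r}")
    return GpuConfig(bool(gpu.get("enabled", False)), redis_url, priority)


config = parse_gpu_config(
    {"gpu": {"enabled": True, "redis_url": "redis://localhost:6379", "priority": "normal"}}
)
```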

Priority Levels

| Priority | Use Case |
|----------|----------|
| low | Background tasks, batch processing |
| normal | Standard requests |
| high | User-facing, latency-sensitive |

When several services compete for VRAM, higher-priority services are granted leases first.
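
One common way to get this ordering is a priority queue that serves by priority first and arrival order second. The sketch below uses Python's `heapq` to illustrate the idea. `PendingRequests`, the rank mapping, and the service names are hypothetical, and GPUBoss may order its queue differently.

```python
import heapq
import itertools
from typing import List, Tuple

# Lower rank is served first; this mapping is an assumption.
PRIORITY_RANK = {"high": 0, "normal": 1, "low": 2}


class PendingRequests:
    """Hypothetical wait queue: priority first, arrival order within a priority."""

    def __init__(self) -> None:
        self._heap: List[Tuple[int, int, str]] = []
        self._arrival = itertools.count()

    def push(self, service: str, priority: str = "normal") -> None:
        rank = PRIORITY_RANK[priority]  # raises KeyError on an unknown priority
        heapq.heappush(self._heap, (rank, next(self._arrival), service))

    def pop(self) -> str:
        return heapq.heappop(self._heap)[2]


queue = PendingRequests()
queue.push("batch-job", "low")
queue.push("api-a", "normal")
queue.push("chat-ui", "high")
queue.push("api-b", "normal")
order = [queue.pop() for _ in range(4)]
```

The arrival counter keeps requests of equal priority in first-come, first-served order, so a steady stream of `normal` requests cannot reorder among themselves.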

Device Assignment

Multi-GPU Setup

Assign different models to different GPUs:

# imajin-diffusion
export IMAGE_GEN_PHOTOREALISTIC_DEVICE=cuda:0
export IMAGE_GEN_ANIME_DEVICE=cuda:1

This allows parallel generation with both models.

Single-GPU Setup

All services share one GPU, coordinated by GPUBoss:

export IMAGE_GEN_PHOTOREALISTIC_DEVICE=cuda:0
export IMAGE_GEN_ANIME_DEVICE=cuda:0

GPUBoss ensures only one model is loaded at a time.
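
To see which of the two setups a given environment describes, the device variables can be grouped by their value. This is a small sketch using only the two variable names shown above. `models_by_device` and the `cuda:0` fallback are assumptions made for illustration.

```python
import os
from collections import defaultdict
from typing import Dict, List, Mapping

DEVICE_VARS = {
    "photorealistic": "IMAGE_GEN_PHOTOREALISTIC_DEVICE",
    "anime": "IMAGE_GEN_ANIME_DEVICE",
}


def models_by_device(env: Mapping[str, str] = os.environ) -> Dict[str, List[str]]:
    """Group model names by the device their environment variable points at."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for model, var in DEVICE_VARS.items():
        groups[env.get(var, "cuda:0")].append(model)  # fallback is an assumption
    return dict(groups)


multi = models_by_device({"IMAGE_GEN_PHOTOREALISTIC_DEVICE": "cuda:0",
                          "IMAGE_GEN_ANIME_DEVICE": "cuda:1"})
single = models_by_device({"IMAGE_GEN_PHOTOREALISTIC_DEVICE": "cuda:0",
                           "IMAGE_GEN_ANIME_DEVICE": "cuda:0"})
# Devices with more than one model are the ones that depend on lease coordination.
shared = {dev: names for dev, names in single.items() if len(names) > 1}
```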

VRAM Requirements

| Model | Approximate VRAM |
|-------|------------------|
| DeepSeek R1 70B (Q4) | 40 GB |
| DeepSeek R1 70B (Q8) | 70 GB |
| Diffusion (photorealistic) | 8 GB |
| Diffusion (anime) | 8 GB |
| Cultural classifier | 4 GB |
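
These figures make it easy to check in advance whether a set of models can share one card. The sketch below uses the approximate numbers listed above. The dictionary keys, the `fits` helper, and the 24 GB card are assumptions chosen for illustration.

```python
from typing import Dict, Iterable

# Approximate figures from the list above, in GB. Keys are hypothetical labels.
VRAM_GB: Dict[str, int] = {
    "deepseek-r1-70b-q4": 40,
    "deepseek-r1-70b-q8": 70,
    "diffusion-photorealistic": 8,
    "diffusion-anime": 8,
    "cultural-classifier": 4,
}


def fits(models: Iterable[str], gpu_gb: int, headroom_gb: int = 0) -> bool:
    """True if the models' combined estimate plus headroom fits on the GPU."""
    needed = sum(VRAM_GB[m] for m in models)
    return needed + headroom_gb <= gpu_gb


small_set = ["diffusion-photorealistic", "diffusion-anime", "cultural-classifier"]
fits_small = fits(small_set, gpu_gb=24)                      # 8 + 8 + 4 = 20 GB
fits_small_headroom = fits(small_set, gpu_gb=24, headroom_gb=6)  # 26 GB > 24 GB
fits_llm = fits(["deepseek-r1-70b-q4"], gpu_gb=24)           # 40 GB > 24 GB
```

Because the estimates are approximate, leaving some headroom for activations and framework overhead is a reasonable precaution.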

Lease Lifecycle

1. Request Lease

async def generate(prompt):
    # The lease is released automatically when the block exits
    async with gpu_boss.lease(vram_gb=8, priority="normal") as device:
        # device = "cuda:0"
        model = load_model(device)
        return model.generate(prompt)

2. Automatic Release

Leases are automatically released when:

  • Context manager exits
  • Service shuts down
  • Timeout expires (configurable)
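
The first case, release on context manager exit, is typically implemented with a `try`/`finally` around the `yield` so the lease is returned even when the body raises. The sketch below shows that pattern with `contextlib.asynccontextmanager`. `FakeBoss` is a hypothetical stand-in that only records calls, and it does not cover shutdown or timeout handling.

```python
import asyncio
from contextlib import asynccontextmanager
from typing import AsyncIterator, List


class FakeBoss:
    """Hypothetical stand-in that only records acquire/release calls."""

    def __init__(self) -> None:
        self.active: List[int] = []
        self._next_id = 1

    async def acquire(self, vram_gb: float) -> int:
        lease_id, self._next_id = self._next_id, self._next_id + 1
        self.active.append(lease_id)
        return lease_id

    async def release(self, lease_id: int) -> None:
        if lease_id in self.active:
            self.active.remove(lease_id)

    @asynccontextmanager
    async def lease(self, vram_gb: float) -> AsyncIterator[str]:
        lease_id = await self.acquire(vram_gb)
        try:
            yield "cuda:0"  # fixed device for the sketch
        finally:
            await self.release(lease_id)  # runs on normal exit and on errors


async def demo() -> List[int]:
    boss = FakeBoss()
    try:
        async with boss.lease(vram_gb=8):
            raise RuntimeError("inference failed")
    except RuntimeError:
        pass
    return boss.active  # leases still held after the failure


leaked = asyncio.run(demo())
```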

3. Manual Release

lease_id = await gpu_boss.acquire(vram_gb=8)
try:
    ...  # use the GPU
finally:
    # Always release, even if the work above raises
    await gpu_boss.release(lease_id)

Monitoring

Check GPU Status

nvidia-smi

Check Redis Leases

redis-cli keys "gpuboss:*"
redis-cli hgetall "gpuboss:leases"

Service Health

curl http://localhost:8003/health
# { "gpu_available": true, "vram_total": 24576, "vram_free": 16384 }
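
A caller can use this response to decide whether a request is worth sending. The sketch below parses the sample body shown above and assumes the VRAM fields are in MiB, which the values suggest (24576 MiB is 24 GiB) but the source does not state. `can_fit` is a hypothetical helper.

```python
import json


def can_fit(health_body: str, needed_mib: int) -> bool:
    """True if the service reports a usable GPU with enough free VRAM."""
    health = json.loads(health_body)
    return bool(health["gpu_available"]) and health["vram_free"] >= needed_mib


sample = '{ "gpu_available": true, "vram_total": 24576, "vram_free": 16384 }'
ok_8g = can_fit(sample, 8 * 1024)     # 8192 MiB requested, 16384 MiB free
ok_20g = can_fit(sample, 20 * 1024)   # 20480 MiB requested, more than is free
```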

Troubleshooting

OOM Despite Coordination

  1. Check for leaked leases: redis-cli keys "gpuboss:*"
  2. Verify VRAM estimates match actual usage
  3. Use a lower-bit quantization (e.g. Q4 instead of Q8) or reduce the batch size

Slow Lease Acquisition

  1. Check Redis latency: redis-cli --latency
  2. Verify priority settings
  3. Check for long-running leases blocking queue

Service Can't Get GPU

# Check what's holding leases
redis-cli hgetall "gpuboss:leases"

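# Release a single stale lease (assumes hash fields are keyed by lease ID)
redis-cli hdel "gpuboss:leases" <lease_id>

# Last resort: this deletes ALL leases, including active ones (use with caution)
redis-cli del "gpuboss:leases"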
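
Before deleting anything, it helps to separate stale leases from active ones. How GPUBoss stores lease expiry is not documented here, so the sketch below works on a plain mapping of lease ID to expiry timestamp. Both that shape and the `stale_leases` helper are assumptions.

```python
import time
from typing import Dict, List, Optional


def stale_leases(expiry_by_lease: Dict[str, float],
                 now: Optional[float] = None) -> List[str]:
    """Return IDs of leases whose expiry time has passed, sorted by ID."""
    now = time.time() if now is None else now
    return sorted(lid for lid, expires_at in expiry_by_lease.items()
                  if expires_at <= now)


# Hypothetical lease records: ID -> expiry time in seconds
leases = {"lease-a": 100.0, "lease-b": 250.0, "lease-c": 90.0}
stale = stale_leases(leases, now=120.0)
```

Only the IDs returned here would then be candidates for a targeted release, leaving active leases untouched.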

Best Practices

  1. Request only the VRAM you need - Over-requested leases block other services
  2. Use appropriate priority - Reserve "high" for user-facing requests
  3. Handle lease failures gracefully - Return 503 if GPU unavailable
  4. Set reasonable timeouts - Prevent indefinite waits
  5. Monitor VRAM usage - Track actual vs. requested
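
Practices 3 and 4 combine naturally: bound the wait for a lease and translate a timeout into a 503. The sketch below uses `asyncio.wait_for` to do that. `handle_request` and the two acquire functions are hypothetical, and a real handler would return its web framework's response type instead of a tuple.

```python
import asyncio
from typing import Awaitable, Callable, Tuple


async def handle_request(acquire: Callable[[], Awaitable[str]],
                         timeout_s: float) -> Tuple[int, str]:
    """Wait a bounded time for a device; report 503 if none arrives."""
    try:
        device = await asyncio.wait_for(acquire(), timeout=timeout_s)
    except asyncio.TimeoutError:
        return 503, "GPU unavailable, retry later"
    return 200, f"generated on {device}"


async def fast_acquire() -> str:
    return "cuda:0"  # lease granted immediately


async def stuck_acquire() -> str:
    await asyncio.sleep(10)  # simulates a queue blocked by a long-running lease
    return "cuda:0"


async def demo() -> Tuple[Tuple[int, str], Tuple[int, str]]:
    ok = await handle_request(fast_acquire, timeout_s=0.5)
    busy = await handle_request(stuck_acquire, timeout_s=0.05)
    return ok, busy


ok, busy = asyncio.run(demo())
```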