# GPU Coordination

Managing GPU resources across multiple services with GPUBoss and Redis.

## Overview

The @imajin platform runs multiple GPU-intensive services:
- **imajin-prompt**: LLM inference (DeepSeek R1 70B)
- **imajin-diffusion**: Diffusion model inference

GPUBoss coordinates VRAM allocation between these services to prevent out-of-memory (OOM) errors.

## Architecture

```mermaid
sequenceDiagram
    participant Service as Service
    participant Boss as GPUBoss
    participant Redis as Redis
    participant GPU as GPU VRAM

    Service->>Boss: Request VRAM lease (8GB)
    Boss->>Redis: Check available VRAM
    Redis-->>Boss: 16GB available on cuda:0
    Boss->>Redis: Register lease (8GB, cuda:0)
    Boss-->>Service: Lease granted (cuda:0)

    Service->>GPU: Load model
    Note over Service,GPU: Model inference

    Service->>Boss: Release lease
    Boss->>Redis: Clear lease
    Boss-->>Service: Lease released
```

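The request/grant/release flow in the diagram can be sketched in a few lines of Python. This is only an illustration: `LeaseTable` is a hypothetical in-memory stand-in for the state GPUBoss keeps in Redis, not the actual GPUBoss implementation, and the first-fit device choice is an assumption.

```python
from dataclasses import dataclass, field
from itertools import count
from typing import Optional


@dataclass
class LeaseTable:
    """Hypothetical in-memory stand-in for the lease state kept in Redis."""

    capacity_gb: dict[str, int]  # device -> total VRAM in GB
    leases: dict[int, tuple[str, int]] = field(default_factory=dict)
    _ids: count = field(default_factory=lambda: count(1))

    def free_gb(self, device: str) -> int:
        """Total VRAM on the device minus everything currently leased."""
        used = sum(gb for dev, gb in self.leases.values() if dev == device)
        return self.capacity_gb[device] - used

    def request(self, vram_gb: int) -> Optional[tuple[int, str]]:
        """Grant a lease on the first device with enough free VRAM."""
        for device in self.capacity_gb:
            if self.free_gb(device) >= vram_gb:
                lease_id = next(self._ids)
                self.leases[lease_id] = (device, vram_gb)
                return lease_id, device
        return None  # no device can satisfy the request right now

    def release(self, lease_id: int) -> None:
        """Clear the lease; unknown ids are ignored."""
        self.leases.pop(lease_id, None)


table = LeaseTable(capacity_gb={"cuda:0": 16})
lease_id, device = table.request(8)  # granted on cuda:0
second = table.request(8)            # also granted: 8GB was still free
third = table.request(8)             # None: cuda:0 is fully leased
table.release(lease_id)              # frees 8GB again
```

A real coordinator would need the check and the registration to happen atomically in Redis so that two services cannot both claim the same free VRAM; the sketch ignores concurrency.
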
## Configuration

### Redis Setup

```bash
# Docker
docker run -d -p 6379:6379 --name redis redis

# System service
sudo systemctl start redis
```

### Service Configuration

```yaml
# config.yaml
gpu:
  enabled: true
  redis_url: redis://localhost:6379
  priority: "normal"  # low, normal, high
```

### Priority Levels

| Priority | Use Case |
|----------|----------|
| `low` | Background tasks, batch processing |
| `normal` | Standard requests |
| `high` | User-facing, latency-sensitive |

When several services contend for VRAM, higher-priority services are granted leases first.

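One way to picture priority ordering is a heap keyed on priority and then arrival order. The source does not describe how GPUBoss orders waiting requests internally, so `LeaseQueue`, the numeric ranks, and the first-come-first-served tie-breaking below are all assumptions made for illustration.

```python
import heapq
from itertools import count

# Lower rank is served first; the mapping itself is an assumption.
PRIORITY_RANK = {"high": 0, "normal": 1, "low": 2}


class LeaseQueue:
    """Hypothetical queue of pending lease requests."""

    def __init__(self):
        self._heap = []
        self._arrival = count()  # tie-breaker: earlier requests first

    def push(self, service, priority="normal"):
        rank = PRIORITY_RANK[priority]
        heapq.heappush(self._heap, (rank, next(self._arrival), service))

    def pop(self):
        """Return the service that should receive the next lease."""
        return heapq.heappop(self._heap)[2]


queue = LeaseQueue()
queue.push("batch-job", "low")
queue.push("api-request", "high")
queue.push("worker-a", "normal")
queue.push("worker-b", "normal")

order = [queue.pop() for _ in range(4)]
# order == ["api-request", "worker-a", "worker-b", "batch-job"]
```
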
## Device Assignment

### Multi-GPU Setup

Assign different models to different GPUs:

```bash
# imajin-diffusion
export IMAGE_GEN_PHOTOREALISTIC_DEVICE=cuda:0
export IMAGE_GEN_ANIME_DEVICE=cuda:1
```

With each model on its own GPU, both models can generate in parallel.

### Single-GPU Setup

All services share one GPU, coordinated by GPUBoss:

```bash
export IMAGE_GEN_PHOTOREALISTIC_DEVICE=cuda:0
export IMAGE_GEN_ANIME_DEVICE=cuda:0
```

GPUBoss ensures only one model is loaded at a time.

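A service might read these variables as sketched below. The helper names `device_for` and `shares_gpu` and the `cuda:0` fallback are assumptions for illustration; only the two `IMAGE_GEN_*_DEVICE` variable names come from the examples above.

```python
import os


def device_for(model: str, default: str = "cuda:0") -> str:
    """Look up IMAGE_GEN_<MODEL>_DEVICE, falling back to `default`."""
    return os.environ.get(f"IMAGE_GEN_{model.upper()}_DEVICE", default)


def shares_gpu(models) -> bool:
    """True if at least two of the given models map to the same device."""
    devices = [device_for(m) for m in models]
    return len(set(devices)) < len(devices)
```

With the multi-GPU exports in place, `shares_gpu(["photorealistic", "anime"])` is false and the two models can run side by side; with the single-GPU exports it is true, which is the case where lease coordination matters.
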
## VRAM Requirements

| Model | Approximate VRAM |
|-------|------------------|
| DeepSeek R1 70B (Q4) | 40GB |
| DeepSeek R1 70B (Q8) | 70GB |
| Diffusion (photorealistic) | 8GB |
| Diffusion (anime) | 8GB |
| Cultural classifier | 4GB |

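The figures in the table make it easy to check whether a set of models fits on a given card. The sketch below just sums the approximate values; the dictionary keys are hypothetical labels, and real usage will also include overhead the table does not list.

```python
# Approximate VRAM per model in GB, copied from the table above.
# The keys are hypothetical labels chosen for this example.
VRAM_GB = {
    "deepseek-r1-70b-q4": 40,
    "deepseek-r1-70b-q8": 70,
    "diffusion-photorealistic": 8,
    "diffusion-anime": 8,
    "cultural-classifier": 4,
}


def fits(models, gpu_gb):
    """Return (total GB needed, whether it fits within gpu_gb)."""
    needed = sum(VRAM_GB[m] for m in models)
    return needed, needed <= gpu_gb


image_stack = ["diffusion-photorealistic", "diffusion-anime", "cultural-classifier"]
print(fits(image_stack, 24))             # (20, True)
print(fits(["deepseek-r1-70b-q4"], 24))  # (40, False)
```
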
## Lease Lifecycle

### 1. Request Lease

```python
async with gpu_boss.lease(vram_gb=8, priority="normal") as device:
    # device is the granted device string, e.g. "cuda:0"
    model = load_model(device)
    result = model.generate(...)
```

### 2. Automatic Release

Leases are automatically released when:
- The context manager exits
- The service shuts down
- The lease timeout expires (configurable)

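The first case, release on context-manager exit, can be sketched as follows. `FakeBoss` is a hypothetical in-memory stand-in, not GPUBoss itself; it only shows that a `try`/`finally` around the `yield` releases the lease even when the body raises. Release on shutdown and on timeout are not modelled here.

```python
import asyncio
from contextlib import asynccontextmanager


class FakeBoss:
    """Hypothetical stand-in that tracks active leases in memory."""

    def __init__(self):
        self.active = set()
        self._next_id = 0

    async def acquire(self, vram_gb):
        self._next_id += 1
        self.active.add(self._next_id)
        return self._next_id

    async def release(self, lease_id):
        self.active.discard(lease_id)

    @asynccontextmanager
    async def lease(self, vram_gb, device="cuda:0"):
        lease_id = await self.acquire(vram_gb)
        try:
            yield device
        finally:
            # Runs on normal exit and when the body raises.
            await self.release(lease_id)


async def demo():
    boss = FakeBoss()
    try:
        async with boss.lease(vram_gb=8) as device:
            assert boss.active  # the lease is held inside the block
            raise RuntimeError("inference failed")
    except RuntimeError:
        pass
    return boss.active  # empty: the lease was released despite the error
```
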
### 3. Manual Release

```python
lease_id = await gpu_boss.acquire(vram_gb=8)
try:
    # ... use GPU
    ...
finally:
    # Always release, even if the GPU work raised an exception
    await gpu_boss.release(lease_id)
```

## Monitoring

### Check GPU Status

```bash
nvidia-smi
```

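### Check Redis Leases

```bash
# KEYS blocks Redis while it scans the keyspace; on a busy instance,
# `redis-cli --scan --pattern "gpuboss:*"` is the gentler alternative
redis-cli keys "gpuboss:*"
redis-cli hgetall "gpuboss:leases"
```
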
### Service Health

```bash
curl http://localhost:8003/health
# { "gpu_available": true, "vram_total": 24576, "vram_free": 16384 }
```

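For scripted monitoring, the health response can be reduced to a usage figure. The sketch below parses the sample body shown above; `vram_summary` is a hypothetical helper, and it makes no assumption about the unit of the VRAM fields beyond both using the same one.

```python
import json


def vram_summary(body: str) -> dict:
    """Summarise a /health response body like the sample above."""
    health = json.loads(body)
    total, free = health["vram_total"], health["vram_free"]
    return {
        "available": health["gpu_available"],
        "used": total - free,
        "used_fraction": (total - free) / total,
    }


sample = '{ "gpu_available": true, "vram_total": 24576, "vram_free": 16384 }'
summary = vram_summary(sample)
# summary["used"] == 8192, i.e. one third of the total is in use
```
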
## Troubleshooting

### OOM Despite Coordination

1. Check for leaked leases: `redis-cli keys "gpuboss:*"`
2. Verify that the VRAM estimates in lease requests match actual usage
3. Use a lower-precision quantization (for example, Q4 instead of Q8) or reduce the batch size

### Slow Lease Acquisition

1. Check Redis latency: `redis-cli --latency`
2. Verify priority settings
3. Check for long-running leases blocking the queue

### Service Can't Get GPU

```bash
# Check what's holding leases
redis-cli hgetall "gpuboss:leases"

# Force release stale leases (use with caution):
# this deletes the whole hash, including any leases still in use
redis-cli del "gpuboss:leases"
```

## Best Practices

1. **Request the minimum VRAM needed** - Don't over-request
2. **Use appropriate priority** - Reserve "high" for user-facing requests
3. **Handle lease failures gracefully** - Return 503 if no GPU is available
4. **Set reasonable timeouts** - Prevent indefinite waits
5. **Monitor VRAM usage** - Track actual vs. requested

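Practices 3 and 4 combine naturally: bound the wait for a lease and turn a timeout into a 503. The sketch below is framework-agnostic and returns a `(status, body)` tuple. `with_gpu_or_503`, `busy_gpu`, and `idle_gpu` are hypothetical names, and `acquire` stands for whatever coroutine function obtains the lease.

```python
import asyncio


async def with_gpu_or_503(acquire, timeout_s):
    """Wait up to timeout_s for a lease; map a timeout to HTTP 503."""
    try:
        device = await asyncio.wait_for(acquire(), timeout=timeout_s)
    except asyncio.TimeoutError:
        return 503, {"error": "GPU unavailable, retry later"}
    return 200, {"device": device}


async def busy_gpu():
    await asyncio.sleep(1)  # simulates a lease that is not granted in time
    return "cuda:0"


async def idle_gpu():
    return "cuda:0"  # simulates an immediately granted lease
```

Note that `asyncio.wait_for` cancels the pending acquisition on timeout, so a real `acquire` should handle cancellation without leaving a half-registered lease behind.
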
## Related

- [Configuration](./configuration.md) - Redis URL configuration
- [Service Topology](../architecture/service-topology.md) - Service dependencies