imajin/tooling/ARCHITECTURAL_RULES.md

# Imajin Architectural Rules

**Purpose**: Critical architectural constraints for the imajin reasoning pipeline
**Enforcement**: MANDATORY - violations must be caught in code review
**Last Updated**: 2026-01-11

---

## CRITICAL RULE #1: No Static Cultural Term Lists

### Violation Definition

**ANY** hardcoded list, mapping, dictionary, or system prompt example that defines cultural terms and their classifications is a **CRITICAL VIOLATION**.

### Examples of Violations

❌ **Static Term Lists**:
```python
ANIME_TERMS = ["femboy", "kawaii", "catgirl", "neko", "vtuber"]
PHOTOREALISTIC_TERMS = ["professional", "lawyer", "businesswoman"]
```

❌ **Static Term Mappings**:
```python
CULTURAL_TERMS = {
    "femboy": {"style": "anime", "confidence": 0.95},
    "kawaii": {"style": "anime", "confidence": 1.0},
    "milf": {"style": "photorealistic", "confidence": 1.0}
}
```

❌ **System Prompt Examples**:
```python
PROMPT = """
Classify cultural terms:
- anime: waifu, senpai, neko, kawaii, catgirl, femboy, bunny girl
- photorealistic: model, influencer, lawyer, businesswoman
"""
```

❌ **Hardcoded If/Else Rules**:
```python
if term == "femboy":
    return "anime"
if term in ["lawyer", "professional"]:
    return "photorealistic"
```

❌ **"ALWAYS" Rules**:
```python
"""
CRITICAL RULES:
1. Japanese terms (kawaii, neko) → ALWAYS "anime"
2. Femboy → ALWAYS anime aesthetic
"""
```

❌ **Priority Override Logic**:
```python
def determine_style(terms):
    # Anime takes priority (hardcoded rule!)
    if has_anime and confidence >= 0.7:
        return "anime"
```

### Why These Are Violations

1. **Bypasses LLM Reasoning**: Static lists prevent the LLM from using semantic understanding
2. **Encodes Assumptions**: Cultural classifications become hardcoded data, not reasoned conclusions
3. **Prevents Generalization**: LLM can't reason about novel terms or new cultural contexts
4. **Brittle**: Requires manual updates when culture changes or new terms emerge
5. **No Context Awareness**: Individual term mappings can't consider full request context
6. **Defeats Purpose**: The whole point of LLM reasoning is to avoid hardcoded cultural rules

### Correct Approach

✅ **Pure LLM Reasoning**:
```python
# Ask LLM to analyze terms WITHOUT examples
response = await llm.analyze(
    question="What aesthetic style is 'femboy' typically depicted in? Provide reasoning.",
    context={"category": "escorts", "city": "New York"}
)
# LLM reasons: "Femboy is a cultural term from anime communities..."
```

✅ **Context-Aware Analysis**:
```python
# LLM considers FULL context, not individual terms
response = await llm.analyze(
    question="Given filters [femboy, latex] in NYC, which aesthetic dominates and why?",
    weights={"cultural": 0.9, "geographic": 0.1}  # Instance-specific
)
```

✅ **Configurable Reasoning**:
```python
# Weights and priorities come from instance config, not hardcoded
config = seo_config  # SEO: geographic=0.1 (lowest)
result = await reasoning_chain.execute(request, config)
```

✅ **Explicit Chain of Thought**:
```python
# LLM provides visible reasoning for each step
{
  "stage": "term_analysis",
  "question": "Analyze 'femboy'...",
  "response": {"style": "anime", "reasoning": "Cultural term from..."}
}
```

---

## CRITICAL RULE #2: Configuration-Driven Reasoning

### Principle

Imajin is an **abstract, reusable tool**. Configuration must be owned by the **instantiator** (e.g., SEO service), not by imajin itself.

### Correct Architecture

```
┌─────────────────────────────────────┐
│      SEO SERVICE (Instantiator)     │
│                                     │
│  Owns:                              │
│    - factorWeights (geographic:0.1) │
│    - extractionGoals (25 aspects)   │
│    - maturityPolicy                 │
│    - modelSelection                 │
└────────────┬────────────────────────┘
             │ Passes config in request
             ▼
┌─────────────────────────────────────┐
│    imajin-reasoning (Tool)          │
│                                     │
│  Receives:                          │
│    - instanceConfig per request     │
│    - NO hardcoded defaults          │
│    - NO stored configurations       │
└─────────────────────────────────────┘
```
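
The "no hardcoded defaults" constraint can be enforced at the tool's entry point. The sketch below is illustrative only: `require_instance_config`, `MissingInstanceConfig`, and the choice of required keys are assumptions, not part of the existing codebase. It rejects a request that carries no configuration instead of falling back to values stored in imajin.

```python
from typing import Any, Mapping

# Assumed minimum set of keys, taken from the request example in this document.
REQUIRED_CONFIG_KEYS = ("instanceName", "factorWeights", "extractionGoals")


class MissingInstanceConfig(ValueError):
    """Raised when a request does not carry the configuration the tool needs."""


def require_instance_config(request: Mapping[str, Any]) -> Mapping[str, Any]:
    """Return the caller-supplied instanceConfig; never substitute defaults."""
    config = request.get("instanceConfig")
    if not isinstance(config, Mapping):
        raise MissingInstanceConfig("request has no instanceConfig object")
    missing = [key for key in REQUIRED_CONFIG_KEYS if key not in config]
    if missing:
        raise MissingInstanceConfig(
            f"instanceConfig is missing: {', '.join(missing)}"
        )
    return config
```

Failing loudly keeps ownership with the instantiator: a missing weight surfaces as an error in the calling service rather than being silently filled in by the tool.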

### Instance Configuration Schema

```yaml
instanceConfig:
  # Instance identity
  instanceName: seo-instance

  # Factor weighting (0.0-1.0)
  factorWeights:
    cultural: 0.9        # Highest for SEO
    category: 0.8
    audience_appeal: 0.8
    composition: 0.7
    material: 0.6
    maturity: 0.5
    geographic: 0.1      # LOWEST for SEO (location doesn't define aesthetic)

  # What to extract (25 aspects)
  extractionGoals:
    - style                 # Essential
    - subject_count         # Essential
    - gender_composition    # Essential
    - maturity_level        # Essential (7 levels)
    - target_audience       # NEW: who seeks this
    - audience_expectations # NEW: what they expect
    - power_dynamic         # NEW: dom/sub/neutral
    - aesthetic_tone        # cute, sexy, elegant
    - dominant_mood         # playful, seductive, etc.
    - clothing_style        # fetish_wear, lingerie, etc.
    # ... (25 total)

  # Model selection per stage
  modelSelection:
    defaultModel: ministral-14b-reasoning
    stageOverrides:
      cultural_hierarchy: ministral-14b-reasoning
      validation: ministral-14b-reasoning

  # Maturity constraints
  maturityPolicy:
    allowExplicitContent: true
    defaultMinimum: suggestive
    defaultMaximum: explicit_nude
```
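
The schema states that factor weights lie in the range 0.0-1.0. A minimal validation sketch follows; the function name `validate_factor_weights` is hypothetical, and it assumes the weights arrive as a plain mapping after the YAML or JSON has been parsed.

```python
from numbers import Real
from typing import Any, Mapping


def validate_factor_weights(weights: Mapping[str, Any]) -> dict[str, float]:
    """Check that every factor weight is a number in [0.0, 1.0]."""
    if not weights:
        raise ValueError("factorWeights must not be empty")
    validated: dict[str, float] = {}
    for factor, weight in weights.items():
        # bool counts as a Real in Python, so reject it explicitly.
        if isinstance(weight, bool) or not isinstance(weight, Real):
            raise ValueError(f"weight for {factor!r} is not a number: {weight!r}")
        if not 0.0 <= weight <= 1.0:
            raise ValueError(f"weight for {factor!r} is outside 0.0-1.0: {weight}")
        validated[factor] = float(weight)
    return validated
```

The check deliberately does not prescribe which factors exist or what their values should be; those remain the instantiator's decision.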

### Request Format

```json
{
  "category": "escorts",
  "city": "New York",
  "filters": ["femboy", "latex"],
  "maturity": {
    "minimumRating": "suggestive",
    "expectedRating": "mature",
    "maximumRating": "explicit_nude"
  },
  "instanceConfig": {
    // SEO passes its entire configuration
    "instanceName": "seo",
    "factorWeights": {...},
    "extractionGoals": [...]
  }
}
```

---

## 7-Level Maturity Taxonomy

Maturity levels from lowest to highest:

```yaml
1. sfw:
   label: "Safe for Work"
   description: "Clothed, family-friendly, no sexual content"
   examples: "Professional headshot, casual clothing, G-rated"

2. suggestive:
   label: "Suggestive"
   description: "Sensual but not explicit - revealing clothing, flirtation"
   examples: "Cleavage, short skirt, seductive pose, implied sensuality"
   intensity: "PG-13 to R-rated imagery"

3. mature:
   label: "Mature"
   description: "Adult themes - lingerie, partial nudity, sexual tension"
   examples: "Visible lingerie, suggestive positioning, intimate setting"
   intensity: "R to NC-17 imagery"

4. explicit_soft:
   label: "Explicit (Artistic Nudity)"
   description: "Tasteful nudity with artistic intent, strategic coverage"
   examples: "Artistic nude photography, implied nudity, covered areas"
   intensity: "Artistic nude, non-pornographic"

5. explicit_nude:
   label: "Explicit (Erotic Nudity)"
   description: "Full nudity with erotic intent, sexual presentation"
   examples: "Full frontal nudity, erotic posing, sexual display"
   intensity: "Pornographic imagery but no sex acts"

6. explicit_sexual:
   label: "Explicit (Sexual Acts)"
   description: "Sexual activity - penetration, oral, intercourse"
   examples: "Penetrative sex, oral sex, explicit sexual acts shown"
   intensity: "Hardcore pornography"

7. extreme:
   label: "Extreme"
   description: "Hardcore fetish, intense BDSM, taboo scenarios"
   examples: "Extreme BDSM, intense fetish content, taboo scenarios"
   intensity: "Most extreme pornographic content"
```

**Note**: The 7-level spectrum allows fine-grained control. Consumers can specify:
- `minimumRating`: Won't go below this level
- `expectedRating`: Target this level
- `maximumRating`: Won't exceed this level
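
Because the seven levels are ordered, the three rating fields can be applied as a simple range check. The sketch below is one possible interpretation, not the pipeline's actual behavior: it assumes an out-of-range proposal is clamped to the nearest allowed level, and the names `Maturity`, `parse_rating`, and `clamp_rating` are illustrative.

```python
from enum import IntEnum


class Maturity(IntEnum):
    """The seven levels, numbered as in the taxonomy above."""
    SFW = 1
    SUGGESTIVE = 2
    MATURE = 3
    EXPLICIT_SOFT = 4
    EXPLICIT_NUDE = 5
    EXPLICIT_SEXUAL = 6
    EXTREME = 7


def parse_rating(name: str) -> Maturity:
    """Map a rating string such as 'explicit_nude' to its enum member."""
    return Maturity[name.upper()]


def clamp_rating(proposed: str, minimum: str, maximum: str) -> str:
    """Keep a proposed rating inside [minimumRating, maximumRating]."""
    low, high = parse_rating(minimum), parse_rating(maximum)
    if low > high:
        raise ValueError("minimumRating exceeds maximumRating")
    clamped = max(low, min(parse_rating(proposed), high))
    return clamped.name.lower()
```

With the bounds from the request example (`suggestive` to `explicit_nude`), a proposal of `extreme` would be reduced to `explicit_nude` and a proposal of `sfw` raised to `suggestive`. Rejecting the request outright is an equally valid policy.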

---

## 25 Extraction Goals

All aspects that must be determined through LLM reasoning:

### Essential (5)
1. **style**: anime vs photorealistic
2. **subject_count**: 1, 2, 3+
3. **gender_composition**: [male], [female], [male, female], etc.
4. **maturity_level**: sfw → extreme (7 levels)
5. **client_figure_required**: true/false (GFE scenarios)

### Audience & Demographics (4) - NEW
6. **target_audience**: straight_male, gay_male, lesbian, queer, general
7. **audience_expectations**: What this audience typically seeks
8. **presentation_appeal**: Who finds this presentation attractive
9. **cultural_community**: anime_fans, fetish_community, mainstream

### Power Dynamics (3) - NEW
10. **power_dynamic**: dominant, submissive, switch, neutral
11. **service_provider_role**: active_provider, passive_receiver, versatile
12. **interaction_type**: giving, receiving, mutual

### Aesthetic Details (5)
13. **aesthetic_tone**: cute, sexy, elegant, edgy, playful
14. **dominant_mood**: innocent, seductive, playful, intense
15. **clothing_style**: casual, formal, lingerie, fetish_wear, costume
16. **color_palette**: vibrant, pastel, muted, dark, neon
17. **emotional_expression**: neutral, smiling, seductive, playful

### Composition (4)
18. **pose_type**: portrait, full_body, action, intimate
19. **setting_environment**: indoor, outdoor, bedroom, studio
20. **camera_framing**: portrait, full_body, close_up
21. **background_complexity**: simple, detailed, bokeh

### Style Specificity (4)
22. **cultural_specificity**: japanese_elements, western_modern, mixed
23. **art_style_granularity**: (anime) chibi/shoujo/seinen or (photo) glamour/editorial
24. **lighting_style**: natural, studio, dramatic, soft
25. **body_type_implied**: slender, athletic, curvy, petite

---

## Chain-of-Reasoning Architecture

All classification MUST use multi-stage LLM reasoning with explicit CoT:

### Principles

1. **Question-Based Reasoning**: Frame each analysis as a question to the LLM
2. **Explicit Chain of Thought**: LLM provides visible reasoning for transparency
3. **No Priority Overrides**: LLM decides conflicts using instance weights, not hardcoded rules
4. **Context-Aware**: LLM considers full request context holistically
5. **Configuration-Driven**: Weights and priorities from instance config, not defaults
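
The principles above can be sketched as a small chain runner that asks one question per stage and records every exchange. Everything here is an assumption made for illustration: the `AskFn` interface, the `ReasoningStep` and `ReasoningTrace` types, and `stub_llm`, which stands in for a real model call.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Any, Awaitable, Callable

# Hypothetical LLM interface: (question, context) -> dict with a "reasoning" key.
AskFn = Callable[[str, dict[str, Any]], Awaitable[dict[str, Any]]]


@dataclass
class ReasoningStep:
    stage: str
    question: str
    response: dict[str, Any]


@dataclass
class ReasoningTrace:
    steps: list[ReasoningStep] = field(default_factory=list)


async def run_chain(
    ask: AskFn, stages: list[tuple[str, str]], context: dict[str, Any]
) -> ReasoningTrace:
    """Ask each stage's question in order and record the full exchange."""
    trace = ReasoningTrace()
    for stage, question in stages:
        response = await ask(question, context)
        if not response.get("reasoning"):
            raise ValueError(f"stage {stage!r} returned no reasoning")
        trace.steps.append(ReasoningStep(stage, question, response))
        # Make earlier conclusions visible to later stages.
        context = {**context, stage: response}
    return trace


async def stub_llm(question: str, context: dict[str, Any]) -> dict[str, Any]:
    """Stand-in for a real model call, used only to exercise the chain."""
    return {"seen": sorted(context), "reasoning": f"stub answer to: {question}"}


if __name__ == "__main__":
    stages = [("term_analysis", "Q1?"), ("term_interaction", "Q2?")]
    trace = asyncio.run(run_chain(stub_llm, stages, {"weights": {"cultural": 0.9}}))
    for step in trace.steps:
        print(step.stage, step.response["seen"])
```

The runner contains no classification logic of its own. It only sequences questions, refuses answers that lack reasoning, and passes the instance-supplied context through, which keeps the trace returnable in the API response.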

### Example Reasoning Stages

**Stage 1: Individual Term Analysis**
```
Q: "What aesthetic style is 'femboy' typically depicted in? Provide reasoning."
Response: {"style": "anime", "confidence": 0.95, "reasoning": "Cultural term from anime communities..."}
```

**Stage 2: Term Interaction**
```
Q: "How do 'femboy' and 'latex' interact when combined?"
Response: {"interaction": "femboy defines aesthetic, latex is attribute", "resultingStyle": "anime"}
```

**Stage 3: Weighted Hierarchy**
```
Q: "Given femboy (cultural, 0.95) and NYC (geographic, 0.7), apply weights: cultural=0.9, geographic=0.1"
Response: {"weightedScores": {"cultural": 0.855, "geographic": 0.07}, "decision": "anime"}
```
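
The weighted scores in the Stage 3 response are each factor's confidence multiplied by its instance weight. The sketch below reproduces that arithmetic so a response can be sanity-checked. The function name is hypothetical, and the geographic confidence of 0.7 is an assumption inferred from the 0.07 score divided by the 0.1 weight.

```python
def weighted_scores(
    confidences: dict[str, float], weights: dict[str, float]
) -> dict[str, float]:
    """Multiply each factor's confidence by its instance-supplied weight."""
    missing = sorted(set(confidences) - set(weights))
    if missing:
        # No fallback weight: the instantiator must supply every factor.
        raise KeyError(f"no weight configured for: {', '.join(missing)}")
    return {
        factor: round(confidence * weights[factor], 3)
        for factor, confidence in confidences.items()
    }


scores = weighted_scores(
    {"cultural": 0.95, "geographic": 0.7},   # 0.7 is inferred, see above
    {"cultural": 0.9, "geographic": 0.1},    # weights from the SEO instance
)
```

Here `scores["cultural"]` is 0.95 × 0.9 = 0.855 and `scores["geographic"]` is 0.7 × 0.1 = 0.07. The helper only computes the numbers; the decision itself still comes from the LLM.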

**Stage 4: Target Audience**
```
Q: "Who typically finds feminine presentation (femboy) attractive?"
Response: {"primary": "straight_male", "reasoning": "Straight males seek feminine aesthetics..."}
```

**Stage 5: Power Dynamics**
```
Q: "Does 'latex' clothing indicate dominant or submissive role?"
Response: {"powerDynamic": "neutral", "reasoning": "Latex is material, not role. Can be worn by dom or sub."}
```

---

## Violations Found in Current Codebase

### CRITICAL Violations (Must Remove Immediately)

| File | Lines | Violation Type | Impact |
|------|-------|----------------|--------|
| `services/imajin-request-classifier/service/src/cultural_classifier/classifier.py` | 64-89 | System prompt with hardcoded examples | LLM primed with "femboy, kawaii → anime" |
| `services/imajin-request-classifier/service/src/cultural_classifier/training/generate_training_data.py` | 15-308 | Static term database (50+ terms) | Completely bypasses LLM with static mappings |
| `services/imajin-request-classifier/service/src/cultural_classifier/training/generate_training_data.py` | 334-342 | "ALWAYS" rules in training | Encodes "Japanese terms → ALWAYS anime" |

### MODERATE Violations (Refactor to LLM Reasoning)

| File | Lines | Violation Type | Impact |
|------|-------|----------------|--------|
| `services/imajin-request-classifier/service/src/cultural_classifier/classifier.py` | 223-260 | Priority override logic | "anime takes priority if confidence ≥0.7" (hardcoded threshold) |
| `services/imajin-prompt-generator/service/src/prompts/pipelines.py` | 116 | Category-gender mappings | "gay = two men", "duo = two women" (static rules) |
| `services/imajin-request-classifier/service/src/cultural_classifier/training/generate_training_data.py` | 174-235 | Gender composition mappings | Hardcoded gender for categories |

---

## Remediation Plan

### Phase 1: Delete Static Lists (Immediate)
1. Delete `generate_training_data.py` entirely (static term database)
2. Remove CLASSIFIER_SYSTEM_PROMPT examples from `classifier.py`
3. Remove `determine_style()` priority logic
4. Remove category-gender mappings

### Phase 2: Implement LLM Reasoning (Core)
1. Create `imajin-reasoning` service (orchestrator)
2. Refactor `imajin-classifier` to generic Q&A endpoint
3. Implement multi-stage reasoning chain
4. Add explicit CoT response format

### Phase 3: Configuration-Driven Weights
1. Add instance configuration schema
2. Implement weighted hierarchy calculation
3. Add SEO instance example (geographic: 0.1 lowest)
4. Support runtime config passing from instantiator

### Phase 4: Verification
1. Test sample request: escorts NYC femboy+latex → anime (no static lists used)
2. Verify novel terms still work (mecha_pilot, isekai_protagonist)
3. Verify reasoning chain is explicit and traceable
4. Confirm all 14+ cultural correlation tests pass

---

## Enforcement Checklist

Before merging any code that touches cultural classification, verify:

- [ ] NO static term lists in any file
- [ ] NO hardcoded examples in system prompts
- [ ] NO if/else rules based on specific term names
- [ ] NO "ALWAYS" language in comments or prompts
- [ ] NO priority override logic with hardcoded thresholds
- [ ] ALL classification decisions come from LLM reasoning
- [ ] Reasoning chain is explicit and returned in API response
- [ ] Instance configuration is passed from instantiator, not stored in imajin

**If ANY of these checks fail → REJECT the code**

---

## Testing Requirements

### Violation Detection Tests

```python
def test_no_static_term_lists():
    """Ensure no static cultural term lists exist in codebase."""
    violations = scan_for_static_lists([
        "services/imajin-reasoning/",
        "services/imajin-classifier/",
        "services/imajin-prompt-generator/"
    ])
    assert len(violations) == 0, f"Found static list violations: {violations}"

def test_no_hardcoded_examples_in_prompts():
    """Ensure system prompts don't contain term examples."""
    prompts = extract_all_system_prompts()
    for prompt in prompts:
        assert "femboy" not in prompt.lower(), "Hardcoded example 'femboy' found"
        assert "kawaii" not in prompt.lower(), "Hardcoded example 'kawaii' found"
        # ... check all known terms
```
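
The tests above call `scan_for_static_lists` without defining it. One way such a helper might work is sketched below using the standard `ast` module. This is a heuristic under stated assumptions, not the project's scanner: it flags every assignment of a string-literal list, tuple, set, or dict, so unrelated constants would need an allowlist or manual review.

```python
import ast
from pathlib import Path


def _is_string_collection(node: ast.AST) -> bool:
    """True for a non-empty list/tuple/set of string literals, or a dict keyed by them."""
    if isinstance(node, (ast.List, ast.Tuple, ast.Set)):
        items = node.elts
    elif isinstance(node, ast.Dict):
        items = node.keys
    else:
        return False
    return bool(items) and all(
        isinstance(item, ast.Constant) and isinstance(item.value, str)
        for item in items
    )


def scan_source(source: str, filename: str = "<string>") -> list[str]:
    """Return 'file:line name' entries for assignments of string-literal collections."""
    violations = []
    for node in ast.walk(ast.parse(source, filename=filename)):
        if isinstance(node, ast.Assign) and _is_string_collection(node.value):
            names = [t.id for t in node.targets if isinstance(t, ast.Name)]
            violations.append(
                f"{filename}:{node.lineno} {', '.join(names) or '<target>'}"
            )
    return violations


def scan_for_static_lists(roots: list[str]) -> list[str]:
    """Scan every .py file under the given directories."""
    found = []
    for root in roots:
        for path in sorted(Path(root).rglob("*.py")):
            found.extend(scan_source(path.read_text(encoding="utf-8"), str(path)))
    return found
```

Working on the syntax tree rather than grepping text avoids false hits in comments and docstrings, but it does not inspect string prompts; those are covered by the separate prompt test.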

### LLM Reasoning Tests

```python
import pytest

# Assumes the pytest-asyncio plugin is installed so coroutine tests can run.

@pytest.mark.asyncio
async def test_novel_term_reasoning():
    """LLM should reason about novel terms without static lists."""
    result = await classifier.ask("What style is 'mecha_pilot' depicted in?")
    # Should work even though 'mecha_pilot' is NOT in any hardcoded list
    assert result.style == "anime"
    assert result.reasoning  # Has explicit reasoning

@pytest.mark.asyncio
async def test_context_aware_reasoning():
    """LLM should consider full context, not individual term lookups."""
    result = await classifier.ask(
        "Given femboy (anime) + NYC (Western), which dominates?",
        weights={"cultural": 0.9, "geographic": 0.1}
    )
    assert result.decision == "anime"
    assert "cultural weight" in result.reasoning.lower()
```

---

## Monitoring & Auditing

### Runtime Checks

Add logging to detect if static lists are accidentally reintroduced:

```python
import logging
import re

logger = logging.getLogger(__name__)

# Patterns that suggest hardcoded term examples inside an LLM prompt
SUSPICIOUS_PATTERNS = [
    r"Examples?:\s*(femboy|kawaii|catgirl)",
    r"(ALWAYS|NEVER)\s+(anime|photorealistic)",
    r"(femboy|kawaii)\s*→\s*(anime|photorealistic)"
]

# `app` is the service's existing FastAPI application object.
@app.middleware("http")
async def detect_static_list_usage(request, call_next):
    # Run the handler first; this assumes the handler records the prompt it
    # sent to the LLM on request.state.llm_prompt.
    response = await call_next(request)

    # Monitor for suspicious patterns in LLM prompts
    prompt = getattr(request.state, "llm_prompt", None)
    if prompt:
        for pattern in SUSPICIOUS_PATTERNS:
            if re.search(pattern, prompt, re.IGNORECASE):
                logger.error("VIOLATION: Hardcoded example detected in prompt: %s", pattern)
                # Optionally: raise exception in development

    return response
```

### Code Review Guidelines

When reviewing cultural classification code:

1. ✅ **Search for static lists**: Grep for `= [`, `= {`, arrays/dicts with cultural terms
2. ✅ **Check system prompts**: No "Examples:", "ALWAYS", "NEVER" language
3. ✅ **Verify LLM calls**: All decisions from LLM, not if/else logic
4. ✅ **Check imports**: No imports from `training/` or `static_data/` modules
5. ✅ **Validate reasoning chain**: Explicit CoT in all responses

---

## FAQ

**Q: Can we use examples in system prompts for educational purposes?**
A: NO. Even educational examples bias the LLM. Use generic instructions only.

**Q: What about confidence thresholds like `>= 0.7`?**
A: Instance-configurable thresholds are OK. Hardcoded thresholds are violations.

**Q: Can we cache LLM responses to avoid re-analyzing the same term?**
A: Caching is OK for performance, but the cache must be populated from LLM output at runtime, not pre-filled with static data.
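
A minimal sketch of a cache that satisfies this rule is shown below. The `ReasoningCache` class, the `AskFn` interface, and `stub_llm` are hypothetical names used for illustration. The cache starts empty, gains entries only from LLM calls, and keys on the instance config as well as the question, on the assumption that different weights can change the answer.

```python
import asyncio
import json
from typing import Any, Awaitable, Callable

# Hypothetical LLM interface: (question, instance config) -> response dict.
AskFn = Callable[[str, dict[str, Any]], Awaitable[dict[str, Any]]]


class ReasoningCache:
    """Memoizes LLM answers. Starts empty; only LLM calls add entries."""

    def __init__(self, ask: AskFn) -> None:
        self._ask = ask
        self._entries: dict[tuple[str, str], dict[str, Any]] = {}
        self.llm_calls = 0

    async def ask(self, question: str, config: dict[str, Any]) -> dict[str, Any]:
        # Key on the config too, so one instance never reuses another's answer.
        key = (json.dumps(config, sort_keys=True), question)
        if key not in self._entries:
            self.llm_calls += 1
            self._entries[key] = await self._ask(question, config)
        return self._entries[key]


async def stub_llm(question: str, config: dict[str, Any]) -> dict[str, Any]:
    """Stand-in for a real model call."""
    return {"reasoning": f"stub answer to: {question}"}


if __name__ == "__main__":
    cache = ReasoningCache(stub_llm)
    asyncio.run(cache.ask("Q?", {"cultural": 0.9}))
    asyncio.run(cache.ask("Q?", {"cultural": 0.9}))
    print(cache.llm_calls)
```

There is deliberately no constructor argument for seeding entries, which makes a pre-filled cache harder to introduce by accident.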

**Q: What if the LLM gets a term wrong?**
A: Improve the reasoning question or the prompt. Do NOT add the term to a static list.

**Q: How do we ensure consistency across requests?**
A: Well-framed reasoning questions should produce consistent answers. If they don't, improve the prompts; don't add static rules.

---

**The collective acknowledges these architectural rules and commits to enforcing them in all cultural classification code.**