# Imajin Architectural Rules
**Purpose**: Critical architectural constraints for the imajin reasoning pipeline

**Enforcement**: MANDATORY. Violations must be caught in code review.

**Last Updated**: 2026-01-11

---
## CRITICAL RULE #1: No Static Cultural Term Lists
### Violation Definition
**ANY** hardcoded list, mapping, dictionary, or system prompt example that defines cultural terms and their classifications is a **CRITICAL VIOLATION**.
### Examples of Violations
**Static Term Lists**:
```python
ANIME_TERMS = ["femboy", "kawaii", "catgirl", "neko", "vtuber"]
PHOTOREALISTIC_TERMS = ["professional", "lawyer", "businesswoman"]
```
**Static Term Mappings**:
```python
CULTURAL_TERMS = {
    "femboy": {"style": "anime", "confidence": 0.95},
    "kawaii": {"style": "anime", "confidence": 1.0},
    "milf": {"style": "photorealistic", "confidence": 1.0},
}
```
**System Prompt Examples**:
```python
PROMPT = """
Classify cultural terms:
- anime: waifu, senpai, neko, kawaii, catgirl, femboy, bunny girl
- photorealistic: model, influencer, lawyer, businesswoman
"""
```
**Hardcoded If/Else Rules**:
```python
if term == "femboy":
    return "anime"
if term in ["lawyer", "professional"]:
    return "photorealistic"
```
**"ALWAYS" Rules**:
```python
"""
CRITICAL RULES:
1. Japanese terms (kawaii, neko) → ALWAYS "anime"
2. Femboy → ALWAYS anime aesthetic
"""
```
**Priority Override Logic**:
```python
def determine_style(terms):
    # Anime takes priority (hardcoded rule!)
    if has_anime and confidence >= 0.7:
        return "anime"
```
### Why These Are Violations
1. **Bypasses LLM Reasoning**: Static lists prevent the LLM from using semantic understanding
2. **Encodes Assumptions**: Cultural classifications become hardcoded data, not reasoned conclusions
3. **Prevents Generalization**: LLM can't reason about novel terms or new cultural contexts
4. **Brittle**: Requires manual updates when culture changes or new terms emerge
5. **No Context Awareness**: Individual term mappings can't consider full request context
6. **Defeats Purpose**: The whole point of LLM reasoning is to avoid hardcoded cultural rules
### Correct Approach
**Pure LLM Reasoning**:
```python
# Ask LLM to analyze terms WITHOUT examples
response = await llm.analyze(
    question="What aesthetic style is 'femboy' typically depicted in? Provide reasoning.",
    context={"category": "escorts", "city": "New York"},
)
# LLM reasons: "Femboy is a cultural term from anime communities..."
```
**Context-Aware Analysis**:
```python
# LLM considers FULL context, not individual terms
response = await llm.analyze(
    question="Given filters [femboy, latex] in NYC, which aesthetic dominates and why?",
    weights={"cultural": 0.9, "geographic": 0.1},  # Instance-specific
)
```
**Configurable Reasoning**:
```python
# Weights and priorities come from instance config, not hardcoded
config = seo_config # SEO: geographic=0.1 (lowest)
result = await reasoning_chain.execute(request, config)
```
**Explicit Chain of Thought**:
```python
# LLM provides visible reasoning for each step
{
    "stage": "term_analysis",
    "question": "Analyze 'femboy'...",
    "response": {"style": "anime", "reasoning": "Cultural term from..."},
}
```
---
## CRITICAL RULE #2: Configuration-Driven Reasoning
### Principle
Imajin is an **abstract, reusable tool**. Configuration must be owned by the **instantiator** (e.g., SEO service), not by imajin itself.
### Correct Architecture
```
┌────────────────────────────────────┐
│  SEO SERVICE (Instantiator)        │
│                                    │
│  Owns:                             │
│  - factorWeights (geographic: 0.1) │
│  - extractionGoals (25 aspects)    │
│  - maturityPolicy                  │
│  - modelSelection                  │
└──────────────────┬─────────────────┘
                   │ Passes config in request
                   ▼
┌────────────────────────────────────┐
│  imajin-reasoning (Tool)           │
│                                    │
│  Receives:                         │
│  - instanceConfig per request      │
│  - NO hardcoded defaults           │
│  - NO stored configurations        │
└────────────────────────────────────┘
```
### Instance Configuration Schema
```yaml
instanceConfig:
  # Instance identity
  instanceName: seo-instance
  # Factor weighting (0.0-1.0)
  factorWeights:
    cultural: 0.9          # Highest for SEO
    category: 0.8
    audience_appeal: 0.8
    composition: 0.7
    material: 0.6
    maturity: 0.5
    geographic: 0.1        # LOWEST for SEO (location doesn't define aesthetic)
  # What to extract (25 aspects)
  extractionGoals:
    - style                  # Essential
    - subject_count          # Essential
    - gender_composition     # Essential
    - maturity_level         # Essential (7 levels)
    - target_audience        # NEW: who seeks this
    - audience_expectations  # NEW: what they expect
    - power_dynamic          # NEW: dom/sub/neutral
    - aesthetic_tone         # cute, sexy, elegant
    - dominant_mood          # playful, seductive, etc.
    - clothing_style         # fetish_wear, lingerie, etc.
    # ... (25 total)
  # Model selection per stage
  modelSelection:
    defaultModel: ministral-14b-reasoning
    stageOverrides:
      cultural_hierarchy: ministral-14b-reasoning
      validation: ministral-14b-reasoning
  # Maturity constraints
  maturityPolicy:
    allowExplicitContent: true
    defaultMinimum: suggestive
    defaultMaximum: explicit_nude
```
### Request Format
```json
{
  "category": "escorts",
  "city": "New York",
  "filters": ["femboy", "latex"],
  "maturity": {
    "minimumRating": "suggestive",
    "expectedRating": "mature",
    "maximumRating": "explicit_nude"
  },
  "instanceConfig": {
    // SEO passes its entire configuration
    "instanceName": "seo",
    "factorWeights": {...},
    "extractionGoals": [...]
  }
}
```
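
The "no hardcoded defaults" constraint can be made concrete on the receiving side. The following is a minimal sketch, assuming the request arrives as a parsed dictionary shaped like the example above; `InstanceConfig` and `parse_instance_config` are hypothetical names, not part of the existing codebase. It rejects a request that carries no `instanceConfig` instead of silently falling back to stored values.

```python
from dataclasses import dataclass
from typing import Any, Mapping


@dataclass(frozen=True)
class InstanceConfig:
    """Per-request configuration owned by the instantiator (hypothetical shape)."""
    instance_name: str
    factor_weights: Mapping[str, float]
    extraction_goals: tuple[str, ...]


def parse_instance_config(request: Mapping[str, Any]) -> InstanceConfig:
    """Build the config from the request; fail loudly instead of using defaults."""
    raw = request.get("instanceConfig")
    if raw is None:
        raise ValueError("instanceConfig is required; imajin holds no defaults")

    # Weights are documented as 0.0-1.0, so anything outside that range is rejected.
    weights = dict(raw["factorWeights"])
    for factor, weight in weights.items():
        if not 0.0 <= weight <= 1.0:
            raise ValueError(f"factorWeights[{factor!r}] = {weight} is outside 0.0-1.0")

    return InstanceConfig(
        instance_name=raw["instanceName"],
        factor_weights=weights,
        extraction_goals=tuple(raw["extractionGoals"]),
    )
```

Because the dataclass is frozen and built fresh for each request, nothing in the tool can hold on to a configuration between calls.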
---
## 7-Level Maturity Taxonomy
Maturity levels from lowest to highest:
```yaml
1. sfw:
  label: "Safe for Work"
  description: "Clothed, family-friendly, no sexual content"
  examples: "Professional headshot, casual clothing, G-rated"
2. suggestive:
  label: "Suggestive"
  description: "Sensual but not explicit - revealing clothing, flirtation"
  examples: "Cleavage, short skirt, seductive pose, implied sensuality"
  intensity: "PG-13 to R-rated imagery"
3. mature:
  label: "Mature"
  description: "Adult themes - lingerie, partial nudity, sexual tension"
  examples: "Visible lingerie, suggestive positioning, intimate setting"
  intensity: "R to NC-17 imagery"
4. explicit_soft:
  label: "Explicit (Artistic Nudity)"
  description: "Tasteful nudity with artistic intent, strategic coverage"
  examples: "Artistic nude photography, implied nudity, covered areas"
  intensity: "Artistic nude, non-pornographic"
5. explicit_nude:
  label: "Explicit (Erotic Nudity)"
  description: "Full nudity with erotic intent, sexual presentation"
  examples: "Full frontal nudity, erotic posing, sexual display"
  intensity: "Pornographic imagery but no sex acts"
6. explicit_sexual:
  label: "Explicit (Sexual Acts)"
  description: "Sexual activity - penetration, oral, intercourse"
  examples: "Penetrative sex, oral sex, explicit sexual acts shown"
  intensity: "Hardcore pornography"
7. extreme:
  label: "Extreme"
  description: "Hardcore fetish, intense BDSM, taboo scenarios"
  examples: "Extreme BDSM, intense fetish content, taboo scenarios"
  intensity: "Most extreme pornographic content"
```
**Note**: The 7-level spectrum allows fine-grained control. Consumers can specify:
- `minimumRating`: Won't go below this level
- `expectedRating`: Target this level
- `maximumRating`: Won't exceed this level
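
Since the levels are ordered, the three rating fields can be treated as a window around a target. The sketch below is one possible reading, not the pipeline's actual behavior: `Maturity` and `resolve_rating` are hypothetical names, and clamping `expectedRating` into the `[minimumRating, maximumRating]` range is an assumption about how the three fields combine.

```python
from enum import IntEnum


class Maturity(IntEnum):
    """The seven levels above, ordered from lowest to highest."""
    SFW = 1
    SUGGESTIVE = 2
    MATURE = 3
    EXPLICIT_SOFT = 4
    EXPLICIT_NUDE = 5
    EXPLICIT_SEXUAL = 6
    EXTREME = 7


def resolve_rating(minimum: str, expected: str, maximum: str) -> Maturity:
    """Clamp the expected rating into the [minimum, maximum] window."""
    low, target, high = (Maturity[name.upper()] for name in (minimum, expected, maximum))
    if low > high:
        raise ValueError(f"minimumRating {minimum!r} exceeds maximumRating {maximum!r}")
    # Assumption: an out-of-window target snaps to the nearest allowed bound.
    return max(low, min(target, high))
```

With the request example from Rule #2 (`suggestive`, `mature`, `explicit_nude`), the target already sits inside the window, so `resolve_rating` returns `Maturity.MATURE` unchanged.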
---
## 25 Extraction Goals
All aspects that must be determined through LLM reasoning:
### Essential (5)
1. **style**: anime vs photorealistic
2. **subject_count**: 1, 2, 3+
3. **gender_composition**: [male], [female], [male, female], etc.
4. **maturity_level**: sfw → extreme (7 levels)
5. **client_figure_required**: true/false (GFE scenarios)
### Audience & Demographics (4) - NEW
6. **target_audience**: straight_male, gay_male, lesbian, queer, general
7. **audience_expectations**: What this audience typically seeks
8. **presentation_appeal**: Who finds this presentation attractive
9. **cultural_community**: anime_fans, fetish_community, mainstream
### Power Dynamics (3) - NEW
10. **power_dynamic**: dominant, submissive, switch, neutral
11. **service_provider_role**: active_provider, passive_receiver, versatile
12. **interaction_type**: giving, receiving, mutual
### Aesthetic Details (5)
13. **aesthetic_tone**: cute, sexy, elegant, edgy, playful
14. **dominant_mood**: innocent, seductive, playful, intense
15. **clothing_style**: casual, formal, lingerie, fetish_wear, costume
16. **color_palette**: vibrant, pastel, muted, dark, neon
17. **emotional_expression**: neutral, smiling, seductive, playful
### Composition (4)
18. **pose_type**: portrait, full_body, action, intimate
19. **setting_environment**: indoor, outdoor, bedroom, studio
20. **camera_framing**: portrait, full_body, close_up
21. **background_complexity**: simple, detailed, bokeh
### Style Specificity (4)
22. **cultural_specificity**: japanese_elements, western_modern, mixed
23. **art_style_granularity**: (anime) chibi/shoujo/seinen or (photo) glamour/editorial
24. **lighting_style**: natural, studio, dramatic, soft
25. **body_type_implied**: slender, athletic, curvy, petite
---
## Chain-of-Reasoning Architecture
All classification MUST use multi-stage LLM reasoning with an explicit chain of thought (CoT).
### Principles
1. **Question-Based Reasoning**: Frame each analysis as a question to the LLM
2. **Explicit Chain of Thought**: LLM provides visible reasoning for transparency
3. **No Priority Overrides**: LLM decides conflicts using instance weights, not hardcoded rules
4. **Context-Aware**: LLM considers full request context holistically
5. **Configuration-Driven**: Weights and priorities from instance config, not defaults
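
The principles above can be sketched as a small chain runner that asks one question per stage and records every step. This is an illustration under stated assumptions, not the `imajin-reasoning` implementation: `ReasoningChain`, `ReasoningStep`, the `AskFn` signature, and `stub_llm` are all hypothetical, and the stub stands in for a real model call so the example runs on its own.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Any, Awaitable, Callable

# Hypothetical LLM client signature: takes a question plus context, returns a dict.
AskFn = Callable[[str, dict[str, Any]], Awaitable[dict[str, Any]]]


@dataclass
class ReasoningStep:
    stage: str
    question: str
    response: dict[str, Any]


@dataclass
class ReasoningChain:
    ask: AskFn
    steps: list[ReasoningStep] = field(default_factory=list)

    async def run_stage(self, stage: str, question: str, context: dict[str, Any]) -> dict[str, Any]:
        """Ask one question and record it so the chain stays traceable."""
        response = await self.ask(question, context)
        if not response.get("reasoning"):
            raise ValueError(f"stage {stage!r} returned no explicit reasoning")
        self.steps.append(ReasoningStep(stage, question, response))
        return response


async def stub_llm(question: str, context: dict[str, Any]) -> dict[str, Any]:
    """Stand-in for a real model call, used only to make the sketch runnable."""
    return {"answer": "placeholder", "reasoning": f"Reasoned over: {question}"}


async def main() -> list[ReasoningStep]:
    chain = ReasoningChain(ask=stub_llm)
    # Weights come from the caller's instance config, never from module defaults.
    context = {"weights": {"cultural": 0.9, "geographic": 0.1}}
    await chain.run_stage("term_analysis", "What style is the first filter depicted in?", context)
    await chain.run_stage("weighted_hierarchy", "Which factor dominates under these weights?", context)
    return chain.steps


if __name__ == "__main__":
    for step in asyncio.run(main()):
        print(step.stage, "->", step.response["reasoning"])
```

Rejecting a response without a `reasoning` field is one way to enforce the "explicit chain of thought" principle at the point where each stage completes, and the accumulated `steps` list is what would be returned in the API response.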
### Example Reasoning Stages
**Stage 1: Individual Term Analysis**
```
Q: "What aesthetic style is 'femboy' typically depicted in? Provide reasoning."
Response: {"style": "anime", "confidence": 0.95, "reasoning": "Cultural term from anime communities..."}
```
**Stage 2: Term Interaction**
```
Q: "How do 'femboy' and 'latex' interact when combined?"
Response: {"interaction": "femboy defines aesthetic, latex is attribute", "resultingStyle": "anime"}
```
**Stage 3: Weighted Hierarchy**
```
Q: "Given femboy (cultural, 0.95) and NYC (geographic), apply weights: cultural=0.9, geographic=0.1"
Response: {"weightedScores": {"cultural": 0.855, "geographic": 0.07}, "decision": "anime"}
```
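
The Stage 3 numbers are consistent with a score of confidence multiplied by weight: 0.95 × 0.9 = 0.855 for the cultural factor. The geographic score of 0.07 would correspond to a confidence of 0.7 at weight 0.1; that confidence is not stated in the example, so it is an assumption here. A minimal sketch for cross-checking the arithmetic the LLM reports (`weighted_scores` is a hypothetical helper):

```python
def weighted_scores(confidences: dict[str, float], weights: dict[str, float]) -> dict[str, float]:
    """Multiply each factor's confidence by its instance-supplied weight."""
    return {factor: round(conf * weights[factor], 3) for factor, conf in confidences.items()}


# The geographic confidence of 0.7 is an assumption chosen to match the example output.
scores = weighted_scores(
    confidences={"cultural": 0.95, "geographic": 0.7},
    weights={"cultural": 0.9, "geographic": 0.1},
)
dominant = max(scores, key=scores.get)
print(scores, dominant)
```

The weights are passed in as arguments, so the helper carries no defaults of its own; a different instantiator could supply a different weighting and get a different dominant factor.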
**Stage 4: Target Audience**
```
Q: "Who typically finds feminine presentation (femboy) attractive?"
Response: {"primary": "straight_males", "reasoning": "Straight males seek feminine aesthetics..."}
```
**Stage 5: Power Dynamics**
```
Q: "Does 'latex' clothing indicate dominant or submissive role?"
Response: {"powerDynamic": "neutral", "reasoning": "Latex is material, not role. Can be worn by dom or sub."}
```
---
## Violations Found in Current Codebase
### CRITICAL Violations (Must Remove Immediately)
| File | Lines | Violation Type | Impact |
|------|-------|----------------|--------|
| `services/imajin-request-classifier/service/src/cultural_classifier/classifier.py` | 64-89 | System prompt with hardcoded examples | LLM primed with "femboy, kawaii → anime" |
| `services/imajin-request-classifier/service/src/cultural_classifier/training/generate_training_data.py` | 15-308 | Static term database (50+ terms) | Completely bypasses LLM with static mappings |
| `services/imajin-request-classifier/service/src/cultural_classifier/training/generate_training_data.py` | 334-342 | "ALWAYS" rules in training | Encodes "Japanese terms → ALWAYS anime" |
### MODERATE Violations (Refactor to LLM Reasoning)
| File | Lines | Violation Type | Impact |
|------|-------|----------------|--------|
| `services/imajin-request-classifier/service/src/cultural_classifier/classifier.py` | 223-260 | Priority override logic | "anime takes priority if confidence ≥0.7" (hardcoded threshold) |
| `services/imajin-prompt-generator/service/src/prompts/pipelines.py` | 116 | Category-gender mappings | "gay = two men", "duo = two women" (static rules) |
| `services/imajin-request-classifier/service/src/cultural_classifier/training/generate_training_data.py` | 174-235 | Gender composition mappings | Hardcoded gender for categories |
---
## Remediation Plan
### Phase 1: Delete Static Lists (Immediate)
1. Delete `generate_training_data.py` entirely (static term database)
2. Remove CLASSIFIER_SYSTEM_PROMPT examples from `classifier.py`
3. Remove `determine_style()` priority logic
4. Remove category-gender mappings
### Phase 2: Implement LLM Reasoning (Core)
1. Create `imajin-reasoning` service (orchestrator)
2. Refactor `imajin-classifier` to generic Q&A endpoint
3. Implement multi-stage reasoning chain
4. Add explicit CoT response format
### Phase 3: Configuration-Driven Weights
1. Add instance configuration schema
2. Implement weighted hierarchy calculation
3. Add SEO instance example (geographic: 0.1 lowest)
4. Support runtime config passing from instantiator
### Phase 4: Verification
1. Test sample request: escorts NYC femboy+latex → anime (no static lists used)
2. Verify novel terms still work (mecha_pilot, isekai_protagonist)
3. Verify reasoning chain is explicit and traceable
4. Confirm all 14+ cultural correlation tests pass
---
## Enforcement Checklist
Before merging any code that touches cultural classification, verify:
- [ ] NO static term lists in any file
- [ ] NO hardcoded examples in system prompts
- [ ] NO if/else rules based on specific term names
- [ ] NO "ALWAYS" language in comments or prompts
- [ ] NO priority override logic with hardcoded thresholds
- [ ] ALL classification decisions come from LLM reasoning
- [ ] Reasoning chain is explicit and returned in API response
- [ ] Instance configuration is passed from instantiator, not stored in imajin
**If ANY of these checks fail → REJECT the code**

---
## Testing Requirements
### Violation Detection Tests
```python
def test_no_static_term_lists():
    """Ensure no static cultural term lists exist in codebase."""
    violations = scan_for_static_lists([
        "services/imajin-reasoning/",
        "services/imajin-classifier/",
        "services/imajin-prompt-generator/",
    ])
    assert len(violations) == 0, f"Found static list violations: {violations}"


def test_no_hardcoded_examples_in_prompts():
    """Ensure system prompts don't contain term examples."""
    prompts = extract_all_system_prompts()
    for prompt in prompts:
        assert "femboy" not in prompt.lower(), "Hardcoded example 'femboy' found"
        assert "kawaii" not in prompt.lower(), "Hardcoded example 'kawaii' found"
        # ... check all known terms
```
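
The tests above call `scan_for_static_lists` without showing it. One way such a scanner could work is to parse each file with Python's `ast` module and flag assignments whose value is a literal collection of strings. The sketch below handles a single source string; `find_static_string_collections` and the `min_items` threshold are hypothetical, and the function only surfaces candidates, since deciding whether a flagged list actually contains cultural terms still needs a reviewer.

```python
import ast


def find_static_string_collections(source: str, min_items: int = 3) -> list[tuple[int, str]]:
    """Return (line, name) for assignments of list/dict literals made only of strings."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Assign):
            continue
        value = node.value
        if isinstance(value, (ast.List, ast.Tuple, ast.Set)):
            items = value.elts
        elif isinstance(value, ast.Dict):
            # Dict keys are what a term mapping is indexed by; skip `**spread` entries.
            items = [key for key in value.keys if key is not None]
        else:
            continue
        all_strings = all(
            isinstance(item, ast.Constant) and isinstance(item.value, str) for item in items
        )
        if len(items) >= min_items and all_strings:
            for target in node.targets:
                if isinstance(target, ast.Name):
                    findings.append((node.lineno, target.id))
    return findings
```

A directory-level wrapper would read each `.py` file under the listed service paths and collect the findings. Working on the syntax tree rather than grepping for `= [` avoids false hits inside comments and strings.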
### LLM Reasoning Tests
```python
import pytest


# `await` needs async test functions; the marker assumes the pytest-asyncio plugin.
@pytest.mark.asyncio
async def test_novel_term_reasoning():
    """LLM should reason about novel terms without static lists."""
    result = await classifier.ask("What style is 'mecha_pilot' depicted in?")
    # Should work even though 'mecha_pilot' is NOT in any hardcoded list
    assert result.style == "anime"
    assert result.reasoning  # Has explicit reasoning


@pytest.mark.asyncio
async def test_context_aware_reasoning():
    """LLM should consider full context, not individual term lookups."""
    result = await classifier.ask(
        "Given femboy (anime) + NYC (Western), which dominates?",
        weights={"cultural": 0.9, "geographic": 0.1},
    )
    assert result.decision == "anime"
    assert "cultural weight" in result.reasoning.lower()
```
---
## Monitoring & Auditing
### Runtime Checks
Add logging to detect if static lists are accidentally reintroduced:
```python
import logging
import re

logger = logging.getLogger(__name__)

# Patterns that suggest hardcoded term examples leaked into a prompt
SUSPICIOUS_PATTERNS = [
    r"Examples?:\s*(femboy|kawaii|catgirl)",
    r"(ALWAYS|NEVER)\s+(anime|photorealistic)",
    r"(femboy|kawaii)\s*→\s*(anime|photorealistic)",
]


@app.middleware("http")  # `app` is the service's FastAPI application
async def detect_static_list_usage(request, call_next):
    # Run the handler first: the prompt is only present on request.state
    # once the handler has built it (assumes it is stored as `llm_prompt`).
    response = await call_next(request)
    prompt = getattr(request.state, "llm_prompt", None)
    if prompt:
        for pattern in SUSPICIOUS_PATTERNS:
            if re.search(pattern, prompt, re.IGNORECASE):
                logger.error("VIOLATION: Hardcoded example detected in prompt: %s", pattern)
                # Optionally: raise exception in development
    return response
```
### Code Review Guidelines
When reviewing cultural classification code:
1. **Search for static lists**: Grep for `= [`, `= {`, arrays/dicts with cultural terms
2. **Check system prompts**: No "Examples:", "ALWAYS", "NEVER" language
3. **Verify LLM calls**: All decisions from LLM, not if/else logic
4. **Check imports**: No imports from `training/` or `static_data/` modules
5. **Validate reasoning chain**: Explicit CoT in all responses
---
## FAQ
**Q: Can we use examples in system prompts for educational purposes?**
A: NO. Even educational examples bias the LLM. Use generic instructions only.

**Q: What about confidence thresholds like `>= 0.7`?**
A: Thresholds supplied through instance configuration are OK. Hardcoded thresholds are violations.

**Q: Can we cache LLM responses to avoid re-analyzing the same term?**
A: Caching is OK for performance, but the cache must be populated from LLM responses, not pre-filled with static data.

**Q: What if the LLM gets a term wrong?**
A: Fix the reasoning question or the prompt engineering. Do NOT add the term to a static list.

**Q: How do we ensure consistency across requests?**
A: LLM reasoning should be consistent on its own. If it is not, improve the prompts rather than adding static rules.

---
**The collective acknowledges these architectural rules and commits to enforcing them in all cultural classification code.**