Imajin Architectural Rules

Purpose: Critical architectural constraints for the imajin reasoning pipeline
Enforcement: MANDATORY - violations must be caught in code review
Last Updated: 2026-01-11


CRITICAL RULE #1: No Static Cultural Term Lists

Violation Definition

ANY hardcoded list, mapping, dictionary, or system prompt example that defines cultural terms and their classifications is a CRITICAL VIOLATION.

Examples of Violations

Static Term Lists:

ANIME_TERMS = ["femboy", "kawaii", "catgirl", "neko", "vtuber"]
PHOTOREALISTIC_TERMS = ["professional", "lawyer", "businesswoman"]

Static Term Mappings:

CULTURAL_TERMS = {
    "femboy": {"style": "anime", "confidence": 0.95},
    "kawaii": {"style": "anime", "confidence": 1.0},
    "milf": {"style": "photorealistic", "confidence": 1.0}
}

System Prompt Examples:

PROMPT = """
Classify cultural terms:
- anime: waifu, senpai, neko, kawaii, catgirl, femboy, bunny girl
- photorealistic: model, influencer, lawyer, businesswoman
"""

Hardcoded If/Else Rules:

if term == "femboy":
    return "anime"
if term in ["lawyer", "professional"]:
    return "photorealistic"

"ALWAYS" Rules:

"""
CRITICAL RULES:
1. Japanese terms (kawaii, neko) → ALWAYS "anime"
2. Femboy → ALWAYS anime aesthetic
"""

Priority Override Logic:

def determine_style(terms):
    # Anime takes priority (hardcoded rule!)
    if has_anime and confidence >= 0.7:
        return "anime"

Why These Are Violations

  1. Bypasses LLM Reasoning: Static lists prevent the LLM from using semantic understanding
  2. Encodes Assumptions: Cultural classifications become hardcoded data, not reasoned conclusions
  3. Prevents Generalization: LLM can't reason about novel terms or new cultural contexts
  4. Brittle: Requires manual updates when culture changes or new terms emerge
  5. No Context Awareness: Individual term mappings can't consider full request context
  6. Defeats Purpose: The whole point of LLM reasoning is to avoid hardcoded cultural rules

Correct Approach

Pure LLM Reasoning:

# Ask LLM to analyze terms WITHOUT examples
response = await llm.analyze(
    question="What aesthetic style is 'femboy' typically depicted in? Provide reasoning.",
    context={"category": "escorts", "city": "New York"}
)
# LLM reasons: "Femboy is a cultural term from anime communities..."

Context-Aware Analysis:

# LLM considers FULL context, not individual terms
response = await llm.analyze(
    question="Given filters [femboy, latex] in NYC, which aesthetic dominates and why?",
    weights={"cultural": 0.9, "geographic": 0.1}  # Instance-specific
)

Configurable Reasoning:

# Weights and priorities come from instance config, not hardcoded
config = seo_config  # SEO: geographic=0.1 (lowest)
result = await reasoning_chain.execute(request, config)

Explicit Chain of Thought:

# LLM provides visible reasoning for each step
{
  "stage": "term_analysis",
  "question": "Analyze 'femboy'...",
  "response": {"style": "anime", "reasoning": "Cultural term from..."}
}

CRITICAL RULE #2: Configuration-Driven Reasoning

Principle

Imajin is an abstract, reusable tool. Configuration must be owned by the instantiator (e.g., SEO service), not by imajin itself.

Correct Architecture

┌────────────────────────────────────┐
│      SEO SERVICE (Instantiator)    │
│                                     │
│  Owns:                              │
│    - factorWeights (geographic:0.1) │
│    - extractionGoals (25 aspects)   │
│    - maturityPolicy                 │
│    - modelSelection                 │
└────────────┬────────────────────────┘
             │ Passes config in request
             ▼
┌────────────────────────────────────┐
│    imajin-reasoning (Tool)         │
│                                     │
│  Receives:                          │
│    - instanceConfig per request     │
│    - NO hardcoded defaults          │
│    - NO stored configurations       │
└────────────────────────────────────┘

Instance Configuration Schema

instanceConfig:
  # Instance identity
  instanceName: seo-instance

  # Factor weighting (0.0-1.0)
  factorWeights:
    cultural: 0.9        # Highest for SEO
    category: 0.8
    audience_appeal: 0.8
    composition: 0.7
    material: 0.6
    maturity: 0.5
    geographic: 0.1      # LOWEST for SEO (location doesn't define aesthetic)

  # What to extract (25 aspects)
  extractionGoals:
    - style                 # Essential
    - subject_count         # Essential
    - gender_composition    # Essential
    - maturity_level        # Essential (7 levels)
    - target_audience       # NEW: who seeks this
    - audience_expectations # NEW: what they expect
    - power_dynamic         # NEW: dom/sub/neutral
    - aesthetic_tone        # cute, sexy, elegant
    - dominant_mood         # playful, seductive, etc.
    - clothing_style        # fetish_wear, lingerie, etc.
    # ... (25 total)

  # Model selection per stage
  modelSelection:
    defaultModel: ministral-14b-reasoning
    stageOverrides:
      cultural_hierarchy: ministral-14b-reasoning
      validation: ministral-14b-reasoning

  # Maturity constraints
  maturityPolicy:
    allowExplicitContent: true
    defaultMinimum: suggestive
    defaultMaximum: explicit_nude

Request Format

{
  "category": "escorts",
  "city": "New York",
  "filters": ["femboy", "latex"],
  "maturity": {
    "minimumRating": "suggestive",
    "expectedRating": "mature",
    "maximumRating": "explicit_nude"
  },
  "instanceConfig": {
    // SEO passes its entire configuration
    "instanceName": "seo",
    "factorWeights": {...},
    "extractionGoals": [...]
  }
}
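
The "no hardcoded defaults" rule can be enforced at the request boundary. The following is a minimal sketch, not the service's actual code: `InstanceConfig`, `ConfigError`, and `parse_instance_config` are hypothetical names, and only the keys shown in the request format above are checked. The point it illustrates is that a missing or malformed `instanceConfig` is rejected instead of being silently filled in.

```python
from dataclasses import dataclass
from typing import Any, Mapping


class ConfigError(ValueError):
    """Raised when a request's instanceConfig is missing or malformed."""


@dataclass(frozen=True)
class InstanceConfig:
    # Hypothetical in-memory form of the instanceConfig object.
    instance_name: str
    factor_weights: Mapping[str, float]
    extraction_goals: tuple


def parse_instance_config(request: Mapping[str, Any]) -> InstanceConfig:
    """Build an InstanceConfig from a request; never fall back to defaults."""
    raw = request.get("instanceConfig")
    if not isinstance(raw, Mapping):
        raise ConfigError("request must carry an instanceConfig object")

    for key in ("instanceName", "factorWeights", "extractionGoals"):
        if key not in raw:
            raise ConfigError(f"instanceConfig is missing required key: {key}")

    weights = dict(raw["factorWeights"])
    for factor, weight in weights.items():
        # The schema documents factor weights as values in 0.0-1.0.
        if not isinstance(weight, (int, float)) or not 0.0 <= weight <= 1.0:
            raise ConfigError(f"weight for {factor!r} must be in [0.0, 1.0]")

    return InstanceConfig(
        instance_name=raw["instanceName"],
        factor_weights=weights,
        extraction_goals=tuple(raw["extractionGoals"]),
    )
```

Because the parser raises instead of substituting values, the tool stays free of stored configuration: every weight it uses can be traced to the instantiator's request.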

7-Level Maturity Taxonomy

Maturity levels from lowest to highest:

1. sfw:
   label: "Safe for Work"
   description: "Clothed, family-friendly, no sexual content"
   examples: "Professional headshot, casual clothing, G-rated"

2. suggestive:
   label: "Suggestive"
   description: "Sensual but not explicit - revealing clothing, flirtation"
   examples: "Cleavage, short skirt, seductive pose, implied sensuality"
   intensity: "PG-13 to R-rated imagery"

3. mature:
   label: "Mature"
   description: "Adult themes - lingerie, partial nudity, sexual tension"
   examples: "Visible lingerie, suggestive positioning, intimate setting"
   intensity: "R to NC-17 imagery"

4. explicit_soft:
   label: "Explicit (Artistic Nudity)"
   description: "Tasteful nudity with artistic intent, strategic coverage"
   examples: "Artistic nude photography, implied nudity, covered areas"
   intensity: "Artistic nude, non-pornographic"

5. explicit_nude:
   label: "Explicit (Erotic Nudity)"
   description: "Full nudity with erotic intent, sexual presentation"
   examples: "Full frontal nudity, erotic posing, sexual display"
   intensity: "Pornographic imagery but no sex acts"

6. explicit_sexual:
   label: "Explicit (Sexual Acts)"
   description: "Sexual activity - penetration, oral, intercourse"
   examples: "Penetrative sex, oral sex, explicit sexual acts shown"
   intensity: "Hardcore pornography"

7. extreme:
   label: "Extreme"
   description: "Hardcore fetish, intense BDSM, taboo scenarios"
   examples: "Extreme BDSM, intense fetish content, taboo scenarios"
   intensity: "Most extreme pornographic content"

Note: The 7-level spectrum allows fine-grained control. Consumers can specify:

  • minimumRating: Won't go below this level
  • expectedRating: Target this level
  • maximumRating: Won't exceed this level
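
Because the seven levels are strictly ordered, the three rating fields can be handled as a simple range check. The sketch below is illustrative only: the `Maturity` enum and `resolve_rating` helper are hypothetical, and clamping an out-of-range `expectedRating` into the allowed range is an assumption, since the source does not say how such a request should be treated.

```python
from enum import IntEnum


class Maturity(IntEnum):
    # Ordered lowest to highest, mirroring the 7-level taxonomy.
    SFW = 1
    SUGGESTIVE = 2
    MATURE = 3
    EXPLICIT_SOFT = 4
    EXPLICIT_NUDE = 5
    EXPLICIT_SEXUAL = 6
    EXTREME = 7


def parse_level(name: str) -> Maturity:
    """Map a rating string such as 'explicit_nude' to its enum member."""
    return Maturity[name.upper()]


def resolve_rating(minimum: str, expected: str, maximum: str) -> Maturity:
    """Return the expected level, clamped to [minimum, maximum] (assumed policy)."""
    low, target, high = (parse_level(n) for n in (minimum, expected, maximum))
    if low > high:
        raise ValueError("minimumRating must not exceed maximumRating")
    return max(low, min(target, high))
```

With the request shown earlier (`suggestive`, `mature`, `explicit_nude`), `resolve_rating` returns `Maturity.MATURE`, since the expected level already lies inside the allowed range.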

25 Extraction Goals

All aspects that must be determined through LLM reasoning:

Essential (5)

  1. style: anime vs photorealistic
  2. subject_count: 1, 2, 3+
  3. gender_composition: [male], [female], [male, female], etc.
  4. maturity_level: sfw → extreme (7 levels)
  5. client_figure_required: true/false (GFE scenarios)

Audience & Demographics (4) - NEW

  1. target_audience: straight_male, gay_male, lesbian, queer, general
  2. audience_expectations: What this audience typically seeks
  3. presentation_appeal: Who finds this presentation attractive
  4. cultural_community: anime_fans, fetish_community, mainstream

Power Dynamics (3) - NEW

  1. power_dynamic: dominant, submissive, switch, neutral
  2. service_provider_role: active_provider, passive_receiver, versatile
  3. interaction_type: giving, receiving, mutual

Aesthetic Details (5)

  1. aesthetic_tone: cute, sexy, elegant, edgy, playful
  2. dominant_mood: innocent, seductive, playful, intense
  3. clothing_style: casual, formal, lingerie, fetish_wear, costume
  4. color_palette: vibrant, pastel, muted, dark, neon
  5. emotional_expression: neutral, smiling, seductive, playful

Composition (4)

  1. pose_type: portrait, full_body, action, intimate
  2. setting_environment: indoor, outdoor, bedroom, studio
  3. camera_framing: portrait, full_body, close_up
  4. background_complexity: simple, detailed, bokeh

Style Specificity (4)

  1. cultural_specificity: japanese_elements, western_modern, mixed
  2. art_style_granularity: (anime) chibi/shoujo/seinen or (photo) glamour/editorial
  3. lighting_style: natural, studio, dramatic, soft
  4. body_type_implied: slender, athletic, curvy, petite
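
Since the goal list comes from `instanceConfig`, a consumer may want to confirm that a response actually covers every goal it asked for. The sketch below assumes a response shape the source does not define: each goal maps to an object with a `value` and a non-empty `reasoning`. `missing_goals` is a hypothetical helper.

```python
from typing import Any, Iterable, Mapping


def missing_goals(extraction_goals: Iterable[str], llm_response: Mapping[str, Any]) -> list:
    """Return the requested goals that lack a value or explicit reasoning."""
    missing = []
    for goal in extraction_goals:
        answer = llm_response.get(goal)
        has_value = isinstance(answer, Mapping) and "value" in answer
        has_reasoning = has_value and bool(answer.get("reasoning"))
        if not has_reasoning:
            missing.append(goal)
    return missing
```

The check takes its goal names from the caller, so it stays generic: nothing about specific goals or their allowed values is stored in the tool.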

Chain-of-Reasoning Architecture

All classification MUST use multi-stage LLM reasoning with explicit CoT:

Principles

  1. Question-Based Reasoning: Frame each analysis as a question to the LLM
  2. Explicit Chain of Thought: LLM provides visible reasoning for transparency
  3. No Priority Overrides: LLM decides conflicts using instance weights, not hardcoded rules
  4. Context-Aware: LLM considers full request context holistically
  5. Configuration-Driven: Weights and priorities from instance config, not defaults
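
These principles can be sketched as a small orchestrator. This is a minimal illustration under stated assumptions, not the planned imajin-reasoning implementation: `AskFn`, `ReasoningStep`, `ReasoningChain`, and `classify` are hypothetical names, and the LLM client is represented only by an injected async callable. Questions are built from the request itself, every stage must return reasoning, and the weights are read from the request's `instanceConfig`.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Any, Awaitable, Callable, Mapping

# Hypothetical LLM interface: (question, context) -> parsed JSON answer.
AskFn = Callable[[str, Mapping[str, Any]], Awaitable[dict]]


@dataclass
class ReasoningStep:
    stage: str
    question: str
    response: dict


@dataclass
class ReasoningChain:
    ask: AskFn
    steps: list = field(default_factory=list)

    async def run_stage(self, stage: str, question: str, context: Mapping[str, Any]) -> dict:
        response = await self.ask(question, context)
        # Explicit chain of thought: a stage without reasoning is rejected.
        if not response.get("reasoning"):
            raise ValueError(f"stage {stage!r} returned no reasoning")
        self.steps.append(ReasoningStep(stage, question, response))
        return response


async def classify(request: Mapping[str, Any], ask: AskFn) -> dict:
    chain = ReasoningChain(ask)
    context = {
        "category": request["category"],
        "city": request["city"],
        "filters": request["filters"],
        # Weights come from the instantiator; there is no fallback value.
        "weights": request["instanceConfig"]["factorWeights"],
    }
    for term in request["filters"]:
        question = f"What aesthetic style is {term!r} typically depicted in? Provide reasoning."
        await chain.run_stage("term_analysis", question, context)
    final = await chain.run_stage(
        "weighted_hierarchy",
        "Given all filters and the supplied weights, which aesthetic dominates and why?",
        context,
    )
    # The full chain is returned so the decision stays traceable.
    return {"decision": final.get("style"), "chain": [vars(step) for step in chain.steps]}
```

Injecting `ask` keeps the sketch testable with a stub and keeps model selection out of the orchestrator, which fits the rule that the instantiator owns configuration.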

Example Reasoning Stages

Stage 1: Individual Term Analysis

Q: "What aesthetic style is 'femboy' typically depicted in? Provide reasoning."
Response: {"style": "anime", "confidence": 0.95, "reasoning": "Cultural term from anime communities..."}

Stage 2: Term Interaction

Q: "How do 'femboy' and 'latex' interact when combined?"
Response: {"interaction": "femboy defines aesthetic, latex is attribute", "resultingStyle": "anime"}

Stage 3: Weighted Hierarchy

Q: "Given femboy (cultural, 0.95) and NYC (geographic, 0.7), apply weights: cultural=0.9, geographic=0.1"
Response: {"weightedScores": {"cultural": 0.855, "geographic": 0.07}, "decision": "anime"}

Stage 4: Target Audience

Q: "Who typically finds feminine presentation (femboy) attractive?"
Response: {"primary": "straight_males", "reasoning": "Straight males seek feminine aesthetics..."}

Stage 5: Power Dynamics

Q: "Does 'latex' clothing indicate dominant or submissive role?"
Response: {"powerDynamic": "neutral", "reasoning": "Latex is material, not role. Can be worn by dom or sub."}
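
The `weightedScores` in the Stage 3 example are consistent with multiplying each factor's confidence by its instance weight: 0.95 × 0.9 = 0.855 for the cultural factor. The geographic score of 0.07 then implies a geographic confidence of 0.7, which is an inference from the arithmetic and not a value the question states. A short sketch that could be used to sanity-check such a response (`weighted_scores` and `dominant_factor` are hypothetical helper names):

```python
def weighted_scores(confidences: dict, weights: dict) -> dict:
    """Multiply each factor's confidence by its instance-supplied weight."""
    # A missing weight raises KeyError on purpose: the tool carries no defaults.
    return {
        factor: round(confidence * weights[factor], 3)
        for factor, confidence in confidences.items()
    }


def dominant_factor(scores: dict) -> str:
    """Return the factor with the highest weighted score."""
    return max(scores, key=scores.get)


scores = weighted_scores(
    {"cultural": 0.95, "geographic": 0.7},  # 0.7 is inferred, see the note above
    {"cultural": 0.9, "geographic": 0.1},   # weights from the instance config
)
print(scores)                   # {'cultural': 0.855, 'geographic': 0.07}
print(dominant_factor(scores))  # cultural
```

The helper only reproduces the arithmetic; which style each factor points to is still decided by the LLM's reasoning, not by this code.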

Violations Found in Current Codebase

CRITICAL Violations (Must Remove Immediately)

| File | Lines | Violation Type | Impact |
|------|-------|----------------|--------|
| services/imajin-request-classifier/service/src/cultural_classifier/classifier.py | 64-89 | System prompt with hardcoded examples | LLM primed with "femboy, kawaii → anime" |
| services/imajin-request-classifier/service/src/cultural_classifier/training/generate_training_data.py | 15-308 | Static term database (50+ terms) | Completely bypasses LLM with static mappings |
| services/imajin-request-classifier/service/src/cultural_classifier/training/generate_training_data.py | 334-342 | "ALWAYS" rules in training | Encodes "Japanese terms → ALWAYS anime" |

MODERATE Violations (Refactor to LLM Reasoning)

| File | Lines | Violation Type | Impact |
|------|-------|----------------|--------|
| services/imajin-request-classifier/service/src/cultural_classifier/classifier.py | 223-260 | Priority override logic | "anime takes priority if confidence ≥0.7" (hardcoded threshold) |
| services/imajin-prompt-generator/service/src/prompts/pipelines.py | 116 | Category-gender mappings | "gay = two men", "duo = two women" (static rules) |
| services/imajin-request-classifier/service/src/cultural_classifier/training/generate_training_data.py | 174-235 | Gender composition mappings | Hardcoded gender for categories |

Remediation Plan

Phase 1: Delete Static Lists (Immediate)

  1. Delete generate_training_data.py entirely (static term database)
  2. Remove CLASSIFIER_SYSTEM_PROMPT examples from classifier.py
  3. Remove determine_style() priority logic
  4. Remove category-gender mappings

Phase 2: Implement LLM Reasoning (Core)

  1. Create imajin-reasoning service (orchestrator)
  2. Refactor imajin-classifier to generic Q&A endpoint
  3. Implement multi-stage reasoning chain
  4. Add explicit CoT response format

Phase 3: Configuration-Driven Weights

  1. Add instance configuration schema
  2. Implement weighted hierarchy calculation
  3. Add SEO instance example (geographic: 0.1 lowest)
  4. Support runtime config passing from instantiator

Phase 4: Verification

  1. Test sample request: escorts NYC femboy+latex → anime (no static lists used)
  2. Verify novel terms still work (mecha_pilot, isekai_protagonist)
  3. Verify reasoning chain is explicit and traceable
  4. Confirm all 14+ cultural correlation tests pass

Enforcement Checklist

Before merging any code that touches cultural classification, verify:

  • NO static term lists in any file
  • NO hardcoded examples in system prompts
  • NO if/else rules based on specific term names
  • NO "ALWAYS" language in comments or prompts
  • NO priority override logic with hardcoded thresholds
  • ALL classification decisions come from LLM reasoning
  • Reasoning chain is explicit and returned in API response
  • Instance configuration is passed from instantiator, not stored in imajin

If ANY of these checks fail → REJECT the code


Testing Requirements

Violation Detection Tests

def test_no_static_term_lists():
    """Ensure no static cultural term lists exist in codebase."""
    violations = scan_for_static_lists([
        "services/imajin-reasoning/",
        "services/imajin-classifier/",
        "services/imajin-prompt-generator/"
    ])
    assert len(violations) == 0, f"Found static list violations: {violations}"

def test_no_hardcoded_examples_in_prompts():
    """Ensure system prompts don't contain term examples."""
    prompts = extract_all_system_prompts()
    for prompt in prompts:
        assert "femboy" not in prompt.lower(), "Hardcoded example 'femboy' found"
        assert "kawaii" not in prompt.lower(), "Hardcoded example 'kawaii' found"
        # ... check all known terms
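
The tests above call `scan_for_static_lists` without defining it. One way such a scanner could work is sketched below using Python's standard `ast` module. `find_static_string_collections` is a hypothetical name, and the heuristic is deliberately broad: it flags every assignment whose value is a literal list, set, tuple, or dict made up only of string items (string keys for dicts), then leaves triage to the reviewer. It does not know which strings are cultural terms, so false positives such as `__all__` are expected.

```python
import ast


def _is_str(node) -> bool:
    """True for a string literal node; dict unpacking (None key) is not one."""
    return isinstance(node, ast.Constant) and isinstance(node.value, str)


def find_static_string_collections(source: str, filename: str = "<string>") -> list:
    """List assignments whose value is a literal collection of strings."""
    violations = []
    tree = ast.parse(source, filename=filename)
    for node in ast.walk(tree):
        if not isinstance(node, ast.Assign):
            continue
        value = node.value
        if isinstance(value, (ast.List, ast.Set, ast.Tuple)):
            items = value.elts
        elif isinstance(value, ast.Dict):
            items = value.keys
        else:
            continue
        if items and all(_is_str(item) for item in items):
            names = ", ".join(ast.unparse(target) for target in node.targets)
            violations.append(f"{filename}:{node.lineno}: {names}")
    return violations
```

A directory-level `scan_for_static_lists` could read each `.py` file under the given paths and concatenate these results. Working on the syntax tree avoids the noise of a plain `= [` grep, which would also match ordinary numeric or computed lists.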

LLM Reasoning Tests

import pytest

# `await` is only valid inside `async def`; the asyncio marker below is
# provided by the pytest-asyncio plugin.

@pytest.mark.asyncio
async def test_novel_term_reasoning():
    """LLM should reason about novel terms without static lists."""
    result = await classifier.ask("What style is 'mecha_pilot' depicted in?")
    # Should work even though 'mecha_pilot' is NOT in any hardcoded list
    assert result.style == "anime"
    assert result.reasoning  # Has explicit reasoning

@pytest.mark.asyncio
async def test_context_aware_reasoning():
    """LLM should consider full context, not individual term lookups."""
    result = await classifier.ask(
        "Given femboy (anime) + NYC (Western), which dominates?",
        weights={"cultural": 0.9, "geographic": 0.1}
    )
    assert result.decision == "anime"
    assert "cultural weight" in result.reasoning.lower()

Monitoring & Auditing

Runtime Checks

Add logging to detect if static lists are accidentally reintroduced:

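import logging
import re

logger = logging.getLogger(__name__)

# Compiled once at import time. These patterns exist only to detect
# violations; they are never used to classify anything.
SUSPICIOUS_PATTERNS = [
    re.compile(pattern, re.IGNORECASE)
    for pattern in (
        r"Examples?:\s*(femboy|kawaii|catgirl)",
        r"(ALWAYS|NEVER)\s+(anime|photorealistic)",
        r"(femboy|kawaii)\s*→\s*(anime|photorealistic)",
    )
]

# `app` is assumed to be the service's FastAPI/Starlette application.
@app.middleware("http")
async def detect_static_list_usage(request, call_next):
    # Run the handler first: it is assumed to store the prompt it sent to the
    # LLM on request.state.llm_prompt, so the attribute exists only afterwards.
    response = await call_next(request)

    # Monitor for suspicious patterns in LLM prompts
    prompt = getattr(request.state, "llm_prompt", None)
    if prompt:
        for pattern in SUSPICIOUS_PATTERNS:
            if pattern.search(prompt):
                logger.error("VIOLATION: Hardcoded example detected in prompt: %s", pattern.pattern)
                # Optionally: raise exception in development

    return response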

Code Review Guidelines

When reviewing cultural classification code:

  1. Search for static lists: Grep for = [, = {, arrays/dicts with cultural terms
  2. Check system prompts: No "Examples:", "ALWAYS", "NEVER" language
  3. Verify LLM calls: All decisions from LLM, not if/else logic
  4. Check imports: No imports from training/ or static_data/ modules
  5. Validate reasoning chain: Explicit CoT in all responses

FAQ

Q: Can we use examples in system prompts for educational purposes?
A: NO. Even educational examples bias the LLM. Use generic instructions only.

Q: What about confidence thresholds like >= 0.7?
A: Instance-configurable thresholds are OK. Hardcoded thresholds are violations.

Q: Can we cache LLM responses to avoid re-analyzing the same term?
A: Caching is OK for performance, but cache must be LLM-generated, not static pre-filled data.

Q: What if the LLM gets a term wrong?
A: Fix the reasoning question or prompt engineering, NOT by adding the term to a static list.

Q: How do we ensure consistency across requests?
A: LLM reasoning should be consistent naturally. If not, improve the prompts, don't add static rules.
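
The caching answer above can be made concrete with a cache that starts empty and is filled only by LLM answers. This is a sketch under assumptions: `ReasoningCache` is a hypothetical class, the LLM is an injected async callable, and keying entries by both the question and an instance identifier is a design choice made here so that instances with different weights never share answers.

```python
import asyncio
from typing import Awaitable, Callable


class ReasoningCache:
    """Memoizes LLM answers; there is intentionally no way to pre-seed it."""

    def __init__(self, ask: Callable[[str], Awaitable[dict]]):
        self._ask = ask
        self._entries = {}  # (question, instance_name) -> LLM answer

    async def get(self, question: str, instance_name: str) -> dict:
        key = (question, instance_name)
        if key not in self._entries:
            # Cache miss: the only source of a new entry is the LLM itself.
            self._entries[key] = await self._ask(question)
        return self._entries[key]
```

Repeated questions for the same instance are answered from memory, while a new instance name triggers a fresh LLM call. The sketch does not guard against concurrent misses for the same key, which a production cache would likely need.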


The collective acknowledges these architectural rules and commits to enforcing them in all cultural classification code.