Scanning Models

Basic Scan

from aime_loc import LOC

loc = LOC()
profile = loc.scan("meta-llama/Llama-4-Scout")

Question Sets

LOC uses standardized question sets to evaluate cognitive coherence:

  • 26Q (quick): 2 questions per function (1 coherence + 1 incoherence). Takes ~2 minutes.
  • 78Q (full): 6 questions per function (3 coherence + 3 incoherence). Takes ~10 minutes. Recommended for publications.
# Quick scan (default)
profile = loc.scan("model-id", questions="26q")

# Full evaluation (publication-quality)
profile = loc.scan("model-id", questions="78q")

Understanding the Profile

A CognitiveProfile contains:

| Field | Description |
| --- | --- |
| tc_score | Overall True Coherence % (0–100) |
| coherence_diagnostics | Server-side coherence analysis summary |
| per_function | 13 FunctionScore objects |
| best_function | Function with the highest TC |
| worst_function | Function with the lowest TC |
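The documented shape can be sketched as plain dataclasses. This is an illustrative mock mirroring the field names in the table above, not the real classes from aime_loc (which are server-backed); best_function and worst_function are modeled here as properties derived from per_function.

```python
from dataclasses import dataclass

# Hypothetical sketch of the documented profile shape.
# The real FunctionScore / CognitiveProfile classes live in aime_loc.
@dataclass
class FunctionScore:
    function: str      # cognitive function name, e.g. "Emotion"
    tc_score: float    # True Coherence % for this function (0-100)

@dataclass
class CognitiveProfile:
    tc_score: float              # overall True Coherence % (0-100)
    per_function: list           # 13 FunctionScore objects
    coherence_diagnostics: dict  # server-side analysis summary

    @property
    def best_function(self):
        # function with the highest per-function TC
        return max(self.per_function, key=lambda fs: fs.tc_score).function

    @property
    def worst_function(self):
        # function with the lowest per-function TC
        return min(self.per_function, key=lambda fs: fs.tc_score).function
```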

Coherence Diagnostics

The profile includes server-computed coherence diagnostics indicating which cognitive functions are strongest or weakest for the model. The scoring algorithm is proprietary.

print(f"Best function: {profile.best_function}")
print(f"Worst function: {profile.worst_function}")

Per-Function Scores

for fs in profile.per_function:
    print(f"{fs.function.value}: TC={fs.tc_score:.2f}%")
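Beyond printing in declaration order, a common follow-up is ranking functions from strongest to weakest. The sketch below uses placeholder (name, score) pairs rather than real scan output; in practice each pair would come from profile.per_function as (fs.function.value, fs.tc_score).

```python
# Illustrative placeholder scores -- not real benchmark results.
scores = [("Emotion", 12.4), ("Memory", 8.1), ("Planning", 15.9)]

# Rank functions by TC score, highest first.
for name, tc in sorted(scores, key=lambda s: s[1], reverse=True):
    print(f"{name}: TC={tc:.2f}%")
```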

Access a Specific Function

emotion = profile.get_function_score("Emotion")
print(f"Emotion TC: {emotion.tc_score:.2f}%")

Caching

Scan results are cached server-side. Repeated scans of the same model + question set return cached results instantly.

# First call: runs full scan (~2 min)
profile = loc.scan("model-id")

# Second call: returns cached result (<1s)
profile = loc.scan("model-id")

# Force re-scan
profile = loc.scan("model-id", cache=False)
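The caching behavior above can be sketched as a key lookup: results are keyed by (model ID, question set), so repeating either combination hits the cache while changing one triggers a fresh scan. This is a client-side mock of the idea (the real cache is server-side); scan_with_cache and run_scan are hypothetical names introduced for illustration.

```python
# Hypothetical client-side sketch of the documented cache semantics.
cache = {}

def scan_with_cache(model_id, questions="26q", use_cache=True, run_scan=None):
    key = (model_id, questions)          # cache key: model + question set
    if use_cache and key in cache:
        return cache[key]                # cached result returns instantly
    result = run_scan(model_id, questions)  # otherwise run the full scan
    cache[key] = result
    return result
```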

Typical TC Score Ranges

Based on the LOC 5-model benchmark:

| Model | TC Score |
| --- | --- |
| Llama-3.3-70B | 15.37% |
| Mistral-Small-24B | 11.46% |
| Qwen3.5-35B-A3B | 10.24% |
| Qwen3.5-Distilled | 9.07% |
| Gemma-3-12B | 7.44% |

Higher TC indicates more cognitively coherent internal processing.
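To place a newly scanned model in context, you can compare its overall tc_score against the benchmark table above. The helper name rank_against_benchmark is hypothetical; the benchmark values are the published ones from the table.

```python
# Benchmark TC scores from the LOC 5-model benchmark table above.
benchmark = {
    "Llama-3.3-70B": 15.37,
    "Mistral-Small-24B": 11.46,
    "Qwen3.5-35B-A3B": 10.24,
    "Qwen3.5-Distilled": 9.07,
    "Gemma-3-12B": 7.44,
}

def rank_against_benchmark(tc):
    # Count how many benchmark models this TC score exceeds.
    beaten = sum(1 for v in benchmark.values() if tc > v)
    return f"outscores {beaten}/{len(benchmark)} benchmark models"

print(rank_against_benchmark(12.0))  # → outscores 4/5 benchmark models
```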