
What is AEON?

AEON produces “Ultimate Uncensored” LLM variants using LEACE, which stands for LEAst-squares Concept Erasure, combined with rank-k ablation. AEON claims “lossless abliteration” and “measurably enhanced capabilities” with “no word-salad, no looping, no philosophizing spirals.”

Performance on Qwen3.6-27B

I’ve only tested AEON on Qwen3.6-27B so far.

Metric            Value
HarmBench ASR     88.8%
Full CoT ASR      100%
MMLU              82.9% (-0.4pp vs base)
HellaSwag         82.7% (-0.8pp vs base)
ARC Challenge     56.1% (-3.0pp vs base)
WinoGrande        75.3% (-2.4pp vs base)
TruthfulQA MC2    46.1% (-10.6pp vs base)
KL Divergence     0.0238
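
The deltas above also imply the base model's scores. A small sketch that recovers them, assuming each delta is the AEON score minus the base score in percentage points (the names `aeon_results` and `implied_base_scores` are made up for illustration):

```python
# Scores and deltas copied from the table above.
# Assumption: delta = AEON score - base score, in percentage points.
aeon_results = {
    "MMLU": (82.9, -0.4),
    "HellaSwag": (82.7, -0.8),
    "ARC Challenge": (56.1, -3.0),
    "WinoGrande": (75.3, -2.4),
    "TruthfulQA MC2": (46.1, -10.6),
}


def implied_base_scores(results):
    """Return {benchmark: base score}, undoing the reported delta."""
    return {
        name: round(score - delta, 1)
        for name, (score, delta) in results.items()
    }


base_scores = implied_base_scores(aeon_results)

# Unweighted mean of the five deltas, in percentage points.
mean_delta = round(
    sum(delta for _, delta in aeon_results.values()) / len(aeon_results), 2
)

for name, base in base_scores.items():
    print(f"{name}: base {base}% -> AEON {aeon_results[name][0]}%")
print(f"mean delta: {mean_delta}pp")
```

The unweighted mean is only a rough summary: most of the loss sits in TruthfulQA MC2, not evenly across benchmarks.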

Key Characteristics

Every non-GSM8K benchmark degraded. All five standard benchmarks in the table dropped relative to base, from -0.4pp on MMLU to -10.6pp on TruthfulQA MC2. That directly contradicts AEON’s “measurably enhanced capabilities” claim.

Worst thinking loops. 45 out of 400 HarmBench responses were empty. That’s 11.3%, the highest of any technique. This contradicts claims of “no looping, no philosophizing spirals.”

Very broad modification footprint. AEON modifies 4 weight types (down_proj, out_proj, o_proj, conv1d), and the conv1d changes include SSM outlier repair at 8 late layers (56-63). It also applies LEACE, which is a different mathematical approach from the orthogonal projection used by Heretic.
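
For context, here is a minimal sketch of generic rank-1 directional ablation by orthogonal projection, the family of edit attributed to Heretic above. It is not Heretic's or AEON's actual code: the function names, the toy matrix and the `refusal_dir` vector are all hypothetical. LEACE differs in that it also takes the covariance of the activations into account (a whitening step), which is not shown here.

```python
import math


def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def ablate_direction(weight, direction):
    """Return W' = W - r r^T W for a d_out x d_in matrix (list of rows).

    After the edit, the layer can no longer write anything along the
    unit vector r in its output space.
    """
    r = normalize(direction)
    d_out, d_in = len(weight), len(weight[0])
    # r^T W: how strongly each input column writes along r.
    coeffs = [sum(r[i] * weight[i][j] for i in range(d_out)) for j in range(d_in)]
    return [
        [weight[i][j] - r[i] * coeffs[j] for j in range(d_in)]
        for i in range(d_out)
    ]


# Toy 3x2 weight matrix and a made-up "refusal direction".
W = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
refusal_dir = [0.0, 1.0, 1.0]

W_ablated = ablate_direction(W, refusal_dir)

# Component of the edited weights along the ablated direction (about zero).
r = normalize(refusal_dir)
residual = [sum(r[i] * W_ablated[i][j] for i in range(3)) for j in range(2)]
print(W_ablated)
print(residual)
```

A rank-k variant would repeat the same projection for k orthonormal directions instead of one.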

Moderate KL divergence. At 0.0238, AEON’s KL is in the “very good” range but about 6.4x Heretic’s 0.0037 on the same model.
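
The page does not say how the KL figure is measured. As a rough sketch, assuming it is the mean KL(base || modified) over next-token distributions on a set of prompts (an assumption, as are the function names and the toy numbers), the calculation looks like this:

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) in nats for two distributions over the same tokens."""
    return sum(
        pi * math.log(pi / max(qi, eps))
        for pi, qi in zip(p, q)
        if pi > 0
    )


def mean_kl(base_dists, modified_dists):
    """Average KL over paired next-token distributions, one pair per prompt."""
    pairs = list(zip(base_dists, modified_dists))
    return sum(kl_divergence(p, q) for p, q in pairs) / len(pairs)


# Toy next-token distributions for two prompts over a 3-token vocabulary.
base = [[0.7, 0.2, 0.1], [0.5, 0.3, 0.2]]
modified = [[0.6, 0.3, 0.1], [0.5, 0.3, 0.2]]

score = mean_kl(base, modified)

# Ratio of the two KL values reported above.
ratio = 0.0238 / 0.0037

print(f"toy mean KL: {score:.4f}")
print(f"AEON / Heretic KL ratio: {ratio:.1f}x")
```

A KL of zero means the modified model's distribution matches the base exactly, so lower values indicate a less invasive edit on the measured prompts.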

A separate comparison on Gemma4-E2B tested ether4o4, which applies Opus reasoning distillation on top of abliteration that includes LEACE-like concept erasure. Ether4o4 had the broadest modification footprint (166 tensors across 6 weight types) and produced 84 empty GSM8K responses (6.4%). The distillation did not preserve reasoning capability.

Weight Modification Profile

  • 88 tensors modified (10.4% of total)
  • 6.0% relative edit magnitude
  • Targets: down_proj, out_proj, o_proj, conv1d
  • Includes SSM conv1d outlier repair at layers 56-63
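
How these profile numbers are computed is not stated. One plausible reading, sketched below under that assumption, counts a tensor as modified when it differs from the base at all and measures relative edit magnitude as the Frobenius norm of the difference divided by the norm of the base tensor. The tensor names and values are made up.

```python
import math


def frobenius(matrix):
    """Frobenius norm of a 2D list."""
    return math.sqrt(sum(x * x for row in matrix for x in row))


def edit_profile(base, modified, tol=0.0):
    """Compare two {name: 2D list} checkpoints with identical keys and shapes.

    Assumed definition: relative magnitude = ||W' - W||_F / ||W||_F.
    """
    relative = {}
    for name, w in base.items():
        diff = [
            [m - b for b, m in zip(base_row, mod_row)]
            for base_row, mod_row in zip(w, modified[name])
        ]
        delta = frobenius(diff)
        if delta > tol:
            relative[name] = delta / frobenius(w)
    return {
        "modified_count": len(relative),
        "modified_fraction": len(relative) / len(base),
        "relative_magnitude": relative,
    }


# Hypothetical two-tensor checkpoint: only down_proj is edited.
base_ckpt = {
    "layers.0.mlp.down_proj": [[3.0, 4.0]],
    "layers.0.mlp.up_proj": [[1.0, 0.0]],
}
modified_ckpt = {
    "layers.0.mlp.down_proj": [[3.0, 3.7]],
    "layers.0.mlp.up_proj": [[1.0, 0.0]],
}

profile = edit_profile(base_ckpt, modified_ckpt)
print(profile)
```

On real checkpoints the same comparison would run over the loaded weight tensors, and the per-tensor magnitudes would still need to be aggregated into a single figure like the 6.0% above.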
