Huihui Abliteration: Benchmarks, KL Divergence, and Weight Forensics

What is Huihui?

Huihui produces abliterated LLM variants using a gentle uniform approach across multiple projection types. Unlike Heretic’s surgical targeting of a single weight type, Huihui touches more tensors but with smaller edits per tensor.

Performance Across Models

Model	HarmBench ASR	Full CoT ASR	MMLU	KL Divergence	Avg Delta (excl GSM8K)
Qwen3.6-27B	98.5%	99.8%	83.4%	0.0074	0.5pp
GLM-4.7-Flash	100%	100%	77.4%	0.0076	0.8pp
Qwen3.5-27B	99.8%	100%	83.2%	0.092	2.1pp
Qwen3.5-9B	100%	100%	82.2%	0.152	1.5pp
Qwen3.5-4B	100%	100%	72.5%	0.147	2.5pp
Qwen3.5-2B	99.8%	100%	68.3%	0.0201	1.3pp
Qwen2.5-7B	98.2%	100%	70.27%	0.190	-1.2pp
Qwen3-4B	100%	100%	68.6%	0.161	2.1pp
Gemma4-E2B (huihui-v1)	87.0%	100%	29.33%	0.2510	+0.3pp
Gemma4-E2B (huihui-v2)	97.0%	100%	28.39%	0.5302	-0.4pp

Key Characteristics

Highest reported ASR. Huihui consistently achieves the highest HarmBench ASR across models, with the fewest empty responses. On Qwen3.6-27B, only 5 of 400 responses (about 1.3%) were empty, versus 30 for Heretic and 45 for AEON.

Gentle uniform modifications. Huihui targets 3 projection types with small, uniform edits. The modification footprint is broader than Heretic but gentler per tensor.
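To make "small, uniform edits" concrete, here is a minimal sketch of generic directional ablation on a single projection matrix. It is not taken from Huihui's code: the function name, the `scale` parameter, and the toy values are all assumptions for illustration, and the real pipeline may compute or apply its edits differently.

```python
# Hypothetical sketch of directional ablation on one projection matrix.
# Nothing here is Huihui's actual implementation; `scale` is an assumed knob
# standing in for "how gentle" the edit is.

def ablate_direction(weight, direction, scale=1.0):
    """Return W - scale * r r^T W, with r the unit-normalised direction.

    weight: list of rows (d_out x d_in); direction: vector of length d_out.
    """
    norm = sum(x * x for x in direction) ** 0.5
    r = [x / norm for x in direction]
    rows, cols = len(weight), len(weight[0])
    # r^T W: how strongly each input column writes along the direction
    proj = [sum(r[i] * weight[i][j] for i in range(rows)) for j in range(cols)]
    return [
        [weight[i][j] - scale * r[i] * proj[j] for j in range(cols)]
        for i in range(rows)
    ]

W = [[1.0, 2.0], [3.0, 4.0]]   # toy weights
r = [1.0, 0.0]                 # toy "refusal direction"
full = ablate_direction(W, r, scale=1.0)     # removes the component entirely
gentle = ablate_direction(W, r, scale=0.25)  # removes a quarter of it
```

With `scale=1.0` the output can no longer write along `r` at all; a smaller `scale` leaves most of the original weight intact, which is one way a per-tensor edit could stay small while being applied across several projection types.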

Excellent capability preservation. On Qwen3.6-27B, Huihui has the smallest non-GSM8K average delta at just 0.5pp. MMLU retention is within 0.5% of base in most models.

Catastrophic KL on some models. On Qwen3.5-4B, Huihui’s KL divergence spiked to 0.147, roughly four times Heretic’s 0.0355 on the same model. This was the only “catastrophic” result in the dataset. It suggests the uniform approach doesn’t generalise perfectly to all architectures.
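For readers unfamiliar with the metric, the sketch below shows one common way to compute a KL divergence between the next-token distributions of a base model and an edited model. The direction of the divergence (base relative to edited), the use of nats, and the averaging over positions are assumptions; the source does not state which convention its numbers use.

```python
import math

def softmax(logits):
    # Subtract the max for numerical stability before exponentiating
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(base_logits, edited_logits):
    """KL(P_base || P_edited) in nats for one next-token distribution."""
    p = softmax(base_logits)
    q = softmax(edited_logits)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)

def mean_kl(base_batch, edited_batch):
    """Average the per-position KL over a batch of logit vectors."""
    kls = [kl_divergence(b, e) for b, e in zip(base_batch, edited_batch)]
    return sum(kls) / len(kls)

# Toy logits over a 3-token vocabulary (illustrative values only)
base_batch = [[1.0, 2.0, 3.0], [0.5, 0.5, 0.5]]
edited_batch = [[1.0, 2.0, 2.5], [0.5, 0.6, 0.5]]
print(mean_kl(base_batch, edited_batch))
```

A value of 0 means the edited model's predictions are unchanged on those positions; larger values mean the edit shifted the output distribution more.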

Second-lowest KL overall. If I exclude the Qwen3.5-4B outlier, Huihui’s KL divergence is consistently second only to Heretic, ranging from 0.007 to 0.092.

On Gemma4-E2B, huihui-v1 and prithiv were found to be nearly identical: cosine similarity of 1.0 across all 50 shared tensors, identical KL at 0.2510, and identical Phase 1 benchmarks. Prithiv is almost certainly derived from huihui-v1, or both share a common source.
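A per-tensor comparison like the huihui-v1 versus prithiv check can be sketched as follows. This assumes the cosine similarity is taken between each variant's edit (variant weights minus base weights), flattened to a vector; the source does not say whether edits or raw weights were compared, and the tensor names and values here are made up.

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def compare_edits(base, variant_a, variant_b):
    """Cosine similarity of the two variants' edits for each shared tensor.

    Each argument maps a tensor name to a flattened list of weights.
    Tensors that either variant left untouched are skipped (zero edit).
    """
    result = {}
    for name in sorted(set(base) & set(variant_a) & set(variant_b)):
        delta_a = [x - y for x, y in zip(variant_a[name], base[name])]
        delta_b = [x - y for x, y in zip(variant_b[name], base[name])]
        if any(delta_a) and any(delta_b):
            result[name] = cosine_similarity(delta_a, delta_b)
    return result

# Toy weights; names and numbers are hypothetical
base = {"q_proj": [1.0, 2.0, 3.0], "o_proj": [0.5, 0.5, 0.5]}
v1 = {"q_proj": [1.5, 2.0, 2.0], "o_proj": [0.5, 1.5, 0.5]}
clone = {"q_proj": [1.5, 2.0, 2.0], "o_proj": [0.5, 1.5, 0.5]}
other = {"q_proj": [1.0, 3.0, 3.0], "o_proj": [0.5, 1.5, 0.5]}

same = compare_edits(base, v1, clone)
different = compare_edits(base, v1, other)
```

Independently produced edits would be expected to differ on at least some tensors, as `other` does on `q_proj` here, which is why similarity of 1.0 on every shared tensor points to a shared origin.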

Gemma4-E2B dual variant comparison. Huihui-v1 (KL 0.251, 87.0% ASR) edits 50 tensors with a mean edit norm of 2.02. Huihui-v2 (KL 0.530, 97.0% ASR) edits 60 tensors with a mean edit norm of 4.94. Both use the same 2 tensor types, so the roughly 2x higher KL comes from larger edit magnitudes, not broader targeting. Huihui-v2 had the best LAMBADA perplexity of any variant at 0.53x base, suggesting its edits concentrate more precisely in the refusal direction.
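The tensor counts and mean edit norms above could be produced by a routine along these lines. This is a sketch that assumes "edit norm" means the Frobenius (L2) norm of the difference between edited and base weights for each tensor, averaged over the tensors that actually changed; the source does not define the term, and the tensor names are hypothetical.

```python
import math

def edit_norm(base_tensor, edited_tensor):
    """L2 norm of (edited - base) for one flattened tensor."""
    return math.sqrt(sum((e - b) ** 2 for b, e in zip(base_tensor, edited_tensor)))

def summarize_edits(base, edited, tol=0.0):
    """Return (number of touched tensors, mean edit norm over those tensors)."""
    norms = {name: edit_norm(base[name], edited[name]) for name in base if name in edited}
    touched = [v for v in norms.values() if v > tol]
    mean = sum(touched) / len(touched) if touched else 0.0
    return len(touched), mean

# Toy weights; names and numbers are made up for illustration
base = {"a": [0.0, 0.0], "b": [1.0, 1.0], "c": [2.0, 2.0]}
edited = {"a": [3.0, 4.0], "b": [1.0, 1.0], "c": [2.0, 14.0]}

count, mean_norm = summarize_edits(base, edited)
print(count, mean_norm)
```

Separating the count from the mean norm is what lets the comparison distinguish broader targeting (more tensors touched) from larger edits (higher norm per tensor).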

The GSM8K Thinking Budget Effect

On reasoning models from the Qwen3.x series, Huihui’s abliteration shortens thinking chains. This makes raw GSM8K scores appear dramatically higher than base, not because the model got better at math, but because it stops overthinking and actually produces an answer within the token budget.

On Qwen3.6-27B, the base model exhausts its thinking budget on 68.2% of questions. Huihui only does this on 23.0%. When both produce answers, they score nearly identically at 96.2% for base versus 96.0% for Huihui.
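The distinction between raw scores and scores on questions that both models answered can be sketched as below. The record format (one `(answered, correct)` pair per question, in the same order for both models) and the toy data are assumptions for illustration, not the evaluation harness used for the numbers above.

```python
def compare_on_shared_answers(base, edited):
    """Compare raw accuracy with accuracy on questions both models answered.

    base, edited: lists of (answered, correct) tuples, one per question,
    aligned by index. A budget-exhausted response has answered=False.
    """
    n = len(base)
    everything = list(range(n))
    both = [i for i in everything if base[i][0] and edited[i][0]]

    def accuracy(records, indices):
        return sum(1 for i in indices if records[i][1]) / len(indices) if indices else 0.0

    def exhausted_rate(records):
        return sum(1 for answered, _ in records if not answered) / n

    return {
        "base_raw": accuracy(base, everything),
        "edited_raw": accuracy(edited, everything),
        "base_exhausted": exhausted_rate(base),
        "edited_exhausted": exhausted_rate(edited),
        "base_shared": accuracy(base, both),
        "edited_shared": accuracy(edited, both),
    }

# Toy run over 10 questions: the base model exhausts its budget on 7,
# the edited model on 2 and gets one answered question wrong.
base = [(True, True)] * 3 + [(False, False)] * 7
edited = [(True, True)] * 7 + [(True, False)] + [(False, False)] * 2

stats = compare_on_shared_answers(base, edited)
print(stats)
```

In this toy run the raw scores differ widely (0.3 versus 0.7) while accuracy on the shared answered subset is identical, mirroring the pattern described for Qwen3.6-27B.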

On Gemma4-E2B, the effect runs the other way: huihui-v2 had 54 empty GSM8K responses (4.1%), the second-highest count after ether4o4. The larger edit magnitudes in v2 appear to increase thinking chain length, consuming more of the generation budget.

External Links

Huihui AI on HuggingFace