What is Apostate?

Apostate is an abliteration tool by heterodoxin that uses weight-space orthogonal projection with configurable profiles. The balanced profile targets both o_proj and down_proj across nearly all layers with moderate edit intensity. It is a new tool, first benchmarked in this project on Qwen2.5-7B.
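To make "weight-space orthogonal projection" concrete, here is a minimal sketch of the generic technique, not Apostate's actual code, which the source does not show. It assumes a weight matrix stored as rows indexed by output dimension and a single refusal direction in that output space. The names `project_out` and `scale` are hypothetical; `scale` is only a stand-in for the idea of edit intensity.

```python
import math

def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def project_out(weight, direction, scale=1.0):
    """Return W - scale * r (r^T W) for unit vector r.

    weight: list of rows, shape (d_out, d_in)
    direction: vector of length d_out (hypothetical refusal direction)
    scale: 1.0 removes the component along r entirely (assumed knob)
    """
    r = normalize(direction)
    n_rows, n_cols = len(weight), len(weight[0])
    # r^T W: how much each column of W writes along r
    coeffs = [sum(r[i] * weight[i][j] for i in range(n_rows)) for j in range(n_cols)]
    return [
        [weight[i][j] - scale * r[i] * coeffs[j] for j in range(n_cols)]
        for i in range(n_rows)
    ]

# Toy example: remove everything the matrix writes along the second output axis.
W = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
r = [0.0, 1.0, 0.0]
W_edited = project_out(W, r)
```

After the edit with `scale=1.0`, the matrix can no longer write anything along `r`, while the components orthogonal to `r` are unchanged.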

Performance Across Models

Model | HarmBench ASR | MMLU | KL Divergence | Avg Delta (excl. GSM8K)
Qwen2.5-7B | 98.8% | 71.43% | 0.134 | -1.0pp

Key Characteristics

Broad layer coverage with moderate intensity. Apostate modified 55 tensors across 27 of 28 layers, skipping only layer 11. Its mean edit norm of 1.63 is gentler per tensor than Heretic's 2.33.
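As an illustration of how a per-tensor edit norm and its mean could be computed, the sketch below assumes the norm is the Frobenius norm of the difference between the edited and base tensors, averaged over tensors that actually changed. The source does not state which norm or averaging was used, so both are assumptions, and the function names are hypothetical.

```python
import math

def edit_norm(base, edited):
    """Frobenius norm of (edited - base) for two flattened tensors."""
    return math.sqrt(sum((e - b) ** 2 for b, e in zip(base, edited)))

def mean_edit_norm(base_tensors, edited_tensors):
    """Mean edit norm over tensors that differ from the base (assumed convention)."""
    norms = [edit_norm(base_tensors[name], edited_tensors[name]) for name in base_tensors]
    changed = [n for n in norms if n > 0.0]
    return sum(changed) / len(changed) if changed else 0.0

# Toy tensors: "a" and "c" are edited, "b" is untouched.
base = {"a": [0.0, 0.0], "b": [1.0, 1.0], "c": [2.0, 2.0]}
edited = {"a": [3.0, 4.0], "b": [1.0, 1.0], "c": [2.0, 3.0]}
mean_norm = mean_edit_norm(base, edited)
```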

Near-zero direction overlap with Huihui. Despite targeting the same tensor types on the same base model, Apostate and Huihui have a cosine similarity of just 0.023. They found almost entirely different refusal directions yet achieved nearly identical ASR. This confirms that the safety subspace in Qwen2.5-7B has multiple independent removal paths.
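For reference, cosine similarity between two direction vectors can be sketched as follows. How the comparison was aggregated across tensors is not described in the source, so this shows only the basic per-vector formula, with toy vectors constructed to reproduce a similarity of 0.023.

```python
import math

def cosine_similarity(u, v):
    """cos(theta) = (u . v) / (|u| |v|); 0 means orthogonal, 1 means parallel."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Toy directions built so that their overlap is 0.023.
u = [1.0, 0.0, 0.0]
v = [0.023, math.sqrt(1.0 - 0.023 ** 2), 0.0]
overlap = cosine_similarity(u, v)
```

A value this close to zero means the two vectors are nearly orthogonal: removing one direction leaves almost all of the other intact.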

Lowest KL batchmean on Qwen2.5-7B. At 0.134, Apostate has the lowest KL divergence of the three variants tested, making it the most balanced distribution shift of the three.
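"Batchmean" presumably refers to the reduction of the same name in PyTorch's `torch.nn.functional.kl_div`, which sums the KL divergence over a batch and divides by the batch size. The stdlib sketch below illustrates that reduction on toy distributions. The direction KL(base || edited) and the input data are assumptions, since the source does not specify how the score was computed.

```python
import math

def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions over the same support."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)

def kl_batchmean(p_batch, q_batch):
    """Sum of per-example KL divergences divided by the batch size."""
    total = sum(kl_divergence(p, q) for p, q in zip(p_batch, q_batch))
    return total / len(p_batch)

# Toy next-token distributions: the first example shifts, the second does not.
base_probs = [[0.5, 0.5], [0.9, 0.1]]
edited_probs = [[0.25, 0.75], [0.9, 0.1]]
score = kl_batchmean(base_probs, edited_probs)
```

A score of zero would mean the edited model's output distributions match the base model exactly on the evaluated prompts; larger values indicate a larger shift.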

Three-phase layer pattern. Edits are low on layers 0 to 5, high on layers 6 to 20 (apart from the skipped layer 11), and reduced on layers 21 to 27. This suggests Apostate concentrates edits where the refusal direction is most strongly represented.

Embedding touch. Like Huihui, Apostate touches the embedding with minimal norm. Heretic skips it entirely.

Weight Modification Profile

  • Targets 2 weight types: self_attn.o_proj.weight and mlp.down_proj.weight
  • 55 of 339 tensors changed (16.2%)
  • 2.72B parameters changed (35.8%)
  • Layers modified: 27 of 28
  • Peak edit at layer 18
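Counts like these can be derived by diffing two checkpoints tensor by tensor. The sketch below assumes tensor names follow the common `model.layers.<N>.<module>.weight` pattern and uses toy data; it is not the analysis script behind the numbers above. As a hedged sanity check, 55 changed tensors is consistent with 27 layers × 2 weight types plus the embedding, though the source does not break the count down that way.

```python
import re

# Assumed naming pattern, e.g. "model.layers.18.mlp.down_proj.weight".
LAYER_RE = re.compile(r"\.layers\.(\d+)\.")

def summarize_changes(base, edited):
    """Count changed tensors and list the layer indices they belong to."""
    changed = [name for name in base if base[name] != edited[name]]
    layers = set()
    for name in changed:
        match = LAYER_RE.search(name)
        if match:  # tensors outside the layer stack (e.g. embeddings) have no index
            layers.add(int(match.group(1)))
    return {
        "changed": len(changed),
        "total": len(base),
        "fraction": len(changed) / len(base),
        "layers": sorted(layers),
    }

# Toy checkpoints: the embedding and both layer-0 targets are edited.
base = {
    "model.embed_tokens.weight": [0.1, 0.2],
    "model.layers.0.self_attn.o_proj.weight": [1.0, 2.0],
    "model.layers.0.mlp.down_proj.weight": [3.0, 4.0],
    "model.layers.1.self_attn.o_proj.weight": [5.0, 6.0],
}
edited = dict(base)
edited["model.embed_tokens.weight"] = [0.1, 0.21]
edited["model.layers.0.self_attn.o_proj.weight"] = [1.0, 2.5]
edited["model.layers.0.mlp.down_proj.weight"] = [3.5, 4.0]
summary = summarize_changes(base, edited)
```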
