What is Apostate?
Apostate is an abliteration tool by heterodoxin that uses weight-space orthogonal projection with configurable profiles. The balanced profile targets both o_proj and down_proj across nearly all layers with moderate edit intensity. It is a new tool, first benchmarked in this project on Qwen2.5-7B.
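As a rough illustration of what weight-space orthogonal projection means, the sketch below removes the component of a weight matrix's output that lies along a single direction, computing W' = W - scale * r (r^T W) with r normalized to unit length. It is a toy in pure Python, not Apostate's code: the names `project_out` and `refusal_dir` and the `scale` parameter (a stand-in for a profile's edit intensity) are assumptions made for this example.

```python
import math

def unit(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def project_out(weight, direction, scale=1.0):
    """Return W - scale * r (r^T W) for a row-major (d_out x d_in) matrix."""
    r = unit(direction)
    d_out, d_in = len(weight), len(weight[0])
    # r^T W: how much each input column writes along r.
    along_r = [sum(r[i] * weight[i][j] for i in range(d_out)) for j in range(d_in)]
    return [
        [weight[i][j] - scale * r[i] * along_r[j] for j in range(d_in)]
        for i in range(d_out)
    ]

def output_along(weight, direction, x):
    """Component of W @ x along the unit direction."""
    r = unit(direction)
    y = [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in weight]
    return sum(r_i * y_i for r_i, y_i in zip(r, y))

# Toy stand-ins (hypothetical values): a 3x2 matrix playing the role of
# o_proj or down_proj, and a direction in its output space.
W = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
refusal_dir = [1.0, 2.0, 2.0]

W_full = project_out(W, refusal_dir, scale=1.0)  # direction fully removed
W_half = project_out(W, refusal_dir, scale=0.5)  # direction halved
```

With `scale=1.0` the edited matrix can no longer write anything along the chosen direction, whatever the input. A smaller `scale` only attenuates that component, which is one plausible reading of "moderate edit intensity".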
Performance Across Models
| Model | HarmBench ASR | MMLU | KL Divergence | Avg Delta (excl GSM8K) |
|---|---|---|---|---|
| Qwen2.5-7B | 98.8% | 71.43% | 0.134 | -1.0pp |
Key Characteristics
Broad layer coverage with moderate intensity. Apostate modified 55 tensors across 27 of 28 layers, skipping only layer 11. Its mean edit norm of 1.63 per tensor is gentler than Heretic's 2.33.
Near-zero direction overlap with Huihui. Despite targeting the same tensor types on the same base model, Apostate and Huihui have a cosine similarity of just 0.023. They found almost entirely different refusal directions yet achieved nearly identical ASR. This indicates that the safety subspace in Qwen2.5-7B has multiple independent removal paths.
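For readers who want to reproduce this kind of overlap figure, a minimal cosine similarity sketch follows. The text does not say whether 0.023 was measured between extracted refusal directions or between flattened weight deltas, so the function takes any two equal-length vectors. The toy vectors `dir_a` and `dir_b` are made up for illustration.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical toy directions that are almost orthogonal.
dir_a = [1.0, 0.0, 0.0, 0.0]
dir_b = [0.02, 1.0, 0.0, 0.0]
sim = cosine_similarity(dir_a, dir_b)
```

A value near 0 means the two vectors share almost no common component, while a value near 1 means they point the same way.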
Lowest KL batchmean on Qwen2.5-7B. At 0.134, Apostate has the lowest KL divergence of the three variants tested, making its distribution shift the most balanced of the three.
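The sketch below shows one way such a number could be computed, assuming "batchmean" refers to the reduction of that name in PyTorch's `kl_div`: the divergence is summed over the vocabulary for each sample and then divided by the batch size. The direction KL(base || edited) and the function name `kl_batchmean` are assumptions for this example, and the benchmark's actual prompts and token positions are not specified here.

```python
import math

def softmax(logits):
    """Numerically stable softmax over one row of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_batchmean(base_logits, edited_logits):
    """Mean over the batch of KL(base || edited), summed over the vocabulary."""
    total = 0.0
    for base_row, edited_row in zip(base_logits, edited_logits):
        p = softmax(base_row)    # base model's next-token distribution
        q = softmax(edited_row)  # edited model's next-token distribution
        total += sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return total / len(base_logits)

# Hypothetical toy batch: two samples over a three-token vocabulary.
base = [[2.0, 1.0, 0.1], [0.5, 0.5, 0.5]]
edited = [[1.8, 1.1, 0.3], [0.4, 0.7, 0.5]]
shift = kl_batchmean(base, edited)
```

A result of 0 means the edited model reproduces the base model's distributions exactly. Larger values mean the edit moved the output distribution further from the base model.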
Three-phase layer pattern. Edits are low on layers 0 to 5, high on layers 6 to 20, and reduced on layers 21 to 27. This suggests Apostate concentrates edits where the refusal direction is most strongly represented.
Embedding touch. Like Huihui, Apostate touches the embedding with minimal norm. Heretic skips it entirely.
Weight Modification Profile
- Targets 2 weight types: `self_attn.o_proj.weight` and `mlp.down_proj.weight`
- 55 of 339 tensors changed (16.2%)
- 2.72B parameters changed (35.8%)
- Layers modified: 27 of 28
- Peak edit at layer 18
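Statistics like the ones above can be derived by diffing the base and edited checkpoints tensor by tensor. The sketch below is a minimal version under stated assumptions: tensors are given as flat lists of floats, "edit norm" is taken to be the Frobenius norm of the per-tensor difference, and the layer index is parsed from names of the form `model.layers.N.…`. The function `edit_profile` and the toy checkpoints are hypothetical.

```python
import math
import re

def edit_profile(base, edited, tol=0.0):
    """Summarize which tensors differ between two name -> flat-list dicts."""
    norms = {}
    for name, w in base.items():
        # Frobenius norm of the difference for this tensor.
        delta = math.sqrt(sum((a - b) ** 2 for a, b in zip(w, edited[name])))
        if delta > tol:
            norms[name] = delta
    per_layer = {}
    for name, norm in norms.items():
        match = re.search(r"layers\.(\d+)\.", name)
        if match:  # the embedding and other non-layer tensors have no index
            layer = int(match.group(1))
            per_layer[layer] = per_layer.get(layer, 0.0) + norm
    total_params = sum(len(w) for w in base.values())
    changed_params = sum(len(base[name]) for name in norms)
    return {
        "changed_tensors": len(norms),
        "tensor_fraction": len(norms) / len(base),
        "param_fraction": changed_params / total_params,
        "mean_edit_norm": sum(norms.values()) / len(norms) if norms else 0.0,
        "layers_modified": sorted(per_layer),
        "peak_layer": max(per_layer, key=per_layer.get) if per_layer else None,
    }

# Hypothetical toy checkpoints with five tiny tensors.
base_ckpt = {
    "model.embed_tokens.weight": [0.1, 0.2],
    "model.layers.0.self_attn.o_proj.weight": [1.0, 2.0],
    "model.layers.0.mlp.down_proj.weight": [1.0, 1.0],
    "model.layers.1.self_attn.o_proj.weight": [0.5, 0.5],
    "model.layers.1.mlp.gate_proj.weight": [3.0, 3.0],
}
edited_ckpt = dict(base_ckpt)
edited_ckpt["model.layers.0.self_attn.o_proj.weight"] = [1.0, 2.5]
edited_ckpt["model.layers.1.self_attn.o_proj.weight"] = [0.5, 2.5]

profile = edit_profile(base_ckpt, edited_ckpt)
```

On real checkpoints the same loop would run over framework tensors instead of lists, and a small nonzero `tol` may be needed to ignore differences caused by dtype round-trips.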