abliterate

English

Etymology

Blend of ablate + obliterate. Coined by Redditor /u/FailSpai^[1] in early 2024; the name reflects the idea of ablating a model's refusal feature to the point of obliteration.^[2]

Verb

abliterate (third-person singular simple present abliterates, present participle abliterating, simple past and past participle abliterated)

(neologism, artificial intelligence, computing) To uncensor a large language model by modifying specific model internals to remove refusal behaviours or unwanted traits, while aiming to preserve the model's other capabilities.
- 2024 June 13, Maxime Labonne, “Uncensor any LLM with abliteration”, in Hugging Face Blog‎^[3], retrieved 29 May 2025:
  Now that we have our datasets, we can load the model we want to abliterate. […] I evaluated the abliterated and source models from the previous section on the Open LLM Leaderboard and on Nous' benchmark suite.

Related terms

References

^ failspy (13 June 2024) “Comments on "Uncensor any LLM with abliteration" by mlabonne”, in Hugging Face Discussions^[1], retrieved 29 May 2025: “Got tagged for who to point fingers at. It's true, you can direct your hate mail about this one to me. :P I would like to provide a defense though, which is the community had taken to calling it "orthogonalization" or "Orthogonal Activation Steering". When I first used the term "abliterate", it was purely what I was tagging my own models, I didn't expect it to become the name for the method! Oops.”
^ /u/FailSpai (28 May 2024) “Abliterated-v3: Details about the methodology, FAQ, source code; New Phi-3-mini-128k and Phi-3-vision-128k, re-abliterated Llama-3-70B-Instruct, and new "Geminified" model.”, in Reddit (r/LocalLLaMA)‎^[2], retrieved 29 May 2025: “ablated + obliterated = abliterated. [...] It's just wordplay to signify this particular orthogonalization methodology, applied towards generally the "abliteration" of the refusal feature. Ablating the refusal to the point of obliteration.”
