abliterate

English

Etymology

Blend of ablate + obliterate. Coined in early 2024 by Redditor /u/FailSpai,[1] the idea being to ablate a model's refusal feature to the point of obliteration.[2]

Verb

abliterate (third-person singular simple present abliterates, present participle abliterating, simple past and past participle abliterated)

  1. (neologism, artificial intelligence, computing) To uncensor a large language model by modifying specific model internals to remove refusal behaviours or unwanted traits, while aiming to preserve the model's other capabilities.
    • 2024 June 13, Maxime Labonne, “Uncensor any LLM with abliteration”, in Hugging Face Blog[3], retrieved 29 May 2025:
      Now that we have our datasets, we can load the model we want to abliterate. [...] I evaluated the abliterated and source models from the previous section on the Open LLM Leaderboard and on Nous' benchmark suite.
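
Usage note: the sense above, together with the "orthogonalization" wording in the references, can be illustrated with a minimal sketch. It assumes that the "refusal feature" is a single direction in activation space, estimated here as the normalised difference between mean activations on two prompt sets, and that the "model internals" are weight matrices whose columns are projected away from that direction. All function names and the toy numbers are hypothetical and are not taken from the cited sources.

```python
import math

def mean_vector(rows):
    """Column-wise mean of a list of equal-length vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def unit(v):
    """Scale v to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def refusal_direction(harmful_acts, harmless_acts):
    """Assumed estimator: normalised difference of mean activations."""
    mean_harmful = mean_vector(harmful_acts)
    mean_harmless = mean_vector(harmless_acts)
    return unit([a - b for a, b in zip(mean_harmful, mean_harmless)])

def ablate(vec, direction):
    """Remove the component of vec that lies along the unit vector direction."""
    dot = sum(a * b for a, b in zip(vec, direction))
    return [a - dot * b for a, b in zip(vec, direction)]

def orthogonalize_columns(weight, direction):
    """Project every column of weight (a list of rows) off direction.

    Equivalent to W' = W - r r^T W for a unit vector r, so the layer can
    no longer write anything along r.
    """
    columns = [ablate(list(col), direction) for col in zip(*weight)]
    return [list(row) for row in zip(*columns)]

# Toy activations (hypothetical): only the first coordinate differs.
harmful = [[2.0, 0.0, 1.0], [4.0, 0.0, 1.0]]
harmless = [[0.0, 0.0, 1.0], [0.0, 0.0, 1.0]]
r = refusal_direction(harmful, harmless)

# Toy 3x2 weight matrix (hypothetical) writing into a 3-dimensional space.
W = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
W_abliterated = orthogonalize_columns(W, r)
print(r)
print(W_abliterated)
```

In this toy case the estimated direction is the first coordinate axis, so the edit zeroes the first row of the matrix and leaves the other rows unchanged, which mirrors the definition's "while aiming to preserve the model's other capabilities".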

References

  1. ^ failspy (13 June 2024) “Comments on "Uncensor any LLM with abliteration" by mlabonne”, in Hugging Face Discussions[1], retrieved 29 May 2025:
    Got tagged for who to point fingers at. It's true, you can direct your hate mail about this one to me. :P I would like to provide a defense though, which is the community had taken to calling it "orthogonalization" or "Orthogonal Activation Steering". When I first used the term "abliterate", it was purely what I was tagging my own models, I didn't expect it to become the name for the method! Oops.
  2. ^ /u/FailSpai (28 May 2024) “Abliterated-v3: Details about the methodology, FAQ, source code; New Phi-3-mini-128k and Phi-3-vision-128k, re-abliterated Llama-3-70B-Instruct, and new "Geminified" model.”, in Reddit (r/LocalLLaMA)[2], retrieved 29 May 2025:
    ablated + obliterated = abliterated. [...] It's just wordplay to signify this particular orthogonalization methodology, applied towards generally the "abliteration" of the refusal feature. Ablating the refusal to the point of obliteration.