Heretic: Removing Censorship from Language Models, Automatically and Intelligently | International Sound Reference

By Konstruktor, 22 February 2026

Heretic: Fully automatic censorship removal for language models

If you've ever interacted with a model like ChatGPT or Llama 3, you've probably noticed they sometimes refuse to answer certain questions. These refusals aren't random; they result from a deliberate "safety alignment" trained into the model. Removing this filter, a process called "abliteration", has typically been a complex task requiring technical expertise and manual trial and error. Heretic, created by p-e-w, fully automates this process. Its goal is simple: to eliminate censorship from a model while preserving as much of its original intelligence as possible.

How Does It Work?

Heretic is based on a technique called directional ablation (or "abliteration"), which identifies the internal "vectors" (directions in the model's activations) responsible for refusals and neutralizes them. Heretic's main innovation lies in automating the tuning. It uses a parameter optimizer (based on Optuna) to search for a good trade-off between two conflicting objectives:

Minimizing refusals on "harmful" prompts.
Minimizing KL divergence from the original model on "harmless" prompts. In simple terms, it tries to modify the model as little as possible, only unlocking it, without impacting its core capabilities.
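To make the two objectives more concrete, here is a minimal sketch in plain Python. It is not Heretic's code: the function names, the toy vectors, and the `weight` parameter are assumptions made for illustration, and a real tool works on the model's tensors rather than on short lists. The sketch shows the geometric idea behind directional ablation (subtracting the component of a vector that lies along a "refusal direction") and how KL divergence measures how far the modified model's output distribution has drifted from the original.

```python
import math


def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def ablate(hidden, direction, weight=1.0):
    """Remove `weight` times the component of `hidden` along `direction`.

    `weight` is a hypothetical knob of the kind an optimizer could tune:
    1.0 removes the component entirely, 0.0 leaves the vector unchanged.
    """
    d = normalize(direction)
    projection = sum(h * x for h, x in zip(hidden, d))
    return [h - weight * projection * x for h, x in zip(hidden, d)]


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions over the same tokens."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# Toy data (assumed): a 3-dimensional activation and a refusal direction.
refusal_direction = [1.0, 0.0, 0.0]
hidden = [2.0, 3.0, -1.0]
ablated = ablate(hidden, refusal_direction)

# Toy next-token distributions on a "harmless" prompt, before and after.
original_probs = [0.7, 0.2, 0.1]
modified_probs = [0.6, 0.3, 0.1]
drift = kl_divergence(original_probs, modified_probs)

print("ablated activation:", ablated)
print("KL divergence from original:", round(drift, 4))
```

In this picture, the first objective asks that the ablation be strong enough for refusals to disappear, while the second asks that `drift` stay as close to zero as possible on harmless prompts. An optimizer then searches over settings such as `weight` for the best compromise.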

According to the developer's benchmarks, a "heretic" version of a model achieves the same low refusal rate as manually produced abliterations, but with significantly less collateral damage (a much lower KL divergence from the original model). Feedback from users on Reddit is consistent with this: models generated with Heretic are often described as among the best, remaining intelligent while answering sensitive questions without hesitation.

Using Heretic is Simple

The best part is that you don't need to be a researcher. With Python and PyTorch installed, it only takes a few terminal commands:

pip install -U heretic-llm
heretic ModelName/YouWantToUnlock

The tool handles everything: it analyzes the model, searches for the optimal configuration, and in about 45 minutes on a modern GPU (the exact time depends on the model size and the hardware), gives you an "uncensored" model ready to be saved, chatted with, or uploaded to Hugging Face.

Beyond Censorship: A Research Tool

For the more curious, Heretic also offers advanced features (available by installing the research package) that let you visualize and analyze how the model's internal vectors transform layer by layer. This view into the model's "thinking" can help you understand where and how refusal behavior emerges.
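As a rough illustration of what a layer-by-layer analysis can look like, the sketch below estimates a refusal direction at each layer as the difference between the mean activation on "harmful" prompts and the mean activation on "harmless" prompts, then compares the directions of consecutive layers with cosine similarity. The difference-of-means estimate is a common choice in abliteration work, but whether Heretic's research features compute exactly this is an assumption here, and the function names and toy activations are invented for the example.

```python
import math


def mean_vector(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def refusal_direction(harmful, harmless):
    """Difference of means between harmful and harmless activations."""
    mean_harmful = mean_vector(harmful)
    mean_harmless = mean_vector(harmless)
    return [a - b for a, b in zip(mean_harmful, mean_harmless)]


def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Toy 2-dimensional activations for three layers (assumed data).
layers = [
    {"harmful": [[1.0, 0.0], [3.0, 0.0]], "harmless": [[0.0, 0.0], [2.0, 0.0]]},
    {"harmful": [[1.0, 1.0], [3.0, 3.0]], "harmless": [[0.0, 0.0], [2.0, 2.0]]},
    {"harmful": [[0.0, 1.0], [0.0, 3.0]], "harmless": [[0.0, 0.0], [0.0, 2.0]]},
]

directions = [
    refusal_direction(layer["harmful"], layer["harmless"]) for layer in layers
]

# How much the estimated direction rotates from one layer to the next.
similarities = [
    cosine(directions[i], directions[i + 1]) for i in range(len(directions) - 1)
]

for index, direction in enumerate(directions):
    print(f"layer {index}: direction = {direction}")
for index, similarity in enumerate(similarities):
    print(f"layer {index} -> {index + 1}: cosine = {similarity:.4f}")
```

A cosine close to 1 between neighboring layers would suggest a stable refusal direction, while a low value would suggest that the representation is still changing at that depth.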

In Conclusion

Heretic is more than just a tool to "unlock" models. It's a fascinating example of how precision engineering can be applied to model interpretability, achieving high-quality results fully automatically. Whether you're a developer or an enthusiast looking to explore the boundaries of LLMs, Heretic definitely deserves a spot in your toolbox.