Large language models (LLMs) like ChatGPT are powerful AI systems that generate text and answer questions. However, they are so large that they demand substantial computing power, which makes them hard to run on smaller devices. One common solution is quantization, a form of compression that stores a model's weights at lower numerical precision to make it more efficient. But we found that quantizing LLMs can make them less safe: compressed models generate inappropriate or harmful outputs more easily.
In our work, we tested several common quantization methods and found clear safety degradation. To fix this, we designed a new system called Q-resafe. It acts like a safety patch, restoring the model's safety behavior after quantization without sacrificing its usefulness. Our tests show that Q-resafe helps quantized models stay as safe as they were before quantization, even under challenging conditions. This research helps ensure that as LLMs become more widely used, they stay both efficient and responsible.
We hope our work offers a new perspective on the intersection of model compression and safety in LLMs.
Searches for safety-critical weights in the full-precision pre-trained model and guides quantization so that these weights are preserved during the low-precision weight conversion (a rough sketch follows below).
Applies safety patches by optimizing an objective built on the Direct Preference Optimization (DPO) loss, efficiently restoring safety while minimizing the impact on utility (also sketched below).
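To make the first step concrete, here is a minimal sketch of one way to identify and protect safety-critical weights. It assumes a gradient-saliency criterion computed against a safety loss on a small set of safety prompts, and a simple round-to-nearest quantizer; the function names, the `keep_ratio` parameter, and the criterion itself are illustrative assumptions, not the exact Q-resafe procedure.

```python
import torch

def find_safety_critical_mask(weight, safety_grad, keep_ratio=0.01):
    """Flag the top fraction of weights whose (weight * gradient) saliency
    w.r.t. a safety loss is largest in magnitude.
    This is a simple proxy criterion; Q-resafe's actual selection rule may differ."""
    scores = (weight * safety_grad).abs().flatten()
    k = max(1, int(keep_ratio * scores.numel()))
    threshold = torch.topk(scores, k).values.min()
    return (weight * safety_grad).abs() >= threshold

def quantize_preserving_safety(weight, mask, num_bits=4):
    """Round-to-nearest symmetric quantization of the non-critical weights;
    weights flagged by `mask` are kept at full precision."""
    qmax = 2 ** (num_bits - 1) - 1
    scale = (weight.abs().max() / qmax).clamp_min(1e-12)
    quantized = torch.round(weight / scale).clamp(-qmax - 1, qmax) * scale
    return torch.where(mask, weight, quantized)
```

In a real pipeline, `safety_grad` would come from backpropagating a safety objective (for example, the DPO-style loss below) through the full-precision model on safety-relevant prompts.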
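The safety-patching objective builds on the standard DPO loss over preference pairs in which a safe response is preferred over an unsafe one. The sketch below assumes the per-response log-probabilities from the quantized policy and from a frozen reference model have already been summed over response tokens; how Q-resafe restricts the update to the identified safety-critical weights is not shown here.

```python
import torch
import torch.nn.functional as F

def dpo_safety_loss(policy_chosen_logps, policy_rejected_logps,
                    ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """Standard DPO loss on (safe, unsafe) response pairs: push the quantized
    policy to prefer the safe response relative to the frozen reference model."""
    policy_margin = policy_chosen_logps - policy_rejected_logps
    ref_margin = ref_chosen_logps - ref_rejected_logps
    logits = beta * (policy_margin - ref_margin)
    return -F.logsigmoid(logits).mean()
```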