FIT: Defying Catastrophic Forgetting in Continual LLM Unlearning

We introduce FIT, a scalable framework for continual unlearning, together with the PCH benchmark for evaluating forgetting effectiveness, utility preservation, and robustness in large language models.

Xiaoyu Xu¹, Minxin Du¹*, Kun Fang¹, Zi Liang¹, Yaxin Xiao¹, Zhicong Huang², Cheng Hong², Qingqing Ye¹, Haibo Hu¹*
¹The Hong Kong Polytechnic University    ²Ant Group    *Corresponding author

PCH size: 600 · QA-pairs: 600 · Evaluation criteria: 3 · Release: 2026

Introduction

FIT evaluates whether large language models can reliably “forget” targeted information under continual unlearning. Unlike single-shot deletion, real-world removal requests arrive sequentially; handled with naïve unlearning strategies, they cause parameter drift, unstable gradients, and progressive utility degradation. FIT provides a principled framework for diagnosing these failure modes and assessing whether a model remains stable while removing sensitive knowledge.

The PCH benchmark includes unified synthetic corpora, well-defined evaluation protocols, and reproducible scripts. By testing models across long sequences of deletion steps, FIT offers a holistic view of forgetting effectiveness, utility preservation, and robustness against post-unlearning attacks—enabling deeper analysis of whether undesired knowledge is truly removed or merely obscured.
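The sequential evaluation protocol described above can be sketched in a few lines. The snippet below is a minimal, illustrative mock-up of the general idea (apply deletion requests one at a time, then score forgetting effectiveness and utility preservation after every step); all function and variable names are hypothetical placeholders, not the actual FIT/PCH API, and a toy "model" stands in for an LLM.

```python
# Illustrative sketch of a continual-unlearning evaluation loop.
# All names here are placeholders; a real setup would query an LLM,
# not a dictionary of removed facts.

def unlearn_step(model, forget_batch):
    # Placeholder for one unlearning update targeting forget_batch.
    model["removed"].update(forget_batch)
    return model

def forget_score(model, forget_set):
    # Fraction of targeted facts the model no longer produces.
    return sum(f in model["removed"] for f in forget_set) / len(forget_set)

def utility_score(model, retain_set):
    # Fraction of unrelated facts the model still answers correctly.
    return sum(r not in model["removed"] for r in retain_set) / len(retain_set)

def continual_eval(model, requests, retain_set):
    """Apply deletion requests sequentially; score after every step."""
    history, seen = [], set()
    for forget_batch in requests:
        model = unlearn_step(model, forget_batch)
        seen |= set(forget_batch)
        history.append({
            "forget": forget_score(model, seen),
            "utility": utility_score(model, retain_set),
        })
    return history

model = {"removed": set()}
requests = [["fact_a"], ["fact_b", "fact_c"]]  # two sequential deletion requests
history = continual_eval(model, requests, ["keep_1", "keep_2"])
```

The point of tracking `history` rather than a single final score is that continual unlearning can look fine at the end while having degraded badly mid-sequence; per-step curves expose that instability.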

Overview of FIT