PCH size: 600
We introduce FIT, a scalable framework for continual unlearning, together with the PCH benchmark for evaluating forgetting effectiveness, utility preservation, and robustness in large language models.
The PCH benchmark evaluates whether large language models can reliably “forget” targeted information under continual unlearning. Unlike single-shot deletion, real-world deletion requests arrive sequentially, and naïve unlearning strategies handling them often suffer parameter drift, unstable gradients, and progressive utility degradation. FIT provides a principled framework to diagnose these challenges and to assess whether a model remains stable while removing sensitive knowledge.
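The continual setting described above can be sketched as a simple evaluation loop: deletion requests are processed one at a time, and after each step the model is scored on both forgetting effectiveness and utility preservation. The sketch below is purely illustrative (none of these names come from the actual FIT/PCH API); the "model" is a plain lookup table, and a real unlearning method would instead update model parameters.

```python
# Toy continual-unlearning evaluation loop (illustrative only; not the
# real FIT/PCH interface). The model is a dict of question->answer facts.

def unlearn(model, fact):
    """Remove one targeted fact; a real method would update parameters."""
    model.pop(fact, None)

def forget_rate(model, deleted):
    """Fraction of deletion targets the model can no longer answer."""
    return sum(f not in model for f in deleted) / len(deleted)

def utility(model, retained):
    """Fraction of retained facts the model still answers."""
    return sum(f in model for f in retained) / len(retained)

# Hypothetical data: 10 facts, of which the first 3 must be forgotten.
model = {f"q{i}": f"a{i}" for i in range(10)}
to_delete = [f"q{i}" for i in range(3)]
retained = [f"q{i}" for i in range(3, 10)]

deleted = []
for fact in to_delete:  # requests arrive sequentially, one per step
    unlearn(model, fact)
    deleted.append(fact)
    print(fact, forget_rate(model, deleted), utility(model, retained))
```

Tracking both metrics after every step, rather than only at the end, is what exposes the progressive utility degradation that single-shot evaluations miss.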
The PCH benchmark includes unified synthetic corpora, well-defined evaluation protocols, and reproducible scripts. By testing models across long sequences of deletion steps, FIT offers a holistic view of forgetting effectiveness, utility preservation, and robustness against post-unlearning attacks, enabling deeper analysis of whether undesired knowledge is truly removed or merely obscured.