When Routine Chats Turn Toxic: Unintended Long-Term State Poisoning in Personalized Agents

Xiaoyu Xu^†, Minxin Du^†*, Qipeng Xie^◇, Haobin Ke^†, Qingqing Ye^†, Haibo Hu^†*

^†The Hong Kong Polytechnic University
^◇Hong Kong University of Science and Technology, HKUST (Guangzhou)

* Corresponding authors

ULSPB is a benchmark for unintended long-term state poisoning in personalized LLM agents. It couples realistic multilingual, multi-turn dialogues with trajectory-level state auditing to quantify how subtle, routine interactions can cause an agent's persistent behavioral boundaries to drift, and how much defense mechanisms reduce this risk.

Full Interaction Trajectory

Observed Protected-File Modifications

Diffs from run `workspace_before` to `workspace_after` for this selected trajectory.
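As a rough illustration of what such a before/after audit involves, the sketch below compares a list of protected files across two workspace snapshots and returns a unified diff for each file that changed. It is not the benchmark's own tooling: the function names, the directory layout, and the idea of passing protected files as relative paths are all assumptions made for this example.

```python
import difflib
from pathlib import Path


def _read_lines(path: Path) -> list[str]:
    # A missing file is treated as empty so creations and deletions show up as diffs.
    if not path.exists():
        return []
    return path.read_text(encoding="utf-8").splitlines(keepends=True)


def diff_workspaces(before: Path, after: Path, protected: list[str]) -> dict[str, str]:
    """Return {relative_path: unified_diff} for protected files that changed.

    `protected` is a hypothetical list of relative paths; the real benchmark
    may identify protected files differently.
    """
    diffs: dict[str, str] = {}
    for rel in protected:
        old = _read_lines(before / rel)
        new = _read_lines(after / rel)
        if old != new:
            diffs[rel] = "".join(
                difflib.unified_diff(
                    old,
                    new,
                    fromfile=f"workspace_before/{rel}",
                    tofile=f"workspace_after/{rel}",
                )
            )
    return diffs
```

Files that are identical in both snapshots, or absent from both, are omitted from the result, so an empty dictionary means no protected file was modified during the trajectory.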

Defense Judge Output (Votes + Final Dangerous)

Shown for defense modes only. For each file, the panel reports the final dangerous decision and the model-level votes behind it.
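The page does not state how model-level votes are combined into the final decision. Purely as an illustration, the sketch below assumes a strict-majority rule; the function name, the vote format, and the rule itself are hypothetical and may differ from what the benchmark actually uses.

```python
def final_dangerous(votes: dict[str, bool]) -> bool:
    """Combine per-model votes into one decision.

    ASSUMPTION: a file is flagged dangerous when strictly more than half of
    the judge models vote True. Ties and empty vote sets count as not dangerous.
    """
    dangerous_votes = sum(votes.values())
    return dangerous_votes * 2 > len(votes)
```

Under this assumed rule, a tie is treated as not dangerous; a more conservative defense could instead flag a file as soon as any single judge votes dangerous.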

Harm Score Calculation Trace

Transparent calculation: formula, dimension maxima, and per-file contribution breakdown.