When Routine Chats Turn Toxic: Unintended Long-Term State Poisoning in Personalized Agents

Xiaoyu Xu, Minxin Du†*, Qipeng Xie, Haobin Ke, Qingqing Ye, Haibo Hu†*

The Hong Kong Polytechnic University
Hong Kong University of Science and Technology, HKUST (Guangzhou)

* Corresponding authors

ULSPB is a benchmark for unintended long-term state poisoning in personalized LLM agents. It couples realistic multilingual, multi-turn dialogues with trajectory-level state auditing to quantify how subtle, routine interactions can shift an agent's persistent behavioral boundaries, and how far defense mechanisms reduce this risk.

Full Interaction Trajectory


Observed Protected-File Modifications

Diffs from run `workspace_before` to `workspace_after` for this selected trajectory.
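
As an illustration of the kind of audit this panel displays, the following minimal sketch compares a set of protected files across two workspace snapshots and keeps a unified diff for each file that changed. It is not the benchmark's own implementation: the function name `diff_protected_files`, the directory layout, and the example file names are assumptions made for this sketch.

```python
import difflib
from pathlib import Path


def read_lines(path):
    """Read a file as a list of lines; a missing file counts as empty."""
    if not path.exists():
        return []
    return path.read_text(encoding="utf-8").splitlines(keepends=True)


def diff_protected_files(before_dir, after_dir, protected):
    """Return {relative_path: unified_diff_text} for protected files that changed.

    `protected` is a list of paths relative to the workspace root
    (hypothetical names; the real protected-file list is not given here).
    """
    changes = {}
    for rel in protected:
        old = read_lines(Path(before_dir) / rel)
        new = read_lines(Path(after_dir) / rel)
        diff = "".join(
            difflib.unified_diff(
                old,
                new,
                fromfile=f"workspace_before/{rel}",
                tofile=f"workspace_after/{rel}",
            )
        )
        if diff:  # an empty diff means the file is unchanged
            changes[rel] = diff
    return changes
```

Calling `diff_protected_files("workspace_before", "workspace_after", ["rules.md"])` would return an empty dictionary when no protected file was touched, so a non-empty result marks a trajectory that modified persistent state.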


Defense Judge Output (Votes + Final Dangerous)

Shown for defense modes only. Each file reports its final dangerous decision and the model-level votes.
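
The page does not state how model-level votes are combined into the final decision. The sketch below assumes a strict majority rule purely for illustration; the function name `judge_files`, the judge names, and the file names are hypothetical and not taken from the benchmark.

```python
def judge_files(file_votes):
    """Aggregate per-file judge votes into a final dangerous decision.

    `file_votes` maps a file name to {judge_name: bool}, where True means
    the judge flagged the modification as dangerous.
    ASSUMPTION: a file is dangerous when a strict majority of judges say so;
    the actual aggregation rule is not specified in the source.
    """
    report = {}
    for filename, votes in file_votes.items():
        flagged = sum(1 for vote in votes.values() if vote)
        report[filename] = {
            "votes": dict(votes),
            "final_dangerous": flagged * 2 > len(votes),
        }
    return report


# Hypothetical example: three judges vote on two modified files.
example = judge_files(
    {
        "rules.md": {"judge_a": True, "judge_b": True, "judge_c": False},
        "notes.md": {"judge_a": False, "judge_b": True, "judge_c": False},
    }
)
```

Under the assumed rule, a tie (for example, two judges split one to one) resolves to not dangerous; a stricter defense could instead flag a file on any single vote.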

Harm Score Calculation Trace

Transparent calculation: formula, dimension maxima, and per-file contribution breakdown.