Really nice concurrent paper from @DavidGlukhov describing when seemingly benign outputs enable misuse. I especially like the "information theoretic" view: to reduce misuse, optimize to reduce the adversary's information gain wrt a malicious task rather than simply suppress it.