Erik Jones
@ErikJones313
146 Following • 433 Followers
CS PhD Student at @berkeley_ai working on LLM understanding, auditing, alignment, and safety
RT @sea_snell: On difficult problems, humans can think longer to improve their decisions. Can we instill a similar capability into LLMs? An…
9 months ago
Really nice concurrent paper from @DavidGlukhov describing when seemingly benign outputs enable misuse. I especially like the "information theoretic" view: to reduce misuse, optimize to reduce the adversary's information gain with respect to a malicious task rather than simply suppress it
859 views • 5 likes • 10 months ago