Stanford AI Lab (@StanfordAILab) unfollowed @ErikJones313 on Sep 9, 2024

Erik Jones

@ErikJones313

146 Following • 433 Followers

CS PhD Student at @berkeley_ai working on LLM understanding, auditing, alignment, and safety

RT @sea_snell: On difficult problems, humans can think longer to improve their decisions. Can we instill a similar capability into LLMs? An…

a year ago

Really nice concurrent paper from @DavidGlukhov describing when seemingly benign outputs enable misuse. I especially like the "information theoretic" view: to reduce misuse, optimize to reduce the adversary's information gain wrt a malicious task rather than simply suppress it

859 views • 5 likes • a year ago
