ML Safety Newsletter
ML Safety Newsletter #8
Interpretability, using law to inform AI alignment, scaling laws for proxy gaming
Dan Hendrycks and Thomas Woodside
Feb 20, 2023
newsletter.mlsafety.org
January 2023
ML Safety Newsletter #7
Making model dishonesty harder, making grokking more interpretable, an example of an emergent internal optimizer
Dan Hendrycks
Jan 9, 2023
October 2022
ML Safety Newsletter #6
Transparency survey, provable robustness, models that predict the future
Dan Hendrycks
Oct 13, 2022
September 2022
ML Safety Newsletter #5
Safety competitions with more than $1 million in prizes
Dan Hendrycks
Sep 26, 2022
June 2022
ML Safety Newsletter #4
Many New Interpretability Papers, Virtual Logit Matching, Rationalization Helps Robustness
Dan Hendrycks
Jun 3, 2022
March 2022
ML Safety Newsletter #3
Transformer adversarial robustness, fractals, preference learning
Dan Hendrycks
Mar 8, 2022
December 2021
ML Safety Newsletter #2
Adversarial Training, Feature Visualization, and Machine Ethics
Dan Hendrycks
Dec 9, 2021
October 2021
ML Safety Newsletter #1
ICLR Safety Paper Roundup
Dan Hendrycks
Oct 18, 2021