ML Safety Newsletter

ML Safety Newsletter #9
Verifying large training runs, security risks from LLM access to APIs, why natural selection may favor AIs over humans
Apr 11, 2023 • Dan Hendrycks and Thomas Woodside
February 2023
ML Safety Newsletter #8
Interpretability, using law to inform AI alignment, scaling laws for proxy gaming
Feb 20, 2023 • Dan Hendrycks and Thomas Woodside
January 2023
ML Safety Newsletter #7
Making model dishonesty harder, making grokking more interpretable, an example of an emergent internal optimizer
Jan 9, 2023 • Dan Hendrycks
October 2022
ML Safety Newsletter #6
Transparency survey, provable robustness, models that predict the future
Oct 13, 2022 • Dan Hendrycks
September 2022
ML Safety Newsletter #5
Safety competitions with more than $1 million in prizes
Sep 26, 2022 • Dan Hendrycks
June 2022
ML Safety Newsletter #4
Many New Interpretability Papers, Virtual Logit Matching, Rationalization Helps Robustness
Jun 3, 2022 • Dan Hendrycks
March 2022
ML Safety Newsletter #3
Transformer adversarial robustness, fractals, preference learning
Mar 8, 2022 • Dan Hendrycks
December 2021
ML Safety Newsletter #2
Adversarial Training, Feature Visualization, and Machine Ethics
Dec 9, 2021 • Dan Hendrycks
October 2021
ML Safety Newsletter #1
ICLR Safety Paper Roundup
Oct 18, 2021 • Dan Hendrycks