Transparency survey, provable robustness, models that predict the future
Safety competitions with more than $1 million in prizes
Many New Interpretability Papers, Virtual Logit Matching, Rationalization Helps Robustness
Transformer adversarial robustness, fractals, preference learning
Adversarial Training, Feature Visualization, and Machine Ethics
ICLR Safety Paper Roundup