Adversarial attacks against language and vision models, improving LLM honesty, and tracing the influence of LLM training data
ML Safety Newsletter #10