From the paper:
For the pilot event, we wanted to make it as easy as possible for the AI teams to compete. To that end, we used cryptography and reverse engineering challenges which could be completed locally, without the need for dynamic interactions with external machines. We calibrated the challenge difficulty based on preliminary evaluations of our React&Plan agent (Turtayev et al. 2024) on older Hack The Box-style tasks such that the AI could solve ~50% of tasks.
The conclusion that AI ranked in the "top XX percent" is also fucking bullshit. It was an open signup; you didn't need any skills to compete. Saying you beat 12,000 teams is easy when those all suck. My grandmother could beat three quarters of the people in her building in a race, simply because she can walk 10 steps and 75% of the people there are in wheelchairs.
It's also pretty critically important that these "AI Teams" are very much NOT autonomous. They are being actively run by humans, and skilled humans at that.