Qualifier: Scoring and results
Thanks again to everyone who participated in FE-CTF 2022: Cyber Demon.
Our scoring system caused some confusion among players (especially those who were not aware of the FAQ). Some players also felt that the decaying nature of the points given to teams who solved a challenge later was unfair.
We will try to explain the scoring system and address those concerns.
A good scoring system should rank teams fairly based on their ability to solve challenges. This is the primary goal.
Secondary goals include:
- Prevent flag hoarding.
- Incentivize teams to try unsolved challenges.
- Limit the impact of large teams (there will be cheaters).
- Make the scoreboard hilarious.
The first piece of evidence regarding the primary goal is binary: "did team X solve challenge Y or not?" Next up is "how fast did they solve it?" This signal, however, is noisy (people sleep at different times, work on other challenges, etc.), and therefore it should have a small impact or be ignored altogether.
The difficulty of a challenge should also be taken into consideration: a team that can solve hard challenges fast is better (at CTF) than one that can only solve easy challenges slowly (or easy challenges fast, or no challenges at all).
But it is very hard to accurately gauge the difficulty of a challenge — especially for the author(s) — and therefore to assign a score. This is where the scoring algorithm comes in.
Our system determines the difficulty of a challenge purely based on how many teams solved it. Other indicators (that we do not consider) include:
- Judgement by challenge author and/or play testers.
- Time from release to first solve.
Broadly speaking (see the FAQ for the details), our scoring system works like this:
- Take ALL THE POINTS (1'000'000, but it could be anything).
- Distribute them equally over solved challenges, with the exception that some challenges are grouped together (Dig Host, Bob and Alice, etc.) and collectively count as one challenge.
- For each challenge split the points allocated to it among those who solved it, but give fast solvers slightly more points than slow solvers.
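The three steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the formula from the FAQ: challenge groups are ignored, and the `bonus` parameter and the linear weighting of fast solvers over slow ones are made up for the example.

```python
from collections import defaultdict

TOTAL_POINTS = 1_000_000  # "ALL THE POINTS"; the exact number is arbitrary


def score(solves, bonus=0.1):
    """Return {team: points} for (team, challenge) pairs in submission order.

    Assumptions (not from the FAQ): no challenge groups, and the first
    solver's share is weighted (1 + bonus) times the last solver's share,
    falling off linearly in between.
    """
    solvers = defaultdict(list)  # challenge -> teams, earliest solver first
    for team, challenge in solves:
        if team not in solvers[challenge]:
            solvers[challenge].append(team)
    if not solvers:
        return {}

    # Step 2: every solved challenge gets an equal slice of the total.
    per_challenge = TOTAL_POINTS / len(solvers)

    points = defaultdict(float)
    for teams in solvers.values():
        n = len(teams)
        # Step 3: split the slice among solvers, slightly favouring early ones.
        weights = [1 + bonus * (n - 1 - k) / max(n - 1, 1) for k in range(n)]
        total_weight = sum(weights)
        for team, weight in zip(teams, weights):
            points[team] += per_challenge * weight / total_weight
    return dict(points)


first = score([("A", "x")])
second = score([("A", "x"), ("B", "x")])
third = score([("A", "x"), ("B", "x"), ("B", "y")])
```

In this sketch a single solve holds all the points (`first`), a second solve of the same challenge takes points away from the first solver (`second`), and a solve of a new challenge shrinks the slice of every other challenge (`third`).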
The last step of this procedure tries to judge both the difficulty of a challenge and a team's ability to solve challenges in general (cf. "how fast did they solve it?"). It also means that when one team solves a challenge, other teams will lose points, and this may have been the root cause of the confusion.
Since we score challenges based on their difficulty *and* we gauge the difficulty based on the number of solves, the value of a solve depends on future information, which is why you can lose points as that information becomes available.
Let us be clear: the difference between the scoring system used and one without the penalty to later solves is very minor. For this competition the first difference on the final scoreboard is at 17th place: team "0x6a75746c616e646961646d696e73" would have swapped places with "ROPert" at 18th place. The set of teams in the top-50 would not have changed, only their relative placement.
So why did we include it in the first place? There are two reasons.
The first is aesthetic: we did not like the idea of having to tie-break teams with the same number of points based on who solved a challenge first. Our system does not prevent this, but it does make it much less likely. Indeed, during the whole competition, only twice did a team in the top-50 solve a challenge and end up with exactly the same number of points as another team:
- Friday 14:49:09: "z3dc0ps" solves "Dig Host #3" and ties with "shalaamum" for 41st/42nd.
- Saturday 15:00:51: "0x6a75746c616e646961646d696e73" solves "Guessing Game" and ties with "just_taking_a_look" for 11th/12th.
Compare this with 5280 ties within the top-50, had no penalty been used.
The second reason is to prevent flag hoarding. We did not observe any flag hoarding, but it is of course impossible to say whether we would have, had the penalty not been in place. Judging from discussions on Discord, we feel confident in saying that the system effectively made teams refrain from hoarding flags, if for no other reason than that the consequences were unclear.
During the competition there were 1162 correct flag submissions. They can be found in JSON here:
This data, along with the challenge weights (which can be found on the challenge board), is enough to verify the claims above, and also to play the "what if" game with other scoring systems. Each entry in the list follows this format:
[UTC timestamp in ISO format, team name, challenge name]
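As a starting point for the "what if" game, here is a minimal Python sketch that counts how many teams solved each challenge. It takes each entry to be a three-element list of UTC timestamp, team name and challenge name. The file name `solves.json` and the sample entries are made up for illustration and are not taken from the real data.

```python
import json
from collections import Counter


def load_solves(path="solves.json"):
    """Load the list of solves from a JSON file (hypothetical file name)."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def solve_counts(entries):
    """Count the distinct teams that solved each challenge."""
    seen = set()
    counts = Counter()
    for _timestamp, team, challenge in entries:
        if (team, challenge) not in seen:
            seen.add((team, challenge))
            counts[challenge] += 1
    return counts


# Made-up sample in the assumed shape; real data would come from load_solves().
sample = json.loads("""
[
  ["2022-01-01T10:00:00Z", "team-a", "challenge-1"],
  ["2022-01-01T10:05:00Z", "team-b", "challenge-1"],
  ["2022-01-01T11:00:00Z", "team-a", "challenge-2"]
]
""")

counts = solve_counts(sample)
for challenge, n in counts.most_common():
    print(f"{challenge}: {n} solve(s)")
```

Since the number of solves is the only difficulty signal the scoring uses, these counts (together with the order of the timestamps) are the input that any alternative scoring function would need.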
(Congratulations to the highest scoring team, "C4Team", who had 1'000'000 points for all of 25 seconds.)
The scoreboard during the competition can be found here:
Judging from the graph we declare secondary goal #4 accomplished.
Without the penalty to later solves it would have looked like this:
And finally, this is what the scoreboard would have been if we had used CTFd dynamic scoring with parameters max_value=500, min_value=100, decay=20 (except groups of challenges, where max_value and min_value have been scaled accordingly):
With this scoring, four teams in the current top-20 would have been placed below 20th place.
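For reference, the value of a single challenge under such a scheme can be sketched as below. This follows the quadratic decay used by CTFd's dynamic-value challenges as we understand it; the function name is ours, the parameter names follow the text above rather than CTFd's own, and the scaling for challenge groups is not modelled.

```python
import math


def ctfd_dynamic_value(solve_count, max_value=500, min_value=100, decay=20):
    """Value of a challenge after `solve_count` solves (quadratic decay).

    Sketch only: assumes the first solve does not reduce the value and
    that the value never drops below `min_value`.
    """
    if solve_count != 0:
        solve_count -= 1  # the first solver still sees the full value
    value = ((min_value - max_value) / decay**2) * solve_count**2 + max_value
    return max(math.ceil(value), min_value)


for n in (0, 1, 11, 21, 100):
    print(n, ctfd_dynamic_value(n))
```

With these parameters a challenge is worth 500 points at its first solve and reaches the floor of 100 points at 21 solves. Under this scheme every solver of a challenge is credited with its current value, so solve order plays no role, and the total number of points in play is not fixed.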
Our system rewards solving challenges with few or no other solves much more aggressively than other systems do. Whether this incentivized teams to go for unsolved challenges is unclear, but comments on Discord hint at it.
However, we are confident that it limited the impact of large teams of moderately skilled players (aka cheaters) vs. small teams of highly skilled ones.
While we recognize that our scoring system can seem confusing, arbitrary or even unfair, we do think that the top teams earned their place.