Evaluation & Ranking 📊


Performance metric

Performance is evaluated according to the censored concordance index. The concordance index is defined as the proportion of all comparable pairs in which the predictions and outcomes are concordant. Two patients are comparable if:
  • both of them experienced an event (at different times), or
  • the one with a shorter observed survival time experienced an event, in which case the event-free patient “outlived” the other
A pair is not comparable if both patients experienced events at the same time. Concordance intuitively means that two patients were ordered correctly by the model: a pair is concordant if the patient with the higher estimated risk score has the shorter actual survival time.
A concordance index of 0.5 corresponds to random prediction and 1 to perfect concordance.
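To make the pairing rules concrete, here is a minimal sketch of how the censored concordance index can be computed from observed times, event indicators, and predicted risk scores. It is an illustration of the definition above, not the organisers' evaluation code; tie handling in particular may differ, and libraries such as scikit-survival offer an equivalent concordance_index_censored function.

```python
import numpy as np

def censored_concordance_index(time, event, risk):
    """Proportion of comparable pairs ordered correctly by the risk score.

    time  : observed survival or censoring time per patient
    event : 1 if the patient experienced the event, 0 if censored
    risk  : estimated risk score (higher = predicted shorter survival)
    """
    time, event, risk = map(np.asarray, (time, event, risk))
    concordant, comparable = 0.0, 0
    n = len(time)
    for i in range(n):
        for j in range(n):
            # (i, j) is comparable if i experienced the event and has the
            # strictly shorter observed time; equal event times are excluded
            if event[i] and time[i] < time[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1      # correctly ordered pair
                elif risk[i] == risk[j]:
                    concordant += 0.5    # tied risk scores count as half
    return concordant / comparable

# Toy example: the patient with the shortest time has the highest risk -> 1.0
print(censored_concordance_index(
    time=[5, 10, 12], event=[1, 1, 0], risk=[0.9, 0.4, 0.1]))
```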

Submission format

This is a code execution challenge. Rather than submitting your predicted labels, you'll package everything needed to run inference and submit that package for containerised execution. More details on the submission format will be announced soon.
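Until the official interface is published, the sketch below is only a generic illustration of what a containerised inference entrypoint could look like; the /input and /output paths, the patients.json file, and the predictions.csv schema are placeholder assumptions, not the challenge's actual format.

```python
# Generic inference entrypoint -- purely illustrative. The real I/O contract
# will be announced by the organisers; the paths and file formats below are
# placeholder assumptions.
import csv
import json
from pathlib import Path

INPUT_DIR = Path("/input")    # assumed mount point for the hidden patient data
OUTPUT_DIR = Path("/output")  # assumed mount point for your predictions


def predict_risk(patient: dict) -> float:
    """Placeholder: load your trained model here and return its risk score."""
    return 0.5


def main() -> None:
    # Assumed input format: a JSON list of patient records with an "id" field.
    patients = json.loads((INPUT_DIR / "patients.json").read_text())
    OUTPUT_DIR.mkdir(parents=True, exist_ok=True)
    with open(OUTPUT_DIR / "predictions.csv", "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["patient_id", "risk_score"])
        for patient in patients:
            writer.writerow([patient["id"], predict_risk(patient)])


if __name__ == "__main__":
    main()
```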

Sanity check

To help participants quickly identify potential issues with the submission format, we're providing a small dataset for sanity checking. This test set consists of only 3 patients and is designed to be processed quickly by your containerised algorithm. The goal is not to evaluate performance, but to ensure that inference runs successfully without errors. By checking the sanity-check leaderboard, you can quickly confirm that your submission is well-formed and avoid issues during the validation or testing phases.


Validation phase

The validation phase is scheduled to open on June 1st, 2024 and will remain active until August 1st, 2024. During this period, participants are invited to submit their algorithm for execution on the hidden validation set (~150 patients). If inference completes successfully without errors, your score will appear on the validation leaderboard, typically within 24 hours. Please double-check all rules to make sure that your submission is compliant. Invalid submissions will be removed, and teams that repeatedly violate the rules will be disqualified.

Testing phase

The testing phase is scheduled to open on July 1st, 2024 and will remain active until August 1st, 2024. During this period, participants are invited to submit their best-performing algorithm in containerised format, which will be executed on our hidden test set (~650 patients). The resulting scores will be used to populate the testing leaderboard, offering insight into the relative strengths and weaknesses of different solutions.