Data - The LEOPARD Challenge

📢 The LEOPARD challenge has completed. Participants in the LEOPARD challenge, as well as all non-participating researchers using the LEOPARD public training dataset, must adhere to the publication embargo. They may publish their own results separately only after the embargo period ends, that is, after publication of both the LEOPARD challenge journal paper and the LEOPARD challenge baseline journal paper. When doing so, they are requested to cite the challenge publication.

Data🛢️

The challenge data consist of whole slide images (WSIs) along with follow-up metadata. Your task is to estimate the time to biochemical recurrence in years, as a continuous variable (e.g. 1.23 years or 17.68 years).

Training set

Images

The training set consists of 508 cases from Radboudumc, each corresponding to a unique patient. Each case is a WSI of a prostatectomy specimen containing cancer, stored as a pyramidal TIF, which is a multi-resolution, tiled format. Each resolution is stored as a separate level within the TIF. The first level contains the image at full resolution. Each subsequent level is the previous level downsampled by a factor of four. Most WSIs consist of multiple slides packed together (see the 4 examples in the figure below).
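To make the level structure concrete, here is a minimal sketch of how level sizes relate to the full-resolution image. It assumes the factor of four applies to each axis (if it refers to pixel count instead, the per-axis factor would be two) and that partial pixels are rounded up. The function names and the slide dimensions are hypothetical; the actual level sizes should be read from the TIF itself with a suitable reader.

```python
def level_downsamples(n_levels, factor=4):
    """Per-axis downsample of each level relative to level 0 (assumed factor)."""
    return [factor ** level for level in range(n_levels)]


def level_shape(full_shape, level, factor=4):
    """Approximate (height, width) of a level, rounding partial pixels up."""
    scale = factor ** level
    height, width = full_shape
    # -(-a // b) is integer ceiling division.
    return (-(-height // scale), -(-width // scale))


def best_level_for(target_downsample, n_levels, factor=4):
    """Deepest level whose downsample does not exceed the target (else level 0)."""
    candidates = [
        level
        for level, ds in enumerate(level_downsamples(n_levels, factor))
        if ds <= target_downsample
    ]
    return candidates[-1] if candidates else 0


# Hypothetical slide: 80,000 x 120,000 pixels at full resolution, 4 levels.
full_shape = (80_000, 120_000)
for level in range(4):
    print(level, level_shape(full_shape, level))
```

A helper like `best_level_for` can be handy when a model only needs a coarse view of the slide: reading a deeper level avoids loading the full-resolution image into memory.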

The training data occupy a total of 2 TB of storage. To download the dataset, proceed to the Data download page. To access that page, you first need to join the challenge by clicking "Join" (green button in the top right).

Labels

Alongside the WSIs, you are provided with a training_labels.csv file that contains follow-up information for the patients:

case_id (str) - unique identifier for each patient
event (int) - whether the patient had biochemical recurrence (0 = no, 1 = yes)
follow_up_years (float) - time to biochemical recurrence (event = 1) or time to the last follow-up (event = 0), in years
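The three columns above can be parsed into typed records as in the following minimal sketch. It assumes the file is comma-separated with a header row that uses exactly these column names; the `CaseLabel` class, the `load_labels` helper, and the sample rows are made up for illustration and are not real challenge data. Cases with event = 0 are treated as censored: their follow_up_years is the time to the last follow-up, not a time to recurrence.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class CaseLabel:
    case_id: str
    event: int              # 1 = biochemical recurrence, 0 = no recurrence
    follow_up_years: float  # time to recurrence or to last follow-up


def load_labels(handle):
    """Parse rows of a training_labels.csv-style file into typed records."""
    labels = []
    for row in csv.DictReader(handle):
        event = int(row["event"])
        if event not in (0, 1):
            raise ValueError(f"unexpected event value for {row['case_id']}: {event}")
        labels.append(CaseLabel(row["case_id"], event, float(row["follow_up_years"])))
    return labels


# Made-up rows for illustration only.
sample = io.StringIO(
    "case_id,event,follow_up_years\n"
    "case_a,1,1.23\n"
    "case_b,0,17.68\n"
    "case_c,0,4.50\n"
)
labels = load_labels(sample)
recurred = [c for c in labels if c.event == 1]
censored = [c for c in labels if c.event == 0]
print(f"{len(recurred)} recurrence(s), {len(censored)} censored")
```

For the real file, the same helper could be called as `load_labels(open("training_labels.csv", newline=""))`, assuming the file sits in the working directory.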

Validation set

The validation set consists of 99 patients. Validation images will only be accessible in the runtime container. You can submit your predictions on the validation set from July 3rd, 2024 onwards.

Testing set

The testing set comprises 824 patients. Testing images will also only be accessible in the runtime container. This larger evaluation set will provide a comprehensive assessment of your model's performance in predicting time to biochemical recurrence on unseen data. Submissions to the testing set will be open from July 14th, 2024.

License

The data is released under a permissive CC-BY-NC-SA license. Entities using the data must adhere to a publication embargo: it is strictly prohibited to publish results of studies that include the LEOPARD challenge data before the publication of the LEOPARD challenge journal paper and the LEOPARD challenge baseline journal paper.