Data🛢️


The challenge data includes Whole Slide Images (WSIs) along with follow-up metadata. Your task is to estimate the time to biochemical recurrence in years, as a continuous variable (e.g. 1.23 years or 17.68 years).

Training set

Images
The training set consists of 508 cases from Radboudumc, each corresponding to a unique patient. Each case is a WSI of a prostatectomy specimen containing cancer, stored as a pyramidal TIF, which is a multi-resolution, tiled format. Each resolution is stored as a separate level within the TIF. The first level contains the image at full resolution. Each subsequent level is the previous level downsampled by a factor of four. Most WSIs consist of multiple slides packed together (see the 4 examples in the figure below).
The training data occupies a total of 2 TB of storage. To download the dataset, proceed to the Data download page. To access that page, you first need to join the challenge by clicking "Join" (green button in the top right).
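
To get a feel for the pyramid layout described above, the following minimal sketch computes the expected pixel dimensions of each level. It assumes that the factor of four applies to each side of the image and that sizes are rounded up; both are assumptions, not confirmed properties of the files, and the function name and example dimensions are hypothetical. Check the actual level sizes reported by your TIF reader before relying on these numbers.

```python
def pyramid_level_sizes(width, height, num_levels, factor=4):
    """Return (width, height) per level; level 0 is the full resolution.

    Assumption: every level shrinks each side of the previous level by
    `factor`, rounding up so a partially filled edge keeps one pixel.
    """
    sizes = []
    for level in range(num_levels):
        scale = factor ** level
        # -(-a // b) is ceiling division on integers.
        sizes.append((-(-width // scale), -(-height // scale)))
    return sizes


# Hypothetical full-resolution size, chosen only for illustration.
sizes = pyramid_level_sizes(100_000, 80_000, num_levels=4)
for level, (w, h) in enumerate(sizes):
    print(f"level {level}: {w} x {h}")
```

Working on a lower-resolution level first (for example, to locate tissue) and only then reading full-resolution tiles is one common way to keep memory use manageable with images of this size.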

Labels
Alongside the WSIs, you are provided with a training_labels.csv file that contains follow-up information for the patients:
  • case_id (str) - unique identifier for each patient
  • event (int) - whether the patient had biochemical recurrence (0 = no, 1 = yes)
  • follow_up_years (float) - time to biochemical recurrence (event = 1) or time to the last follow-up (event = 0) in years
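
As an illustration of how these columns might be read, the sketch below parses label rows with Python's standard csv module and separates patients with an observed recurrence from those without. It assumes the file is comma-separated with a header row that matches the column names listed above; the sample rows and the read_labels helper are made up for illustration and are not taken from the real file.

```python
import csv
import io

# Made-up rows for illustration only; real values come from training_labels.csv.
SAMPLE_CSV = """case_id,event,follow_up_years
case_a,1,1.23
case_b,0,17.68
case_c,0,4.50
"""


def read_labels(handle):
    """Parse label rows into dicts with typed fields (hypothetical helper)."""
    rows = []
    for row in csv.DictReader(handle):
        rows.append({
            "case_id": row["case_id"],
            "event": int(row["event"]),
            "follow_up_years": float(row["follow_up_years"]),
        })
    return rows


labels = read_labels(io.StringIO(SAMPLE_CSV))

# event = 1: follow_up_years is the observed time to recurrence.
recurred = [r for r in labels if r["event"] == 1]
# event = 0: follow_up_years is only the time to the last follow-up.
no_recurrence = [r for r in labels if r["event"] == 0]

print(len(recurred), "with recurrence,", len(no_recurrence), "without")
```

To read the real file, the same helper could be called as read_labels(open("training_labels.csv", newline="")). For patients with event = 0, follow_up_years gives only the time up to which no recurrence was observed, so it should not be treated as a recurrence time.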


Validation set

The validation set consists of approximately 150 patients. Validation images will only be accessible in the runtime container. You can submit your predictions on the validation set from June 1st, 2024 onwards.


Testing set

The testing set comprises around 650 patients. Testing images will also only be accessible within the runtime container. This larger evaluation set will provide a comprehensive assessment of your model's performance in predicting time to biochemical recurrence on unseen data. Submissions to the testing set will be open from July 1st, 2024.


License

The data is released under a permissive CC-BY-NC-SA license. The entities using the data must adhere to a publication embargo: it is strictly prohibited to publish outcomes of studies including the LEOPARD challenge data before the publication of the LEOPARD challenge journal paper and LEOPARD challenge baseline journal paper.