Testing phase📊

Manuscript

To participate in the testing phase, participants must attach a PDF of a manuscript describing their methodology to their submission on grand-challenge.org.

The manuscript should include the following sections:

  • INTRODUCTION. Very briefly (1-2 paragraphs) describe the problem and the motivation for your approach.
  • MATERIALS. Participants must indicate whether any external data or pre-trained weights were used. There is no need to describe the LEOPARD dataset. However, if you used any data-related details or techniques, this is the section to describe them, e.g. the tuning data split, additional public data, or pre-trained weights.
  • METHODS. Describe in detail the steps of your methodology: pre-processing, data filtering/denoising, model(s), model training strategy, loss, model tuning strategy, experiments, etc. If any techniques for improving domain generalization were used, be sure to mention them.
  • RESULTS. Describe your experimental results on your own data split, the validation leaderboard, etc.
  • INTERPRETABILITY. Describe whether your method provides any interpretability of the scores.
  • DISCUSSION. Summarize and discuss your findings, limitations, and areas of future work.
  • CODE LINK. Include a link (public or private URL) to your source code on GitHub.
  • Include the email address(es) of the corresponding author(s).

Submission

The submission process will be identical to the one in the Validation Phase.  

  • DATA. Altogether, the testing dataset consists of ~800 whole slide images. The data in the Testing Phase comes from two sources: Radboud UMC (also the source of the training data) and an anonymous external center. Tissue masks are available for each case in the testing set during submission. Some cases in the testing data contain artifacts (e.g. pen marks); these artifacts have been excluded from the testing tissue masks. It is up to participants to decide whether to use the provided tissue masks. The format, resolution, and magnification of the testing slides are identical to those of the training slides.
  • EVALUATION. The LEOPARD challenge uses a censored C-index as the main challenge metric. In the Testing Phase, the C-index is first computed on each center's dataset separately and then averaged across centers. Participants can see their performance on each center (Radboud UMC and the anonymous external center) by clicking "Show all metrics" on the leaderboard.
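
To make the per-center averaging concrete, here is a minimal Python sketch of how such a metric could be computed locally. It is not the organizers' evaluation code. It assumes Harrell's C-index for right-censored data, assumes that a higher score means higher risk (negate the value if your model outputs a predicted time), and skips pairs with tied times for simplicity. The function names, record layout, and center labels are hypothetical.

```python
from collections import defaultdict


def censored_c_index(times, events, risks):
    """Harrell's C-index for right-censored data (simplified sketch).

    A pair (i, j) is comparable when case i had an observed event and a
    strictly shorter follow-up time than case j. Assumes a higher risk
    score should correspond to a shorter time to event.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored case cannot be the earlier member of a pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs; C-index is undefined")
    return concordant / comparable


def mean_c_index_across_centers(records):
    """Compute the C-index per center, then average across centers.

    `records` is an iterable of (center, time, event, risk) tuples;
    this layout is an assumption made for illustration.
    """
    by_center = defaultdict(list)
    for center, time, event, risk in records:
        by_center[center].append((time, event, risk))

    per_center = {}
    for center, rows in by_center.items():
        times, events, risks = zip(*rows)
        per_center[center] = censored_c_index(times, events, risks)

    mean = sum(per_center.values()) / len(per_center)
    return mean, per_center
```

Because each center contributes equally to the mean regardless of how many slides it has, a method that performs well only on the Radboud UMC cases can still score poorly overall if it does not generalize to the external center.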