From download to leaderboard
Get the public data and kit, build a system, validate locally, then submit.
Submissions are a four-column TSV (SrcEntity, TgtEntity, Relation, Score), block-structured with no QueryID.
How to take part
1. Get started & download: get the public BioKG-Align kit and the datasets.
2. Build: construct a system and train it on the publicly released training and validation data.
3. Validate: tune your system, then validate your prediction file with the kit's submission validator. A green light means the file is ready to submit.
4. Submit: submit your predictions to the platform for server-side scoring.
Get the kit from the GitHub repository. The dataset will be published as a Hugging Face release (provisional), described on the Data page. Evaluation is planned to run on CodaBench (also provisional), so submitting will require a CodaBench account. The kit's build guide walks through the workflow hands-on.
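Before running the kit's validator, a rough structural pre-check can catch obvious problems early. The sketch below is not the kit's validator and does not replace it. It only tests the points stated on this page: four columns, the three relation labels, and a row count that divides into blocks of 150. The function name `precheck`, the `has_header` flag, and the assumption that the file starts with a header row are illustrative choices; the kit's contract is authoritative.

```python
import csv
import io

# Relation labels and block size as stated in the submission format.
RELATIONS = {"equivalent", "source_subsumed_by_target", "source_subsumes_target"}
BLOCK_SIZE = 150  # 50 candidates x 3 relations


def precheck(handle, has_header=True):
    """Return a list of structural problems; an empty list means none were found.

    Assumption: the file may start with a header row (has_header=True).
    """
    problems = []
    rows = list(csv.reader(handle, delimiter="\t"))
    if has_header and rows:
        rows = rows[1:]
    if not rows:
        problems.append("no data rows")
    if len(rows) % BLOCK_SIZE != 0:
        problems.append(f"{len(rows)} data rows is not a multiple of {BLOCK_SIZE}")
    for number, row in enumerate(rows, start=1):
        if len(row) != 4:
            problems.append(f"row {number}: expected 4 columns, found {len(row)}")
            continue
        relation, score = row[2], row[3]
        if relation not in RELATIONS:
            problems.append(f"row {number}: unknown relation {relation!r}")
        try:
            float(score)
        except ValueError:
            problems.append(f"row {number}: score {score!r} is not a number")
    return problems


# Usage with an in-memory file; SRC and TGT are placeholder identifiers.
sample = "SrcEntity\tTgtEntity\tRelation\tScore\nSRC\tTGT\tsame_as\tabc\n"
for problem in precheck(io.StringIO(sample)):
    print(problem)
```

Passing this check says nothing about whether the candidates or their order match what the server expects; only the kit's validator checks the full contract.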
Submission format
Each submission is a single four-column TSV (SrcEntity, TgtEntity, Relation, Score) with one positional block of 150 rows per query (50 candidates × 3 relations) and no QueryID.
The Relation column takes one of equivalent, source_subsumed_by_target, or source_subsumes_target.
The kit defines the exact contract (see submission format and block scoring). Final scoring runs server-side, and the kit's local score is a diagnostic aid only (see the task and evaluation page).
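As an illustration of the block layout, the sketch below writes one 150-row block per query by pairing each of the 50 candidates with each of the three relations. Several details are assumptions, not taken from the kit: the presence of a header row, the candidate-major row order inside a block, the use of the query entity as SrcEntity, the score formatting, and the names `block_rows`, `write_submission`, and `toy_score`. The entity identifiers are placeholders. Check the kit's contract for the required ordering before relying on any of this.

```python
import csv
import io

COLUMNS = ("SrcEntity", "TgtEntity", "Relation", "Score")
RELATIONS = ("equivalent", "source_subsumed_by_target", "source_subsumes_target")


def block_rows(src, candidates, score_fn):
    """Build one block: every candidate paired with every relation.

    Assumption: rows are ordered candidate-major (all three relations for
    the first candidate, then the second, and so on).
    """
    rows = []
    for tgt in candidates:
        for rel in RELATIONS:
            rows.append((src, tgt, rel, f"{score_fn(src, tgt, rel):.6f}"))
    return rows


def write_submission(handle, queries, score_fn):
    """Write a header row (an assumption) followed by one block per query."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(COLUMNS)
    for src, candidates in queries:
        writer.writerows(block_rows(src, candidates, score_fn))


def toy_score(src, tgt, rel):
    """Stand-in for a real model: favours the 'equivalent' relation."""
    return 1.0 if rel == "equivalent" else 0.5


# One query with 50 placeholder candidates gives 50 x 3 = 150 rows.
queries = [("SRC_0", [f"TGT_{i:02d}" for i in range(50)])]
buffer = io.StringIO()
write_submission(buffer, queries, toy_score)
lines = buffer.getvalue().splitlines()
print(len(lines))  # 151: one header line plus one block of 150 rows
```

Because the file carries no QueryID, the position of each block is the only thing that ties it to its query, so blocks must be written in the order the kit expects.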
Tracks
- Main track — open
- Typed link prediction — ranking (candidate, relation) pairs over a fixed candidate set.
- Complex track — restricted
- OWL class-expression generation — reconstructing the logical definitions of the HP, MP, and WBP phenotype ontologies over background OBO ontologies. It has its own data, submission format, and evaluation; see the complex-track guide.
Rules & ethics
- The main track is open
- Use any system and any data to produce your ranking. We ask only that your submission contain the expected output: scores for the provided candidates, in the block-structured TSV.
- Results must be reproducible
- We expect to be able to reproduce participant results.
- External resources
- External resources and pretrained models may be used if they are public, cited, and declared.
- Code and reproduction
- Authors of final leaderboard entries may be asked to provide code and reproduction steps.
Models trained on this benchmark are research prototypes and must not be used for clinical decision-making without independent expert validation.