Participate

From download to leaderboard

Get the public data and kit, build a system, validate locally, then submit. A submission is a single four-column TSV (SrcEntity, TgtEntity, Relation, Score), organized into positional blocks, one per query, with no QueryID.

How to take part

The participant journey at a glance — the figure's first two steps, Get Started and Download Data, combine into the first step below. The numbered steps are authoritative.

1. Get started & download: Get the public BioKG-Align kit and the datasets.
2. Build: Construct a system and train it on the publicly released training and validation data.
3. Validate: Tune your system, then validate your prediction file with the kit's submission validator. A green light means it is ready to submit.
4. Submit: Submit your predictions to the platform for server-side scoring.

Get the kit from the GitHub repository. The dataset will be published as a Hugging Face release (provisional), described on the Data page. Evaluation is planned to run on CodaBench (provisional), so submitting requires a CodaBench account. The kit's build guide walks through the workflow hands-on.

Submission format

A single four-column TSV — SrcEntity, TgtEntity, Relation, Score — with one positional block of 150 rows per query (50 candidates × 3 relations) and no QueryID. The Relation column takes one of equivalent, source_subsumed_by_target, source_subsumes_target.
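
As a rough illustration of the layout described above, the Python sketch below builds one 150-row block per query and runs a basic shape check. It is not the kit's validator, which remains authoritative. Several details are assumptions made for illustration only: the presence of a header row, the candidate-major row order within a block, the placeholder entity identifiers, and the helper names `block_rows`, `write_submission`, and `check_shape`.

```python
import csv
import io

COLUMNS = ["SrcEntity", "TgtEntity", "Relation", "Score"]
RELATIONS = ["equivalent", "source_subsumed_by_target", "source_subsumes_target"]
CANDIDATES_PER_QUERY = 50
BLOCK_SIZE = CANDIDATES_PER_QUERY * len(RELATIONS)  # 50 x 3 = 150 rows per query


def block_rows(src, candidates, score_fn):
    """Build one block: every candidate paired with every relation.

    Assumption: rows are ordered candidate by candidate; the kit defines
    the real ordering contract.
    """
    if len(candidates) != CANDIDATES_PER_QUERY:
        raise ValueError(f"expected {CANDIDATES_PER_QUERY} candidates, got {len(candidates)}")
    return [
        (src, tgt, rel, score_fn(src, tgt, rel))
        for tgt in candidates
        for rel in RELATIONS
    ]


def write_submission(handle, blocks):
    """Write blocks back to back; there is no QueryID, so position identifies the query."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(COLUMNS)  # assumption: the file starts with a header row
    for rows in blocks:
        writer.writerows(rows)


def check_shape(text):
    """Return a list of structural problems; an empty list means none were found."""
    rows = list(csv.reader(io.StringIO(text), delimiter="\t"))
    problems = []
    if not rows or rows[0] != COLUMNS:
        problems.append("missing or unexpected header row")
    data = rows[1:]
    if len(data) % BLOCK_SIZE != 0:
        problems.append(f"{len(data)} data rows is not a multiple of {BLOCK_SIZE}")
    for number, row in enumerate(data, start=1):
        if len(row) != len(COLUMNS):
            problems.append(f"row {number}: expected {len(COLUMNS)} columns")
            continue
        if row[2] not in RELATIONS:
            problems.append(f"row {number}: unknown relation {row[2]!r}")
        try:
            float(row[3])
        except ValueError:
            problems.append(f"row {number}: score {row[3]!r} is not a number")
    return problems


if __name__ == "__main__":
    # Placeholder identifiers, not real ontology terms.
    candidates = [f"TGT:{i:04d}" for i in range(CANDIDATES_PER_QUERY)]
    buffer = io.StringIO()
    write_submission(buffer, [block_rows("SRC:0001", candidates, lambda s, t, r: 0.5)])
    print(check_shape(buffer.getvalue()))
```

A check like this only catches gross layout mistakes before the kit's validator is run; it says nothing about whether the candidates or their order match what the server expects.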

The kit owns the exact contract — see submission format and block scoring. Final scoring runs server-side; the kit's local score is a diagnostic aid only — see the task and evaluation page.

Tracks

Main track — open: Typed link prediction — ranking (candidate, relation) pairs over a fixed candidate set.
Complex track — restricted: OWL class-expression generation — reconstructing the logical definitions of the HP, MP, and WBP phenotype ontologies over background OBO ontologies. It has its own data, submission format, and evaluation; see the complex-track guide.
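
For the main track, the idea of ranking (candidate, relation) pairs within positional blocks can be sketched in Python as follows. This is an illustration only: the `Row` type and the helpers `split_blocks` and `rank_block` are hypothetical names, the 150-row block size is taken from the submission format, and the way the server breaks ties or computes metrics is not specified here and is defined by the kit and the evaluation page.

```python
from typing import NamedTuple


class Row(NamedTuple):
    src: str
    tgt: str
    relation: str
    score: float


def split_blocks(rows, block_size=150):
    """Cut a flat row list into consecutive blocks; position stands in for a query ID."""
    if len(rows) % block_size != 0:
        raise ValueError(f"{len(rows)} rows is not a multiple of {block_size}")
    return [rows[i:i + block_size] for i in range(0, len(rows), block_size)]


def rank_block(rows):
    """Order one block's (candidate, relation) pairs from highest to lowest score.

    Ties keep their original file order here, which is an arbitrary choice
    for this sketch rather than the official tie-breaking rule.
    """
    ordered = sorted(rows, key=lambda row: row.score, reverse=True)
    return [(row.tgt, row.relation) for row in ordered]


if __name__ == "__main__":
    # Toy block with placeholder identifiers, far smaller than a real 150-row block.
    toy = [
        Row("SRC:0001", "TGT:A", "equivalent", 0.2),
        Row("SRC:0001", "TGT:B", "equivalent", 0.9),
        Row("SRC:0001", "TGT:A", "source_subsumes_target", 0.5),
    ]
    for pair in rank_block(toy):
        print(pair)
```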

Rules & ethics

The main track is open: You may use any system and any data to produce your ranking. We ask only that your submission contain the expected output: scores for the provided candidates, in the block-structured TSV.
Results must be reproducible: We expect to be able to reproduce participant results.
External resources: External resources and pretrained models may be used if they are public, cited, and declared.
Code and reproduction: Final leaderboard entries may be asked for code and reproduction steps.

Models trained on this benchmark are research prototypes and must not be used for clinical decision-making without independent expert validation.