Pre-launch — the competition is being finalised and details on this site may change.

BioKG-Align
Participate

From download to leaderboard

Get the public data and kit, build a system, validate locally, then submit. Submissions are a single four-column TSV (SrcEntity, TgtEntity, Relation, Score), block-structured, with no QueryID column.

How to take part

[Figure: participant journey in five steps: Get Started, Download Data, Develop, Validate, Submit]
The participant journey at a glance — the figure's first two steps, Get Started and Download Data, combine into the first step below. The numbered steps are authoritative.
  1. Get started & download

    Get the public BioKG-Align kit and the datasets.

  2. Build

    Construct a system and train it on the publicly released training and validation data.

  3. Validate

    Tune your system on the public validation split, then check your prediction file with the kit's submission validator; a green light means it is ready to submit.

  4. Submit

    Submit your predictions to the platform for server-side scoring.

Get the kit from the GitHub repository; the dataset will be published as a Hugging Face release (provisional) and is described on the Data page. Evaluation is planned to run on CodaBench (provisional), so submitting requires a CodaBench account. The kit's build guide walks through the workflow hands-on.

Submission format

A single four-column TSV (SrcEntity, TgtEntity, Relation, Score) with one positional block of 150 rows per query (50 candidates × 3 relations) and no QueryID column, so each block is matched to its query by its position in the file. The Relation column takes exactly one of three values: equivalent, source_subsumed_by_target, or source_subsumes_target.
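To illustrate the block layout, here is a minimal Python sketch that emits one 150-row block per query. It assumes queries arrive as (source entity, 50 candidates) pairs already in the official query order, that rows inside a block go candidate by candidate with the three relations in turn, and that the file has no header row. `submission_rows`, `write_submission`, and the `score_fn` callable are hypothetical names, not part of the kit; the kit's submission-format documentation remains authoritative.

```python
import csv
from typing import Callable, Iterator, Sequence

RELATIONS = ("equivalent", "source_subsumed_by_target", "source_subsumes_target")
CANDIDATES_PER_QUERY = 50
ROWS_PER_BLOCK = CANDIDATES_PER_QUERY * len(RELATIONS)  # 50 x 3 = 150

# Hypothetical scoring callable: (src, tgt, relation) -> float.
ScoreFn = Callable[[str, str, str], float]


def submission_rows(
    queries: Sequence[tuple[str, Sequence[str]]], score_fn: ScoreFn
) -> Iterator[list[str]]:
    """Yield rows block by block: one 150-row block per query, in input order.

    Assumption: the within-block order (candidate-major, relations in the
    order of RELATIONS) is illustrative; check the kit for the real contract.
    """
    for src, candidates in queries:
        if len(candidates) != CANDIDATES_PER_QUERY:
            raise ValueError(
                f"{src}: expected {CANDIDATES_PER_QUERY} candidates, got {len(candidates)}"
            )
        for tgt in candidates:
            for rel in RELATIONS:
                yield [src, tgt, rel, f"{score_fn(src, tgt, rel):.6f}"]


def write_submission(path: str, queries, score_fn: ScoreFn) -> None:
    """Write the TSV: no QueryID column and (assumed) no header row."""
    with open(path, "w", newline="", encoding="utf-8") as fh:
        writer = csv.writer(fh, delimiter="\t", lineterminator="\n")
        writer.writerows(submission_rows(queries, score_fn))
```

Because the query is identified only by position, the one invariant that matters most here is that blocks are written in exactly the order the queries were given; the generator never reorders them.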

The kit owns the exact contract — see submission format and block scoring. Final scoring runs server-side; the kit's local score is a diagnostic aid only — see the task and evaluation page.
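Before running the kit's validator, a quick structural self-check can catch the most common mistakes. The Python sketch below is a hypothetical helper, not the kit's validator: it checks only what this page states (four tab-separated columns, 150-row blocks, the three relation labels, numeric scores, 50 candidates × 3 relations per block) and assumes the file has no header row.

```python
import csv
import math
from collections import Counter

RELATIONS = {"equivalent", "source_subsumed_by_target", "source_subsumes_target"}
CANDIDATES_PER_QUERY = 50
ROWS_PER_BLOCK = CANDIDATES_PER_QUERY * len(RELATIONS)  # 150


def check_rows(rows: list[list[str]]) -> list[str]:
    """Return human-readable problems; an empty list means none were found."""
    problems = []
    if len(rows) % ROWS_PER_BLOCK:
        problems.append(f"{len(rows)} rows is not a multiple of {ROWS_PER_BLOCK}")
    # Row-level checks: column count, relation label, finite numeric score.
    for i, row in enumerate(rows, start=1):
        if len(row) != 4:
            problems.append(f"row {i}: expected 4 columns, got {len(row)}")
            continue
        _, _, rel, score = row
        if rel not in RELATIONS:
            problems.append(f"row {i}: unknown relation {rel!r}")
        try:
            if not math.isfinite(float(score)):
                problems.append(f"row {i}: non-finite score {score!r}")
        except ValueError:
            problems.append(f"row {i}: non-numeric score {score!r}")
    # Block-level check over complete blocks only: each block should hold
    # 150 distinct (target, relation) pairs covering 50 distinct targets.
    for start in range(0, len(rows) - len(rows) % ROWS_PER_BLOCK, ROWS_PER_BLOCK):
        block = [r for r in rows[start:start + ROWS_PER_BLOCK] if len(r) == 4]
        pairs = Counter((r[1], r[2]) for r in block)
        targets = {tgt for tgt, _ in pairs}
        if len(pairs) != ROWS_PER_BLOCK or len(targets) != CANDIDATES_PER_QUERY:
            problems.append(
                f"block {start // ROWS_PER_BLOCK + 1}: expected "
                f"{CANDIDATES_PER_QUERY} targets x {len(RELATIONS)} relations"
            )
    return problems


def check_file(path: str) -> list[str]:
    """Read a TSV literally (no quote handling) and run check_rows on it."""
    with open(path, newline="", encoding="utf-8") as fh:
        rows = list(csv.reader(fh, delimiter="\t", quoting=csv.QUOTE_NONE))
    return check_rows(rows)
```

If the kit turns out to expect a header row, drop `rows[0]` before calling `check_rows`. A clean result here is only a first filter; the kit's validator has the final say.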

Tracks

Main track — open
Typed link prediction — ranking (candidate, relation) pairs over a fixed candidate set.
Complex track — restricted
OWL class-expression generation — reconstructing the logical definitions of the HP, MP, and WBP phenotype ontologies over background OBO ontologies. It has its own data, submission format, and evaluation; see the complex-track guide.
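To make the main track's task, ranking (candidate, relation) pairs over a fixed candidate set, concrete, the Python sketch below ranks the pairs of one block by descending score and reports where a known-correct pair lands, plus the mean reciprocal rank over several blocks. It is an illustrative diagnostic only, not the official metric (see the task and evaluation page). It assumes a single gold pair per query and counts ties against the gold pair; `gold_rank` and `mean_reciprocal_rank` are hypothetical names.

```python
from typing import Sequence

# One row of a block without its SrcEntity: (TgtEntity, Relation, Score).
Row = tuple[str, str, float]


def gold_rank(block: Sequence[Row], gold: tuple[str, str]) -> int:
    """1-based rank of the gold (candidate, relation) pair by descending score.

    Ties are counted against the gold pair (pessimistic tie-breaking).
    """
    scores = {(tgt, rel): score for tgt, rel, score in block}
    if gold not in scores:
        raise KeyError(f"gold pair {gold} not in block")
    g = scores[gold]
    return 1 + sum(1 for pair, s in scores.items() if pair != gold and s >= g)


def mean_reciprocal_rank(
    blocks: Sequence[Sequence[Row]], golds: Sequence[tuple[str, str]]
) -> float:
    """Average of 1 / rank over blocks, one (assumed) gold pair per block."""
    if not blocks or len(blocks) != len(golds):
        raise ValueError("need at least one block and one gold pair per block")
    return sum(1.0 / gold_rank(b, g) for b, g in zip(blocks, golds)) / len(blocks)
```

With pessimistic tie-breaking, a system that emits one constant score for every pair gets the worst possible rank rather than a lucky one; an optimistic variant would use `s > g` instead.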


Rules & ethics

The main track is open
Use any system and any data to produce your ranking, subject to the rules below. We ask only that your submission contain the expected output: scores for the provided candidates, in the block-structured TSV.
Results must be reproducible
We expect to be able to reproduce participant results.
External resources
External resources and pretrained models may be used if they are public, cited, and declared.
Code and reproduction
Teams with final leaderboard entries may be asked to provide code and reproduction steps.

Models trained on this benchmark are research prototypes and must not be used for clinical decision-making without independent expert validation.