Pre-launch — the competition is being finalised and details on this site may change.

BioKG-Align
Main Track & Evaluation

Rank candidate–relation pairs across ontology pairs

Each query gives a source entity and a fixed set of 50 target candidates. A system scores every candidate against the 3 relations — equivalent, source_subsumed_by_target, source_subsumes_target — producing 150 scored rows per query.

A prediction is relevant only when it matches the reference on both the target entity and the relation. The right neighbour with the wrong relation does not count.
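
The row layout and the relevance rule above can be sketched in a few lines of Python. The three relation names are taken from this page; the function names, the tuple layout, and the `target_N` identifiers are illustrative assumptions, not the kit's actual submission format.

```python
from itertools import product

# Relation names as listed on this page.
RELATIONS = (
    "equivalent",
    "source_subsumed_by_target",
    "source_subsumes_target",
)

def enumerate_rows(candidates):
    """Return every (target, relation) pair a system must score for one query."""
    return list(product(candidates, RELATIONS))

def is_relevant(prediction, reference):
    """A prediction counts only if BOTH target entity and relation match."""
    pred_target, pred_relation = prediction
    ref_target, ref_relation = reference
    return pred_target == ref_target and pred_relation == ref_relation

# Hypothetical candidate pool of 50 targets for a single query.
candidates = [f"target_{i}" for i in range(50)]
rows = enumerate_rows(candidates)

# Hypothetical reference label for this query.
reference = ("target_7", "source_subsumes_target")
right_entity_wrong_relation = ("target_7", "equivalent")
```

Here `len(rows)` is 150 (50 candidates times 3 relations), and `right_entity_wrong_relation` is not relevant even though it names the correct target.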

[Figure: pipeline diagram. Source ontologies (SNOMED CT, NCIT, FMA, ORDO, DOID) are projected (OWL2Vec*-style projection with ELK reasoning) into the Unified Biomedical Knowledge Graph (824,035 nodes; 2,781,910 triples), released as Triples, Properties, and Datalog layers. Competition tracks: Main Track (Typed Candidate Ranking) and Complex Track (OWL Expression Generation).]
The full pipeline: five projected ontologies feed one knowledge graph, which feeds the typed-candidate-ranking task evaluated on this page. Counts are from the current build.

The relations

equivalent
The source and target denote the same concept.
source_subsumed_by_target
The source is more specific — a subclass of the target.
source_subsumes_target
The source is more general — a superclass of the target.

A worked example

Ranking DOID candidates for the source concept NCIT: melanoma (illustrative, from public data only).

  1. DOID: melanoma, equivalent (exact)

    Source and target denote the same disease. Under the preferred typed metric this is the only fully relevant prediction — entity and relation both match the reference — so a strong system ranks it first.

  2. DOID: cutaneous melanoma, subsumes (near)

    Melanoma is broader than cutaneous melanoma, so the source subsumes this target. Not the exact match, but an ontologically close neighbour — the hierarchy-aware metric awards graded credit by distance.

  3. DOID: melanocytic nevus, equivalent (miss)

    A melanocytic nevus is a benign mole — neither equivalent to nor a sub- or super-class of melanoma. A distractor: no credit, and ranking it this high pushes a genuine neighbour down.

  4. DOID: skin cancer, subsumed by (near)

    Melanoma is a kind of skin cancer, so the source is subsumed by this target. A correct neighbour earning graded credit, ranked below a distractor here.

Illustrative example (NCIT → DOID). The live site renders from the public release; hidden test data is never shown.

Task pairs

The release defines 3 ontology-pair tasks.

The ontology-pair tasks in the release
Task | Source → target | Focus
NCIT-DOID | NCIT → DOID | Cancer and disease concept alignment.
SNOMED-FMA | SNOMED → FMA | Clinical anatomy to reference anatomy alignment.
SNOMED-NCIT | SNOMED → NCIT | Clinical concept to cancer and disease alignment.

ORDO is projected into the unified graph but is neither the source nor the target of any task pair above. Its projection prepares the graph for a planned ORDO-OMIM task, which would align Orphanet rare-disease concepts with OMIM (Online Mendelian Inheritance in Man). Whether that task is included in the first competition round is not yet decided (provisional).

Three methodological entry points

The public package supports three entry points, each adding a data layer to the one before — use as much or as little as your method needs.

Level 1 — Triples only
The graph structure and training anchors. Enables: Standard KG-embedding and graph neural network approaches.
Level 2 — Triples + node properties
Adds labels, synonyms, definitions, and metadata. Enables: Text-enhanced KG models and biomedical language models.
Level 3 — Triples + node properties + Datalog
Adds OWL-derived facts and rules. Enables: Symbolic, rule-aware, and neuro-symbolic systems.

Evaluation

Submissions are ranked by 2 primary metrics; secondary metrics add resolution. A diagnostic tier is computed locally by the kit and never feeds the leaderboard.

Primary

Preferred Typed (Relation-Aware) MRR
Mean reciprocal rank of the preferred (target, relation) pair — entity and relation must both match the reference.
Hierarchy-Aware Typed nDCG@10
Graded relevance that rewards ontologically close candidates, decaying with distance d in the class hierarchy (gain 1.0 for an exact match, 0.6 for the same entity, then 0.6/(d+1)).
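
As a rough illustration of the two primary metrics, the sketch below computes a reciprocal rank for the preferred pair and an nDCG@10 from graded gains. It rests on several assumptions that this page does not confirm: DCG uses linear gains with a log2 position discount, the ideal ranking is taken over the gains of the same pool, `d` is a precomputed hierarchy distance, and unrelated candidates get gain 0. All function names are hypothetical; the exact definitions live in the kit's metric and graded-relevance specifications.

```python
import math

def reciprocal_rank(ranked_rows, preferred):
    """1/rank of the preferred (target, relation) pair; 0.0 if it is absent."""
    for rank, row in enumerate(ranked_rows, start=1):
        if row == preferred:
            return 1.0 / rank
    return 0.0

def mean_reciprocal_rank(queries):
    """queries: list of (ranked_rows, preferred) tuples, one per query."""
    return sum(reciprocal_rank(r, p) for r, p in queries) / len(queries)

def graded_gain(exact, same_entity, d=None):
    """Gain scheme quoted on this page: 1.0 exact, 0.6 same entity, 0.6/(d+1).

    Assumption: d is None for candidates unrelated to the reference,
    which this sketch scores as 0.0.
    """
    if exact:
        return 1.0
    if same_entity:
        return 0.6
    if d is None:
        return 0.0
    return 0.6 / (d + 1)

def dcg(gains):
    """Discounted cumulative gain with a log2 position discount (assumed form)."""
    return sum(g / math.log2(i + 1) for i, g in enumerate(gains, start=1))

def ndcg_at_k(ranked_gains, k=10):
    """DCG of the top k, normalised by the best ordering of the same gains."""
    ideal = sorted(ranked_gains, reverse=True)
    best = dcg(ideal[:k])
    return dcg(ranked_gains[:k]) / best if best > 0 else 0.0

# The worked example's ordering (exact, near, miss, near), with a
# hypothetical distance d = 1 assigned to both near neighbours.
example_gains = [
    graded_gain(True, False),
    graded_gain(False, False, d=1),
    graded_gain(False, False, d=None),
    graded_gain(False, False, d=1),
]
example_ndcg = ndcg_at_k(example_gains)
```

Under these assumptions the example scores below 1.0, because the distractor at rank 3 sits above a neighbour that would earn graded credit; swapping the last two entries would give a perfect ordering.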

Secondary

Preferred Typed Hits@1, Preferred Typed Hits@5, Preferred Typed Hits@10. Primary and secondary metrics are reported per task and aggregated across the tasks.
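
A minimal sketch of Hits@k and per-task reporting follows. This page does not say how scores are aggregated across tasks, so the unweighted mean used here is an assumption, as are the function names and the dictionary layout.

```python
def hits_at_k(ranked_rows, preferred, k):
    """1.0 if the preferred (target, relation) pair is in the top k, else 0.0."""
    return 1.0 if preferred in ranked_rows[:k] else 0.0

def per_task_hits(task_queries, k):
    """Average Hits@k per task.

    task_queries: {task_name: [(ranked_rows, preferred), ...]}
    """
    return {
        task: sum(hits_at_k(rows, pref, k) for rows, pref in queries) / len(queries)
        for task, queries in task_queries.items()
    }

def macro_average(per_task_scores):
    """Unweighted mean over tasks (ASSUMED aggregation; check the kit)."""
    return sum(per_task_scores.values()) / len(per_task_scores)

# Tiny hypothetical input: two queries for one task, one for another.
ranking = [("a", "equivalent"), ("b", "equivalent")]
task_queries = {
    "NCIT-DOID": [(ranking, ("a", "equivalent")), (ranking, ("b", "equivalent"))],
    "SNOMED-FMA": [(ranking, ("a", "equivalent"))],
}
hits1 = per_task_hits(task_queries, k=1)
overall_hits1 = macro_average(hits1)
```

With this input, Hits@1 is 0.5 for NCIT-DOID and 1.0 for SNOMED-FMA, so the unweighted mean is 0.75; a query-weighted mean would give a different number, which is why the aggregation rule matters.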

Diagnostic (kit-local)

Entity-only (untyped) MRR, Entity-only Hits@1, Entity-only Hits@5, Entity-only Hits@10, Relation accuracy on the preferred entity, Per-relation macro-F1 on the preferred entity.

These run only in the kit's local score command, to help you debug a system before submitting. They never feed the leaderboard ranking: local diagnostic scores will not match it, and final scoring runs server-side against held-out labels.
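
To show why entity-only diagnostics can differ from the typed metrics, here is a hedged sketch of two of them. It is not the kit's implementation: collapsing the scored rows to the first appearance of each entity, and reading "relation accuracy on the preferred entity" as "the highest-ranked row for that entity carries the reference relation", are both assumptions, and the function names are hypothetical.

```python
def entity_ranking(ranked_rows):
    """Collapse (target, relation) rows to entities in order of first appearance."""
    order = []
    for target, _relation in ranked_rows:
        if target not in order:
            order.append(target)
    return order

def entity_only_rr(ranked_rows, preferred_entity):
    """Reciprocal rank of the preferred entity, ignoring the relation."""
    order = entity_ranking(ranked_rows)
    if preferred_entity not in order:
        return 0.0
    return 1.0 / (order.index(preferred_entity) + 1)

def relation_correct_on_preferred(ranked_rows, preferred):
    """True if the top-ranked row for the preferred entity has the reference relation."""
    entity, relation = preferred
    for target, rel in ranked_rows:
        if target == entity:
            return rel == relation
    return False

# Hypothetical ranking: entity A takes the first two rows, B comes third.
ranked = [
    ("A", "equivalent"),
    ("A", "source_subsumes_target"),
    ("B", "equivalent"),
]
preferred = ("B", "equivalent")
untyped_rr = entity_only_rr(ranked, "B")
typed_rank = ranked.index(preferred) + 1
```

In this example the preferred entity is second among entities (entity-only reciprocal rank 0.5) while the preferred typed pair is third among rows (typed reciprocal rank 1/3), which is one way a local diagnostic can look better than the leaderboard metric.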

Baselines

Shipped with the kit
Runnable baseline code: Random, Hybrid lexical.
Bundled with the release
Organiser predictions shipped alongside the data: Graph embedding (KGE), Sentence embedding (SBERT).
Planned (provisional)
Not yet shipped: Rule-aware.

Specifications

The exact contracts live in the participant kit; this page states the headline and links out.

Block scoring
How candidate rows are grouped into per-query blocks and scored.
Pool model
How the fixed candidate pool for each query is constructed.
Metric families
Exact definitions of the typed metrics and how they relate.
Graded relevance
The graded-relevance scheme behind Hierarchy-Aware Typed nDCG@10.
Complex track
The separate OWL class-expression-generation track — its own data, format, and evaluation, documented in the kit.

Leaderboard

Final standings are published on the competition platform (provisional); this site does not host them.