Rank candidate–relation pairs across ontology pairs
Each query gives a source entity and a fixed set of 50
target candidates. A system scores every candidate against the 3 relations
— equivalent, source_subsumed_by_target, source_subsumes_target —
producing 150 scored rows per query.
A prediction is relevant only when it matches the reference on both the target entity and the relation. The right neighbour with the wrong relation does not count.
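As a minimal sketch of the row layout described above, the snippet below enumerates the 150 (target, relation) rows for one query and applies the both-must-match relevance rule. The candidate identifiers and the function names are hypothetical, and the kit's actual file format may differ.

```python
from itertools import product

# The three relations named in the task description.
RELATIONS = (
    "equivalent",
    "source_subsumed_by_target",
    "source_subsumes_target",
)


def enumerate_rows(candidates):
    """Return every (target, relation) pair a system scores for one query."""
    return list(product(candidates, RELATIONS))


def is_relevant(predicted, reference):
    """A row counts only if the target entity AND the relation both match."""
    return predicted == reference


# Hypothetical candidate identifiers, for illustration only.
candidates = [f"target_{i}" for i in range(50)]
rows = enumerate_rows(candidates)
print(len(rows))  # 50 candidates x 3 relations

reference = ("target_3", "equivalent")
print(is_relevant(("target_3", "equivalent"), reference))
print(is_relevant(("target_3", "source_subsumes_target"), reference))
```

The second check illustrates the rule stated above: the right target with the wrong relation is not a relevant prediction.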
The relations
- equivalent
- The source and target denote the same concept.
- source_subsumed_by_target
- The source is more specific — a subclass of the target.
- source_subsumes_target
- The source is more general — a superclass of the target.
A worked example
Ranking DOID candidates for the NCIT source concept "melanoma" (illustrative, from public data only).
- 1. DOID melanoma (relation: equivalent; verdict: exact)
- Source and target denote the same disease. Under the preferred typed metric this is the only fully relevant prediction, because entity and relation both match the reference, so a strong system ranks it first.
- 2. DOID cutaneous melanoma (relation: subsumes; verdict: near)
- Melanoma is broader than cutaneous melanoma, so the source subsumes this target. Not the exact match, but an ontologically close neighbour: the hierarchy-aware metric awards graded credit by distance.
- 3. DOID melanocytic nevus (relation: equivalent; verdict: miss)
- A melanocytic nevus is a benign mole, neither equivalent to nor a sub- or superclass of melanoma. A distractor: no credit, and ranking it this high pushes a genuine neighbour down.
- 4. DOID skin cancer (relation: subsumed by; verdict: near)
- Melanoma is a kind of skin cancer, so the source is subsumed by this target. A correct neighbour earning graded credit, ranked below a distractor here.
Illustrative example (NCIT → DOID). The live site renders from the public release; hidden test data is never shown.
Task pairs
The release defines 3 ontology-pair tasks.
| Task | Source → target | Focus |
|---|---|---|
| NCIT-DOID | NCIT → DOID | Cancer and disease concept alignment. |
| SNOMED-FMA | SNOMED → FMA | Clinical anatomy to reference anatomy alignment. |
| SNOMED-NCIT | SNOMED → NCIT | Clinical concept to cancer and disease alignment. |
ORDO is projected into the unified graph but is neither the source nor the target of any task pair above. Its projection prepares the graph for a planned ORDO-OMIM task, aligning Orphanet rare-disease concepts with OMIM (Online Mendelian Inheritance in Man), whose inclusion in the first competition round is not yet decided (provisional).
Three methodological entry points
The public package supports three entry points, each adding a data layer to the one before — use as much or as little as your method needs.
- Level 1 — Triples only
- The graph structure and training anchors. Enables: Standard KG-embedding and graph neural network approaches.
- Level 2 — Triples + node properties
- Adds labels, synonyms, definitions, and metadata. Enables: Text-enhanced KG models and biomedical language models.
- Level 3 — Triples + node properties + Datalog
- Adds OWL-derived facts and rules. Enables: Symbolic, rule-aware, and neuro-symbolic systems.
Evaluation
Submissions are ranked by 2 primary metrics; secondary metrics add resolution. A diagnostic tier is computed locally by the kit and never feeds the leaderboard.
Primary
- Preferred Typed (Relation-Aware) MRR
- Mean reciprocal rank of the preferred (target, relation) pair — entity and relation must both match the reference.
- Hierarchy-Aware Typed nDCG@10
- Graded relevance that rewards ontologically close candidates, decaying with distance in the class hierarchy (gain 1.0 for an exact match, 0.6 for a same-entity match, then 0.6/(d+1) at hierarchy distance d).
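The two primary metrics can be sketched for a single query as follows. This is an illustration under stated assumptions, not the kit's reference implementation: "same-entity" is read as the right target with the wrong relation, neighbour credit ignores the predicted relation, the ideal ranking is taken over the rows supplied, and the discount is the standard log2(rank + 1). The `distance` mapping and its values are hypothetical; the exact contracts live in the participant kit.

```python
import math


def reciprocal_rank(ranked_rows, preferred):
    """1/rank of the preferred (target, relation) row, or 0.0 if absent."""
    for rank, row in enumerate(ranked_rows, start=1):
        if row == preferred:
            return 1.0 / rank
    return 0.0


def gain(row, preferred, distance):
    """Graded gain; `distance` maps a target to its hops from the preferred target."""
    if row == preferred:
        return 1.0                      # entity and relation both match
    if row[0] == preferred[0]:
        return 0.6                      # assumed: right entity, wrong relation
    d = distance.get(row[0])
    return 0.0 if d is None else 0.6 / (d + 1)


def ndcg_at_k(ranked_rows, preferred, distance, k=10):
    """nDCG@k with the graded gains above and a log2 rank discount."""
    gains = [gain(row, preferred, distance) for row in ranked_rows]
    dcg = sum(g / math.log2(i + 2) for i, g in enumerate(gains[:k]))
    ideal = sorted(gains, reverse=True)[:k]
    idcg = sum(g / math.log2(i + 2) for i, g in enumerate(ideal))
    return dcg / idcg if idcg > 0 else 0.0


# The four-row ranking from the worked example; distances are hypothetical.
preferred = ("melanoma", "equivalent")
ranked = [
    ("melanoma", "equivalent"),
    ("cutaneous melanoma", "source_subsumes_target"),
    ("melanocytic nevus", "equivalent"),
    ("skin cancer", "source_subsumed_by_target"),
]
distance = {"cutaneous melanoma": 1, "skin cancer": 1}

rr = reciprocal_rank(ranked, preferred)
score = ndcg_at_k(ranked, preferred, distance)
print(rr, score)
```

With the distractor at rank 3, the nDCG falls slightly below 1.0 even though the reciprocal rank is perfect, which is the extra resolution the graded metric is meant to provide.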
Secondary
Preferred Typed Hits@1, Preferred Typed Hits@5, Preferred Typed Hits@10. Primary and secondary metrics are reported per task and aggregated across the tasks.
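A sketch of Hits@k and of per-task reporting follows. The page does not say how scores are aggregated across tasks, so the unweighted mean used here is an assumption, and the toy queries are placeholders rather than real data.

```python
from statistics import mean


def hits_at_k(ranked_rows, preferred, k):
    """1.0 if the preferred (target, relation) row is in the top k, else 0.0."""
    return 1.0 if preferred in ranked_rows[:k] else 0.0


def per_task_score(queries, k):
    """Mean Hits@k over the (ranked_rows, preferred) queries of one task."""
    return mean(hits_at_k(ranked, preferred, k) for ranked, preferred in queries)


def aggregate(task_scores):
    """Unweighted mean across tasks (an assumption, not the kit's stated rule)."""
    return mean(task_scores.values())


# Toy placeholder queries: two rows each, preferred row first or second.
toy_ranking = [("t1", "equivalent"), ("t2", "equivalent")]
toy_tasks = {
    "NCIT-DOID": [(toy_ranking, ("t1", "equivalent")),
                  (toy_ranking, ("t2", "equivalent"))],
    "SNOMED-FMA": [(toy_ranking, ("t1", "equivalent"))],
}

hits1 = {task: per_task_score(queries, k=1) for task, queries in toy_tasks.items()}
overall = aggregate(hits1)
print(hits1, overall)
```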
Diagnostic (kit-local)
Entity-only (untyped) MRR, Entity-only Hits@1, Entity-only Hits@5, Entity-only Hits@10, Relation accuracy on the preferred entity, Per-relation macro-F1 on the preferred entity.
These run only in the kit's local score command, to help you debug a
system before submitting. They never determine the leaderboard ranking: local
diagnostic scores will not match it, and final scoring runs server-side against
held-out labels.
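To show why the entity-only diagnostics can diverge from the typed metrics, here is a small sketch. The function names are hypothetical and do not describe the kit's score command, and "relation accuracy on the preferred entity" is read here as checking the relation of the highest-ranked row for that entity, which is an assumption.

```python
def entity_only_rank(ranked_rows, preferred_target):
    """Rank of the first row naming the preferred target, relation ignored."""
    for rank, (target, _relation) in enumerate(ranked_rows, start=1):
        if target == preferred_target:
            return rank
    return None


def top_relation_for(ranked_rows, preferred_target):
    """Relation of the highest-ranked row for the preferred target."""
    for target, relation in ranked_rows:
        if target == preferred_target:
            return relation
    return None


preferred = ("melanoma", "equivalent")
ranked = [
    ("melanoma", "source_subsumes_target"),   # right entity, wrong relation
    ("skin cancer", "source_subsumed_by_target"),
    ("melanoma", "equivalent"),               # the preferred row
]

untyped_rr = 1.0 / entity_only_rank(ranked, preferred[0])
typed_rr = 1.0 / (ranked.index(preferred) + 1)
relation_correct = top_relation_for(ranked, preferred[0]) == preferred[1]
print(untyped_rr, typed_rr, relation_correct)
```

Here the system finds the right entity immediately but attaches the wrong relation first, so the entity-only reciprocal rank is perfect while the typed one is not. A gap of this kind suggests the relation classifier, rather than candidate retrieval, is the part to debug.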
Baselines
- Shipped with the kit
- Runnable baseline code: Random, Hybrid lexical.
- Bundled with the release
- Organiser predictions shipped alongside the data: Graph embedding (KGE), Sentence embedding (SBERT).
- Planned (provisional)
- Not yet shipped: Rule-aware.
Specifications
The exact contracts live in the participant kit; this page states the headline and links out.
- Block scoring
- How candidate rows are grouped into per-query blocks and scored.
- Pool model
- How the fixed candidate pool for each query is constructed.
- Metric families
- Exact definitions of the typed metrics and how they relate.
- Graded relevance
- The graded-relevance scheme behind Hierarchy-Aware Typed nDCG@10.
- Complex track
- The separate OWL class-expression-generation track — its own data, format, and evaluation, documented in the kit.
Leaderboard
Final standings are published on the competition platform (provisional); this site does not host them.