Main Track & Evaluation

Rank candidate–relation pairs across ontology pairs

Each query gives a source entity and a fixed set of 50 target candidates. A system scores every candidate against the 3 relations — equivalent, source_subsumed_by_target, source_subsumes_target — producing 150 scored rows per query.

A prediction is relevant only when it matches the reference on both the target entity and the relation. The right neighbour with the wrong relation does not count.
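The row layout and the relevance rule above can be sketched in a few lines of Python. This is a minimal illustration, not the kit's actual format: the field names, the placeholder identifiers, and the `build_rows` and `is_relevant` helpers are all hypothetical.

```python
from itertools import product

# The three relation labels named in the task description.
RELATIONS = ("equivalent", "source_subsumed_by_target", "source_subsumes_target")


def build_rows(source, candidates, score_fn):
    """Return one scored row per (candidate, relation) pair."""
    return [
        {"source": source, "target": t, "relation": r, "score": score_fn(source, t, r)}
        for t, r in product(candidates, RELATIONS)
    ]


def is_relevant(row, reference):
    """A row counts only if target AND relation both match the reference pair."""
    return (row["target"], row["relation"]) == reference


# Placeholder identifiers, invented for illustration only.
candidates = [f"DOID:{i:07d}" for i in range(50)]
rows = build_rows("NCIT:EXAMPLE", candidates, lambda s, t, r: 0.0)
print(len(rows))  # 50 candidates x 3 relations
```

A real system would replace the constant `score_fn` with its own scorer; the point here is only that every candidate appears once per relation, and that a match on the entity alone is not enough.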

The full pipeline: five projected ontologies feed one knowledge graph, which feeds the typed-candidate-ranking task evaluated on this page. Counts are from the current build.

The relations

equivalent: The source and target denote the same concept.
source_subsumed_by_target: The source is more specific — a subclass of the target.
source_subsumes_target: The source is more general — a superclass of the target.

A worked example

Ranking DOID candidates for the source concept NCIT melanoma (illustrative, from public data only).

1. DOID melanoma: equivalent (exact)

Source and target denote the same disease. Under the preferred typed metric this is the only fully relevant prediction, because entity and relation both match the reference, so a strong system ranks it first.

2. DOID cutaneous melanoma: subsumes (near)

Melanoma is broader than cutaneous melanoma, so the source subsumes this target. It is not the exact match but an ontologically close neighbour, so the hierarchy-aware metric awards graded credit by distance.

3. DOID melanocytic nevus: equivalent (miss)

A melanocytic nevus is a benign mole, neither equivalent to nor a sub- or super-class of melanoma. It is a distractor: it earns no credit, and ranking it this high pushes a genuine neighbour down.

4. DOID skin cancer: subsumed by (near)

Melanoma is a kind of skin cancer, so the source is subsumed by this target. This is a correct neighbour earning graded credit, ranked below a distractor here.

Illustrative example (NCIT → DOID). The live site renders from the public release; hidden test data is never shown.

Task pairs

The release defines 3 ontology-pair tasks.

The ontology-pair tasks in the release
Task	Source → target	Focus
`NCIT-DOID`	NCIT → DOID	Cancer and disease concept alignment.
`SNOMED-FMA`	SNOMED → FMA	Clinical anatomy to reference anatomy alignment.
`SNOMED-NCIT`	SNOMED → NCIT	Clinical concept to cancer and disease alignment.

ORDO is projected into the unified graph but is neither the source nor the target of any task pair above. Its projection prepares the graph for a planned ORDO-OMIM task, which would align Orphanet rare-disease concepts with OMIM (Online Mendelian Inheritance in Man). Whether that task is included in the first competition round is not yet decided (provisional).

Three methodological entry points

The public package supports three entry points, each adding a data layer to the one before — use as much or as little as your method needs.

Level 1 — Triples only: The graph structure and training anchors. Enables: Standard KG-embedding and graph neural network approaches.
Level 2 — Triples + node properties: Adds labels, synonyms, definitions, and metadata. Enables: Text-enhanced KG models and biomedical language models.
Level 3 — Triples + node properties + Datalog: Adds OWL-derived facts and rules. Enables: Symbolic, rule-aware, and neuro-symbolic systems.

Evaluation

Submissions are ranked by 2 primary metrics; secondary metrics add resolution. A diagnostic tier is computed locally by the kit and never feeds the leaderboard.

Primary

Preferred Typed (Relation-Aware) MRR: Mean reciprocal rank of the preferred (target, relation) pair — entity and relation must both match the reference.
Hierarchy-Aware Typed nDCG@10: Graded relevance that rewards ontologically close candidates, decaying with distance d in the class hierarchy (gain 1.0 for an exact match, 0.6 for the same entity, then 0.6/(d+1)).
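As a rough illustration of the two primary metrics, the sketch below computes a reciprocal rank and an nDCG@10 from the gain values quoted above. Several details are assumptions rather than the kit's definitions: candidates with no hierarchy path to the reference get gain 0, the ideal ranking is taken by sorting the gains of the rows supplied, the discount is the usual log2(rank + 1), and the distances in the toy data are invented. The exact contracts live in the participant kit.

```python
import math


def reciprocal_rank(ranked, preferred):
    """ranked: (target, relation) pairs, best first. Returns 1/rank or 0.0."""
    for i, pair in enumerate(ranked, start=1):
        if pair == preferred:
            return 1.0 / i
    return 0.0


def gain(pair, preferred, distance):
    """Gain as quoted: 1.0 exact, 0.6 same entity, else 0.6/(d+1).

    `distance(target)` returns the hierarchy distance to the preferred
    entity, or None when unrelated (assumed here to mean gain 0).
    """
    target, _relation = pair
    if pair == preferred:
        return 1.0
    if target == preferred[0]:
        return 0.6
    d = distance(target)
    return 0.0 if d is None else 0.6 / (d + 1)


def dcg(gains, k=10):
    return sum(g / math.log2(i + 1) for i, g in enumerate(gains[:k], start=1))


def ndcg_at_k(ranked, preferred, distance, k=10):
    gains = [gain(p, preferred, distance) for p in ranked]
    ideal = dcg(sorted(gains, reverse=True), k)
    return dcg(gains, k) / ideal if ideal > 0 else 0.0


# Toy data loosely following the melanoma example; distances are assumed.
toy_distance = {"cutaneous melanoma": 1, "skin cancer": 1}
preferred = ("melanoma", "equivalent")
ranked = [
    ("melanoma", "equivalent"),
    ("cutaneous melanoma", "source_subsumes_target"),
    ("melanocytic nevus", "equivalent"),
    ("skin cancer", "source_subsumed_by_target"),
]
rr = reciprocal_rank(ranked, preferred)
score = ndcg_at_k(ranked, preferred, toy_distance.get)
```

Averaging `reciprocal_rank` over all queries gives an MRR. In this toy ranking the distractor at rank 3 keeps `score` slightly below 1.0, because the neighbour at rank 4 would be discounted less at rank 3.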

Secondary

Preferred Typed Hits@1, Preferred Typed Hits@5, Preferred Typed Hits@10. Primary and secondary metrics are reported per task and aggregated across the tasks.
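The secondary metrics and the per-task reporting can be sketched as follows. The page does not say how scores are aggregated across tasks, so the unweighted mean over tasks used here is an assumption, and the `results` structure and helper names are hypothetical.

```python
from statistics import mean


def hits_at_k(ranked, preferred, k):
    """1.0 if the preferred (target, relation) pair is in the top k, else 0.0."""
    return 1.0 if preferred in ranked[:k] else 0.0


def per_task_hits(results, k):
    """results: {task: [(ranked, preferred), ...]} -> {task: mean Hits@k}."""
    return {
        task: mean(hits_at_k(ranked, preferred, k) for ranked, preferred in queries)
        for task, queries in results.items()
    }


def aggregate(per_task):
    """Assumed aggregation: unweighted mean of the per-task scores."""
    return mean(per_task.values())


# Tiny made-up example with two tasks.
a, b = ("t1", "equivalent"), ("t2", "equivalent")
results = {
    "NCIT-DOID": [([a, b], a), ([a, b], b)],
    "SNOMED-FMA": [([a, b], b)],
}
by_task = per_task_hits(results, k=1)
overall = aggregate(by_task)
```

With an unweighted mean each task counts equally regardless of how many queries it has; a query-weighted mean would be the other obvious choice.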

Diagnostic (kit-local)

Entity-only (untyped) MRR, Entity-only Hits@1, Entity-only Hits@5, Entity-only Hits@10, Relation accuracy on the preferred entity, Per-relation macro-F1 on the preferred entity.
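To show how the entity-only diagnostics differ from the typed metrics, here is a minimal sketch. It assumes that the entity-only rank is the position of the first row whose target matches the reference, ignoring the relation, and that the predicted relation for the preferred entity is the relation on that first row. Both readings are assumptions; the kit's definitions may differ.

```python
def entity_only_rank(ranked, preferred_target):
    """Rank of the first row whose target matches, ignoring the relation."""
    for i, (target, _relation) in enumerate(ranked, start=1):
        if target == preferred_target:
            return i
    return None


def predicted_relation(ranked, preferred_target):
    """Relation attached to the highest-ranked row for the preferred entity."""
    for target, relation in ranked:
        if target == preferred_target:
            return relation
    return None


# Made-up ranking: right entity at rank 1, but with the wrong relation.
preferred = ("A", "source_subsumes_target")
ranked = [
    ("A", "equivalent"),
    ("B", "equivalent"),
    ("A", "source_subsumes_target"),
]
untyped_rank = entity_only_rank(ranked, preferred[0])
typed_rank = ranked.index(preferred) + 1
relation_correct = predicted_relation(ranked, preferred[0]) == preferred[1]
```

Here the entity is found at rank 1 while the typed pair sits at rank 3, which suggests the system retrieves the right neighbour but mislabels the relation. That is the kind of gap these diagnostics are meant to expose.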

These run only in the kit's local score command, to help you debug a system before submitting. They never determine the leaderboard ranking: local diagnostic scores will not match it, and final scoring runs server-side against held-out labels.

Baselines

Shipped with the kit (runnable baseline code): Random, Hybrid lexical.
Bundled with the release (organiser predictions shipped alongside the data): Graph embedding (KGE), Sentence embedding (SBERT).
Planned (provisional, not yet shipped): Rule-aware.

Specifications

The exact contracts live in the participant kit; this page states the headline and links out.

Block scoring: How candidate rows are grouped into per-query blocks and scored.
Pool model: How the fixed candidate pool for each query is constructed.
Metric families: Exact definitions of the typed metrics and how they relate.
Graded relevance: The graded-relevance scheme behind Hierarchy-Aware Typed nDCG@10.
Complex track: The separate OWL class-expression-generation track — its own data, format, and evaluation, documented in the kit.

Leaderboard

Final standings are published on the competition platform (provisional); this site does not host them.