Pre-launch NeurIPS challenge draft
BioKG-Align reframes biomedical ontology matching as relation-aware candidate ranking over knowledge graphs. Systems receive a source entity and a fixed set of target candidates, then rank candidate-relation pairs for equivalence or directional subsumption.
Overview
Biomedical knowledge is distributed across independently developed ontologies and controlled vocabularies. Aligning these resources supports search, data integration, clinical research, machine learning, and auditable knowledge base construction.
The challenge exposes ontology alignment through a familiar machine learning interface: typed link prediction over fixed candidate sets. Participants may use graph triples, node properties, and optional Datalog rules derived from ontology structure.
Task
Each query provides a source entity, a target ontology, and a fixed list of target candidates. The current scaffold uses exactly 30 candidates per query.
Submissions score candidate-relation pairs using the official relation names `equivalent`, `source_subsumed_by_target`, and `source_subsumes_target`.
A prediction is relevant only when both the target entity and the relation type match the hidden reference alignment.
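The relevance rule can be illustrated with a minimal Python sketch. It assumes a query is a source identifier plus a list of candidate identifiers, and that the hidden reference is a set of (source, target, relation) triples. The names `Pair`, `expand_pairs`, and `is_relevant`, and the `SRC:`/`TGT:` identifiers, are hypothetical and are not part of the kit.

```python
from typing import NamedTuple

# The three official relation names from the task description.
RELATIONS = ("equivalent", "source_subsumed_by_target", "source_subsumes_target")


class Pair(NamedTuple):
    src: str
    tgt: str
    relation: str


def expand_pairs(src: str, candidates: list[str]) -> list[Pair]:
    """Enumerate every candidate-relation pair a system could score for one query."""
    return [Pair(src, tgt, rel) for tgt in candidates for rel in RELATIONS]


def is_relevant(pair: Pair, reference: set[tuple[str, str, str]]) -> bool:
    """Relevant only if the target entity AND the relation both match the reference."""
    return (pair.src, pair.tgt, pair.relation) in reference


# Hypothetical query with 30 candidates and one reference alignment.
candidates = [f"TGT:{i}" for i in range(30)]
pairs = expand_pairs("SRC:1", candidates)
reference = {("SRC:1", "TGT:7", "source_subsumed_by_target")}

print(len(pairs))  # 90 pairs: 30 candidates x 3 relations
print(is_relevant(Pair("SRC:1", "TGT:7", "equivalent"), reference))  # False: wrong relation
print(is_relevant(Pair("SRC:1", "TGT:7", "source_subsumed_by_target"), reference))  # True
```

Predicting the correct target with the wrong relation earns no credit. A system therefore has to get the relation choice right as well as the target.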
| Task | Source | Target | Current role |
|---|---|---|---|
| NCIT-DOID | NCIT | DOID | Disease and cancer concept alignment |
| OMIM-ORDO | OMIM | ORDO | Rare disease and inherited disorder alignment |
| SNOMED-FMA | SNOMED CT | FMA | Clinical anatomy to anatomy alignment |
| SNOMED-NCIT | SNOMED CT | NCIT | Clinical concept to cancer and disease alignment |
Data
| Path | Purpose | Status |
|---|---|---|
| graph/triples.csv | Typed intra-ontology graph triples and released training anchors. | Format documented; official release TBD. |
| graph/properties.csv | Node identifiers, ontology membership, labels, synonyms, definitions, and permitted metadata. | Format documented; license-cleared release TBD. |
| graph/rules.dl | Selected Datalog-compatible rules derived from ontology axioms. | Format documented; final rule projection TBD. |
| alignments/train.tsv, alignments/valid.tsv | Public supervised labels with relation types. | Format documented; final splits TBD. |
| tasks/*/*.cands.tsv | Candidate files for train, validation, and test queries. | Format documented; official candidate sets TBD. |
| evaluation/* | Sample submission, schema, and local scorer interface. | Kit command available; CodaBench integration in preparation. |
Participants receive graph files, public train and validation labels, public candidate sets, baseline code, documentation, manifests, and a local validation scorer.
Hidden test answers, full hidden references, split construction details, licensed raw sources, and official hidden scorer assets are not part of this public kit.
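The final rule projection for graph/rules.dl is still TBD, so the following sketch only illustrates the general idea. It treats a hypothetical `subclass_of` predicate as a set of (child, parent) facts and evaluates a single transitivity rule by naive bottom-up fixpoint iteration in plain Python. The predicate name, the rule, and `apply_transitivity` are all assumptions. A dedicated Datalog engine is another way to evaluate the released rules.

```python
from collections import defaultdict

# Hypothetical rule, written in Datalog syntax for reference:
#   subclass_of(X, Z) :- subclass_of(X, Y), subclass_of(Y, Z).
# The predicate name is an assumption; the released rules.dl may differ.


def apply_transitivity(facts: set[tuple[str, str]]) -> set[tuple[str, str]]:
    """Naive bottom-up Datalog evaluation: apply the rule until no new facts appear."""
    closed = set(facts)
    while True:
        # Index current facts by their first argument so we can join on Y.
        by_subject = defaultdict(set)
        for x, y in closed:
            by_subject[x].add(y)
        derived = {(x, z) for x, y in closed for z in by_subject.get(y, ())}
        new_facts = derived - closed
        if not new_facts:
            return closed  # fixpoint reached
        closed |= new_facts


asserted = {("A", "B"), ("B", "C"), ("C", "D")}
inferred = apply_transitivity(asserted) - asserted
print(sorted(inferred))  # [('A', 'C'), ('A', 'D'), ('B', 'D')]
```

Derived facts such as `('A', 'D')` could serve as extra graph edges or as features when ranking subsumption candidates.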
Evaluation
The primary leaderboard metric is planned to be macro-averaged relation-aware nDCG@10 across ontology-pair tasks. A ranked item is relevant only if the target entity and relation type are both correct.
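The exact gain, discount, and averaging conventions of the official scorer are not specified here. The sketch below makes four assumptions: binary gains, the common log2 rank discount, a mean over queries within each task, and an unweighted mean over tasks. The function names are hypothetical, and ranked items are (target, relation) pairs, so a correct target paired with the wrong relation contributes nothing.

```python
import math
from collections import defaultdict


def ndcg_at_k(ranked, relevant, k=10):
    """Binary-gain nDCG@k. `ranked` is a list of (target, relation) pairs, best first;
    `relevant` is the set of reference (target, relation) pairs for the query."""
    dcg = sum(
        1.0 / math.log2(rank + 1)
        for rank, item in enumerate(ranked[:k], start=1)
        if item in relevant
    )
    ideal = sum(1.0 / math.log2(rank + 1) for rank in range(1, min(len(relevant), k) + 1))
    return dcg / ideal if ideal > 0 else 0.0


def macro_ndcg_at_k(queries, k=10):
    """`queries` yields (task, ranked, relevant). Average within each task, then across tasks."""
    per_task = defaultdict(list)
    for task, ranked, relevant in queries:
        per_task[task].append(ndcg_at_k(ranked, relevant, k))
    task_means = [sum(scores) / len(scores) for scores in per_task.values()]
    return sum(task_means) / len(task_means)


hit_first = [("T1", "equivalent"), ("T2", "equivalent")]
queries = [
    ("NCIT-DOID", hit_first, {("T1", "equivalent")}),             # hit at rank 1: 1.0
    ("OMIM-ORDO", hit_first, {("T1", "source_subsumes_target")}),  # wrong relation: 0.0
    ("OMIM-ORDO", hit_first, {("T2", "equivalent")}),              # hit at rank 2: 1/log2(3)
]
print(round(macro_ndcg_at_k(queries), 4))  # 0.6577
```

Macro-averaging over tasks keeps a task with many queries from dominating the leaderboard score.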
Secondary metrics from the current scaffold include MRR, Hits@1, Hits@5, Hits@10, MAP, and macro-F1 over relation types. Confidence intervals and significance testing are TBD.
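The per-query quantities behind MRR, Hits@k, and MAP can be sketched as follows, again over ranked (target, relation) pairs. This uses the textbook definitions and is not the scaffold's code. The function names are hypothetical, and this version of average precision divides by the number of reference pairs, which is one common convention. MRR and MAP are the means of these values over queries. Macro-F1 over relation types needs thresholded predictions and is not covered here.

```python
def reciprocal_rank(ranked, relevant):
    """1/rank of the first relevant (target, relation) pair, or 0.0 if none is ranked."""
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            return 1.0 / rank
    return 0.0


def hits_at_k(ranked, relevant, k):
    """1.0 if any relevant pair appears in the top k, else 0.0."""
    return 1.0 if any(item in relevant for item in ranked[:k]) else 0.0


def average_precision(ranked, relevant):
    """Sum of precision@rank at each relevant hit, divided by the number of reference pairs."""
    if not relevant:
        return 0.0
    hits, total = 0, 0.0
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant)


ranked = [("T1", "equivalent"), ("T2", "equivalent"), ("T3", "source_subsumes_target")]
relevant = {("T2", "equivalent"), ("T3", "source_subsumes_target")}
print(reciprocal_rank(ranked, relevant))               # 0.5
print(hits_at_k(ranked, relevant, 1))                  # 0.0
print(hits_at_k(ranked, relevant, 5))                  # 1.0
print(round(average_precision(ranked, relevant), 4))   # 0.5833
```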
Baselines
- Ranks candidates using preferred labels, synonyms, and string similarity.
- Combines lexical similarity with simple relation priors from the current scaffold.
- TBD: planned baseline over graph triples and training anchors.
- TBD: planned baseline using labels, synonyms, and definitions.
- TBD: planned baseline demonstrating use of Datalog-derived features.
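The two available baselines are described only briefly. The following is a minimal sketch in their spirit. It uses the standard-library `difflib.SequenceMatcher` ratio as a stand-in string similarity and a hypothetical hard-coded relation prior, for example relation frequencies estimated from alignments/train.tsv. The scaffold's actual similarity measure and priors may differ, and `label_similarity`, `rank_candidates`, and `RELATION_PRIOR` are illustrative names.

```python
from difflib import SequenceMatcher

# Hypothetical priors; in practice these might be relation frequencies in training labels.
RELATION_PRIOR = {
    "equivalent": 0.6,
    "source_subsumed_by_target": 0.3,
    "source_subsumes_target": 0.1,
}


def label_similarity(src_names, tgt_names):
    """Best case-insensitive SequenceMatcher ratio over all label/synonym pairs."""
    return max(
        SequenceMatcher(None, a.lower(), b.lower()).ratio()
        for a in src_names
        for b in tgt_names
    )


def rank_candidates(src_names, candidates, prior=RELATION_PRIOR):
    """`candidates` maps target ID -> labels/synonyms. Returns (target, relation, score),
    best first, where score = lexical similarity x relation prior."""
    rows = []
    for tgt, names in candidates.items():
        sim = label_similarity(src_names, names)
        for rel, p in prior.items():
            rows.append((tgt, rel, sim * p))
    rows.sort(key=lambda row: row[2], reverse=True)
    return rows


ranking = rank_candidates(
    ["Lung carcinoma", "lung cancer"],
    {"T1": ["lung carcinoma"], "T2": ["breast carcinoma"]},
)
print(ranking[0])  # ('T1', 'equivalent', 0.6)
```

Multiplying by the prior keeps the lexical ranking of targets intact within each relation and breaks ties between relations in favour of the more frequent one.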
Protocol
Each submission row contains the following columns:

| Column | Description |
|---|---|
| SrcEntity | Source entity identifier from the query. |
| TgtEntity | One target entity from the provided candidate list. |
| Relation | One of the official relation names. |
| Score | Numeric confidence score used for ranking. |
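A local pre-submission check can be sketched as follows. The file layout is not fixed above, so this version assumes a tab-separated file with a header row, which mirrors the kit's .tsv candidate and label files. The sample submission and schema under evaluation/ remain authoritative. `validate_rows` and `write_submission` are hypothetical helpers, not the kit's scorer interface.

```python
import csv
import io
import math

OFFICIAL_RELATIONS = {"equivalent", "source_subsumed_by_target", "source_subsumes_target"}
COLUMNS = ["SrcEntity", "TgtEntity", "Relation", "Score"]


def validate_rows(rows, candidates):
    """Return human-readable problems. `candidates` maps each source ID to its allowed targets."""
    problems = []
    for n, row in enumerate(rows, start=1):
        src, tgt, rel = row["SrcEntity"], row["TgtEntity"], row["Relation"]
        if tgt not in candidates.get(src, set()):
            problems.append(f"row {n}: {tgt} is not a candidate for {src}")
        if rel not in OFFICIAL_RELATIONS:
            problems.append(f"row {n}: unknown relation {rel!r}")
        try:
            score = float(row["Score"])
        except (TypeError, ValueError):
            score = math.nan
        if not math.isfinite(score):
            problems.append(f"row {n}: Score must be a finite number")
    return problems


def write_submission(rows, handle):
    """Write rows as tab-separated values with a header line (assumed layout)."""
    writer = csv.DictWriter(handle, fieldnames=COLUMNS, delimiter="\t")
    writer.writeheader()
    writer.writerows(rows)


candidates = {"SRC:1": {"TGT:1", "TGT:2"}}
rows = [
    {"SrcEntity": "SRC:1", "TgtEntity": "TGT:1", "Relation": "equivalent", "Score": 0.92},
    {"SrcEntity": "SRC:1", "TgtEntity": "TGT:9", "Relation": "is_a", "Score": "high"},
]
print(validate_rows(rows, candidates))  # three problems, all in row 2
buffer = io.StringIO()
write_submission(rows[:1], buffer)
print(buffer.getvalue())
```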
Rules
Final rules, external-resource policy, team limits, submission limits, and reproduction requirements are TBD.
Schedule
| Phase | Target | Readiness |
|---|---|---|
| Dataset and license finalization | TBD | Ontology versions, redistribution plan, and source permissions under review. |
| Starting kit release | TBD | Fixture pipeline and scorer scaffold exist; real baselines pending. |
| Beta test | TBD | Planned with ontology matching, KG learning, and neuro-symbolic reviewers. |
| Competition launch | TBD | Depends on NeurIPS decision, data readiness, and platform setup. |
| Final submission and reproducibility checks | TBD | Policy and compute resources to be finalized. |
FAQ
Not yet. This repository contains a small runnable example and participant tools. Final ontology versions, counts, and download links are TBD.
No. The challenge interface is graph-learning-ready: triples, node properties, Datalog rules, and candidate TSV files. Ontology-specific background will be documented for participants who want it.
The draft plan allows public, cited, and declared external resources. The final policy is TBD and will be published before launch.
Hidden test labels are not included in public candidate files, graph triples, released alignments, or Datalog rules. The official scorer will keep hidden answers server-side.
Organizers
The organizing team will combine expertise in ontology matching, biomedical ontologies, knowledge graph alignment, machine learning, neuro-symbolic reasoning, software engineering, and benchmark evaluation.
Organizer names, affiliations, dedicated contact email, sponsors, awards, and platform administrators are TBD.