Public graph package, hidden-label evaluation
Five ontologies (NCIT, DOID, ORDO, SNOMED, FMA) are projected into a single knowledge graph. The public train (60%) and validation (10%) splits carry gold labels. The private test split (30%) releases only the candidates, and its gold labels are withheld.
What's in the package
| Path | Purpose |
|---|---|
| `graph/triples.csv` | Typed intra-ontology graph triples and the released training anchors. |
| `graph/properties.csv` | Node identifiers, ontology membership, labels, synonyms, definitions, and permitted metadata. |
| `graph/rules.dl` | Selected Datalog-compatible rules derived from ontology axioms. |
| `alignments/{train,valid}.tsv` | Public supervised labels with their relation types. |
| `tasks/*/*.cands.tsv` | Candidate files for train, validation, and test queries. |
| `tasks/*/{train,valid}.preferred.tsv` | One preferred gold (target, relation) pair per query. Used to compute the Preferred Typed (Relation-Aware) MRR. |
| `tasks/*/{train,valid}.graded.tsv` | Graded relevance over (candidate, relation) pairs, derived from the preferred pairs and the target-ontology hierarchy. Used to compute the Hierarchy-Aware Typed nDCG@10. |
| `evaluation/*` | Sample submission, submission schema, and the local scorer. Scoring with the kit is available now; CodaBench integration is in preparation. Status: provisional. |
| `baseline_predictions/*` | Organiser baseline-system predictions on the test split. |
| `data_card.md`, `*_manifest.json`, `reports/` | The data card, licence and release manifests, and dataset reports. |
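Official scores come from the local scorer in `evaluation/`. The sketch below is only an illustration of how the two metrics named above are commonly computed, and it rests on assumptions that may not match the kit's scorer. It assumes that a typed hit requires both the candidate and the relation to match the preferred pair, that nDCG uses linear gains equal to the released grades with a `log2(rank + 1)` discount, and that the ideal ranking is built from all graded pairs of the query. The function names, identifiers, and relation labels are hypothetical.

```python
import math
from typing import Dict, List, Tuple

# A ranked prediction or a gold item: a (candidate_id, relation) pair.
Pair = Tuple[str, str]


def typed_reciprocal_rank(ranked: List[Pair], preferred: Pair) -> float:
    """Return 1/rank of the preferred (target, relation) pair, or 0.0 if absent.

    Assumption: a hit requires BOTH the target and the relation to match.
    """
    for rank, pair in enumerate(ranked, start=1):
        if pair == preferred:
            return 1.0 / rank
    return 0.0


def typed_mrr(runs: Dict[str, List[Pair]], gold: Dict[str, Pair]) -> float:
    """Mean reciprocal rank over every gold query; queries with no run score 0."""
    if not gold:
        return 0.0
    total = sum(typed_reciprocal_rank(runs.get(q, []), p) for q, p in gold.items())
    return total / len(gold)


def ndcg_at_k(ranked: List[Pair], grades: Dict[Pair, float], k: int = 10) -> float:
    """nDCG@k over (candidate, relation) pairs.

    Assumptions: linear gains (gain = grade), log2(rank + 1) discount,
    ungraded pairs have grade 0, and duplicate pairs count only once.
    """
    unique = list(dict.fromkeys(ranked))  # drop repeats, keep first occurrence
    dcg = sum(grades.get(pair, 0.0) / math.log2(rank + 1)
              for rank, pair in enumerate(unique[:k], start=1))
    ideal = sorted(grades.values(), reverse=True)[:k]
    idcg = sum(g / math.log2(rank + 1) for rank, g in enumerate(ideal, start=1))
    return dcg / idcg if idcg > 0 else 0.0
```

For example, with the hypothetical ranking `[("DOID:0001", "equivalent"), ("DOID:0002", "subsumed_by"), ("DOID:0003", "equivalent")]` and preferred pair `("DOID:0003", "equivalent")`, the typed reciprocal rank is 1/3. The same candidate paired with a different relation scores 0, which is what makes the metric relation-aware.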
Files are released in CSV, TSV, Datalog, and JSON formats. The data card in the kit is the authoritative inventory and file-format reference.
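As an illustration of how graph triples and a Datalog-style rule can be combined, the sketch below materialises a transitivity rule over triples by semi-naive fixpoint iteration: each round joins only the newly derived facts against the full fact set, on either side of the rule body. The CSV column names (`subject`, `predicate`, `object`) and the predicate `subClassOf` are hypothetical. Check the data card for the actual schema of `graph/triples.csv` and `graph/rules.dl` for the rules that are actually shipped.

```python
import csv
import io
from collections import defaultdict
from typing import Set, Tuple

Triple = Tuple[str, str, str]


def read_triples(text: str) -> Set[Triple]:
    """Parse triples from CSV text.

    Hypothetical header: subject,predicate,object (not confirmed by the kit).
    """
    reader = csv.DictReader(io.StringIO(text))
    return {(row["subject"], row["predicate"], row["object"]) for row in reader}


def transitive_closure(triples: Set[Triple], predicate: str) -> Set[Triple]:
    """Semi-naive evaluation of the rule  p(X, Z) :- p(X, Y), p(Y, Z).

    Returns all base and derived facts for `predicate` as triples.
    """
    facts = {(s, o) for s, p, o in triples if p == predicate}
    delta = set(facts)
    while delta:
        # Index the current facts by subject and by object for the joins.
        by_subject = defaultdict(set)
        by_object = defaultdict(set)
        for s, o in facts:
            by_subject[s].add(o)
            by_object[o].add(s)
        new = set()
        for x, y in delta:
            for z in by_subject.get(y, ()):  # delta(X, Y), p(Y, Z) -> p(X, Z)
                new.add((x, z))
            for w in by_object.get(x, ()):   # p(W, X), delta(X, Y) -> p(W, Y)
                new.add((w, y))
        new -= facts  # keep only facts not derived before
        facts |= new
        delta = new
    return {(s, predicate, o) for s, o in facts}
```

On a chain `A subClassOf B`, `B subClassOf C`, `C subClassOf D`, this derives `A-C`, `B-D`, and `A-D` in two rounds and then stops, because the third round yields no new facts. Triples with other predicates are ignored.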
Ontology licences
| Ontology | Licence |
|---|---|
| NCIT | CC BY 4.0 |
| DOID | CC0 1.0 |
| ORDO | CC BY 4.0 |
| FMA | FMA License (redistribution terms apply) |
| SNOMED | SNOMED CT Affiliate License (redistribution terms apply) |
Hosting & distribution
The dataset is distributed as a standalone public artefact. The hosting platform and DOI are not yet finalised, and an offline archive is planned for distribution via Zenodo. Status: provisional.