Unified Biomedical Knowledge Graph (UBKG)
Source Contexts
The source context for an instance of the UBKG (or UBKG context) describes a collection of sets of assertions, each of which is identified by a source abbreviation (SAB).
A SAB can correspond to one of the following types of sets of assertions:
- Ontology data, such as that extracted from OWL files published in sources such as the NCBO BioPortal or the OBO Foundry.
- Reference data extracted from public sources such as UniProtKB.
- Result data: summarized experimental results obtained from entities such as NIH-supported laboratories.
UMLS source context (UMLS-Graph)
The UMLS context includes sets of assertions obtained from over 100 vocabularies organized by the UMLS, including
- SNOMEDCT_US
- ICD10
- LOINC
- HGNC
All of these assertions are treated as ontology data.
CodeID standardization
In the UBKG, the colon is a reserved character, used to delimit SAB from code in a CodeID property of a Code node.
A small number of vocabularies from the UMLS have default code formats that either include the name of the vocabulary in the code or contain a colon. These vocabularies and their formats include:
SAB | default format |
---|---|
HGNC | HGNC HGNC:x |
GO | GO GO:x |
HPO | HPO HP:x |
HCPCS (Level codes) | HCPCS Level n: Exxx-E-xxxx |
Executing the generation script with the SAB argument of UMLS reformats codes from the UMLS to the format SAB:CODE.
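The reformatting described above can be illustrated with a minimal Python sketch. It assumes that a default CodeID consists of the SAB, a space, and the code (as the `HGNC HGNC:x` pattern in the table suggests), and that any prefix embedded in the code up to its last colon is dropped. The function names `standardize_code_id` and `split_code_id` are hypothetical, and this is not the generation script's actual logic. In particular, how the script treats HCPCS Level codes is not stated here.

```python
def standardize_code_id(default_code_id: str) -> str:
    """Return 'SAB:CODE' for a default CodeID of the assumed form 'SAB CODE'."""
    # Assumption: the SAB is everything before the first space.
    sab, sep, code = default_code_id.partition(" ")
    if not sep or not code:
        raise ValueError(f"expected 'SAB CODE', got {default_code_id!r}")
    # Assumption: an embedded vocabulary prefix such as 'HGNC:' or 'HP:'
    # is dropped, so the colon appears exactly once, between SAB and code.
    code = code.rsplit(":", 1)[-1].strip()
    return f"{sab}:{code}"


def split_code_id(code_id: str) -> tuple[str, str]:
    """Split a standardized CodeID on its single reserved colon."""
    sab, code = code_id.split(":", 1)
    return sab, code
```

Under these assumptions, `standardize_code_id("HGNC HGNC:5")` gives `HGNC:5`, and a code that has no embedded prefix, such as `LOINC 1234-5`, simply has its space replaced by a colon. Because the colon is reserved, `split_code_id` can recover the SAB and the code unambiguously.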
Non-UMLS Source Platforms
Sources of data for the UBKG (other than the UMLS base) include:
- OBO Foundry
- NCBO BioPortal
- Sites maintained by stewards, including GitHub repositories, FTP sites, etc.
- Globus (for sources specific to Data Distillery)
- Spreadsheets in the custom SimpleKnowledge Editor format, stored in Google Drive
UBKG base context (UMLS plus)
The UBKG base context appends to the UMLS context assertions from a number of other standard biomedical ontologies and data sources. The UBKG base context includes assertions from the following SABs:
SAB | Description | Type of data | platform |
---|---|---|---|
UMLS | Standardizes format of CodeID | ontology | DBMI Neptune |
UBERON | Uber Anatomy Ontology | ontology | OBO Foundry |
PATO | Phenotypic Quality Ontology | ontology | OBO Foundry |
CL | Cell Ontology | ontology | OBO Foundry |
DOID | Human Disease Ontology | ontology | OBO Foundry |
OBI | Ontology for Biomedical Investigations | ontology | OBO Foundry |
OBIB | Ontology for Biobanking | ontology | NCBO BioPortal |
EDAM | EDAM | ontology | EDAM |
HSAPDV | Human Developmental Stages Ontology | ontology | OBO Foundry |
SBO | Systems Biology Ontology | ontology | SBO |
MI | Molecular Interactions | ontology | OBO Foundry |
CHEBI | Chemical Entities of Biological Interest Ontology | ontology | OBO Foundry |
MP | Mammalian Phenotype Ontology | ontology | OBO Foundry |
ORDO | Orphanet Rare Disease Ontology | ontology | NCBO BioPortal |
UNIPROTKB | Protein-gene relationships from UniProtKB | reference | UniProt |
UO | Units of Measurement Ontology | ontology | NCBO BioPortal |
MONDO | MONDO Disease Ontology | ontology | OBO Foundry |
EFO | Experimental Factor Ontology | ontology | NCBO BioPortal |
AZ | Azimuth cell annotations mapped to Cell Ontology terms | ontology | Google Drive |
PGO | Pseudogene | ontology | NCBO BioPortal |
GENCODE_ONT | GenCode ontology support (valuesets) | ontology | SimpleKnowledge |
GENCODE | GENCODE annotations and metadata | reference | custom (translation of FTP download) |
RefSeq gene summaries
A separate script adds summaries of genes from RefSeq to the UBKG. This import assumes that GENCODE has already been ingested.
HuBMAP/SenNet context
The HuBMAP/SenNet context appends to the UBKG base set assertions from the following SABs:
SAB | Description | Type of data | platform |
---|---|---|---|
HRA | Human Reference Atlas | ontology | Globus |
HRAVS | Human Reference Atlas Value Set | ontology | NCBO BioPortal |
HUBMAP | Application ontology supporting the infrastructure of the HuBMAP Consortium | ontology | SimpleKnowledge |
SENNET | Application ontology supporting the infrastructure of the SenNet Consortium | ontology | SimpleKnowledge |
CEDAR | Custom HuBMAP/SenNet metadata templates built from CEDAR templates | reference | Globus |
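The layering of contexts can be pictured as list concatenation. The following Python sketch is only an illustration: the names `UBKG_BASE`, `HUBMAP_SENNET`, and `build_context` are hypothetical, the SAB lists are copied from the tables in this document, and nothing here reflects how the actual generation scripts are invoked.

```python
# SABs of the UBKG base context, in the order listed in this document.
UBKG_BASE = [
    "UMLS", "UBERON", "PATO", "CL", "DOID", "OBI", "OBIB", "EDAM",
    "HSAPDV", "SBO", "MI", "CHEBI", "MP", "ORDO", "UNIPROTKB", "UO",
    "MONDO", "EFO", "AZ", "PGO", "GENCODE_ONT", "GENCODE",
]

# SABs that the HuBMAP/SenNet context appends to the base context.
HUBMAP_SENNET = ["HRA", "HRAVS", "HUBMAP", "SENNET", "CEDAR"]


def build_context(base: list[str], extra: list[str]) -> list[str]:
    """Append extra SABs to a base context, keeping order and skipping repeats."""
    context: list[str] = []
    seen: set[str] = set()
    for sab in [*base, *extra]:
        if sab not in seen:
            seen.add(sab)
            context.append(sab)
    return context


hubmap_sennet_context = build_context(UBKG_BASE, HUBMAP_SENNET)
```

Keeping the result an ordered list rather than a set preserves the idea that a context is built by appending to an existing base, with UMLS first.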
Data Distillery context
The Data Distillery context appends to the UBKG base set assertions to support participants in Data Distillery, a project of the National Institutes of Health’s Common Fund Data Ecosystem (CFDE). SABs include:
Data Coordinating Center / Domain | SAB | Description | Type of data | platform |
---|---|---|---|---|
Data Distillery Support Mappings | CLINVAR | NCBI ClinVar | reference | Globus |
Data Distillery Support Mappings | CMAP | Connectivity Map | reference | Globus |
Data Distillery Support Mappings | HPOMP | HPO-MP mapping | reference | Globus |
Data Distillery Support Mappings | HGNCHPO | human genotype - phenotype mapping | reference | Globus |
Data Distillery Support Mappings | HCOPHGNC | human - mouse orthologs | reference | Globus |
Data Distillery Support Mappings | HCOPMP | mouse genotype-phenotype mapping | reference | Globus |
Data Distillery Support Mappings | RATHCOP | ENSEMBL human to ENSEMBL Rat ortholog | reference | Globus |
Data Distillery Support Mappings | MSIGDB | Molecular Signatures Database | reference | Globus |
Data Distillery Support Mappings | HSCLO | Chromosome Location Ontology | ontology | Globus |
Data Distillery Support Mappings | GENCODEHSCLO | GENCODE-HSCLO mapping | reference | Globus |
4DNucleome | 4DN | 4D Nucleome | result | Globus |
External RNA Controls Consortium (ERCC) | ERCCRBP | exRNA RNA Binding Proteins | result | Globus |
External RNA Controls Consortium (ERCC) | ERCCREG | Regulatory Elements | result | Globus |
GlyGen | FALDO | Feature Annotation Description Ontology | ontology | NCBO BioPortal |
GlyGen | UNIPROT | Universal Protein Resource | ontology | UniProt FTP |
GlyGen | GLYCORDF | Glycomics | ontology | GitHub |
GlyGen | GLYCOCOO | Glycoconjugate | ontology | GitHub |
GlyGen | GLYCANS | Glycans data | result | Globus |
GlyGen | PROTEOFORM | Proteoform | result | Globus |
Genotype Tissue Expression (GTEx) | GTEXCOEXP | Co-expression | result | Globus |
Genotype Tissue Expression (GTEx) | GTEXEQTL | Expression quantitative trait loci (eQTL) | result | Globus |
Genotype Tissue Expression (GTEx) | GTEXEXP | Expression | result | Globus |
Human BioMolecular Atlas Program (HuBMAP) | HMAZ | HuBMAP Azimuth Cell Expression Summary | result | Globus |
Human BioMolecular Atlas Program (HuBMAP) | HMCELL | HuBMAP Cell Expression Summary | result | Globus |
Illuminating the Druggable Genome (IDG) | IDGP | Compound-protein interactions | result | Globus |
Illuminating the Druggable Genome (IDG) | IDGD | Compound-disease interactions | result | Globus |
Gabriella Miller Kids First | KFPT | | result | Globus |
Library of Integrated Network-Based Cellular Signatures (LINCS) | LINCS | | result | Globus |
Molecular Transducers of Physical Activity Consortium (MoTrPAC) | MOTRPAC | | result | Globus |
Metabolomics Workbench (MW) | MW | Cell-metabolite mappings | result | Globus |
Stimulating Peripheral Activity to Relieve Conditions (SPARC) | NPO | Neuron Phenotype Ontology | ontology | NCBO BioPortal |
Stimulating Peripheral Activity to Relieve Conditions (SPARC) | NPOSCKAN | NPOSCKAN | ontology | GitHub |