Unified Biomedical Knowledge Graph (UBKG)

Source Contexts

The source context for an instance of the UBKG (or UBKG context) describes a collection of sets of assertions, each of which is identified by a source abbreviation (SAB).

A SAB can correspond to one of the following types of sets of assertions:

  1. Ontology data, such as that extracted from OWL files published in sources such as the NCBO BioPortal or the OBO Foundry.
  2. Reference data extracted from public sources such as UniProtKB.
  3. Result data, consisting of summarized experimental results obtained from entities such as NIH-supported laboratories.
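As a hedged illustration, the relationship between a SAB and its type of data can be modeled as a small data structure. This is not the UBKG's actual schema. The class names AssertionType and AssertionSet, and the idea of a context as a dictionary keyed by SAB, are assumptions made for this sketch. The sample entries are copied from the context tables in this document.

```python
from dataclasses import dataclass
from enum import Enum


class AssertionType(Enum):
    """The three kinds of assertion sets that a SAB can identify."""

    ONTOLOGY = "ontology"
    REFERENCE = "reference"
    RESULT = "result"


@dataclass(frozen=True)
class AssertionSet:
    """One set of assertions in a UBKG context, identified by its SAB (hypothetical model)."""

    sab: str
    description: str
    assertion_type: AssertionType


# A context modeled as a mapping from SAB to assertion set; the sample
# entries below are copied from the context tables in this document.
context = {
    s.sab: s
    for s in (
        AssertionSet("UBERON", "Uber Anatomy Ontology", AssertionType.ONTOLOGY),
        AssertionSet("UNIPROTKB", "Protein-gene relationships from UniProtKB",
                     AssertionType.REFERENCE),
        AssertionSet("IDGP", "Compound-protein interactions", AssertionType.RESULT),
    )
}

print(context["UNIPROTKB"].assertion_type.value)  # reference
```

Keying the dictionary by SAB mirrors the statement that each set of assertions is identified by its SAB.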

UMLS source context (UMLS-Graph)

The UMLS context includes sets of assertions obtained from over 100 vocabularies organized by the UMLS.

All of these assertions are classified as ontology data.

CodeID standardization

In the UBKG, the colon is a reserved character used to delimit the SAB from the code in the CodeID property of a Code node.
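Because the colon is reserved as the delimiter, a CodeID can be split back into its SAB and code. The following is a minimal sketch, assuming that a well-formed CodeID contains exactly one colon with text on both sides; the function name split_codeid is hypothetical and not part of any UBKG tooling.

```python
def split_codeid(codeid: str) -> tuple[str, str]:
    """Split a CodeID of the form SAB:CODE into (SAB, CODE).

    Assumes that, because the colon is reserved as the delimiter, a
    well-formed CodeID contains exactly one colon with text on both sides.
    """
    sab, sep, code = codeid.partition(":")
    if not sep or not sab or not code or ":" in code:
        raise ValueError(f"not a well-formed CodeID: {codeid!r}")
    return sab, code


print(split_codeid("UBERON:0002048"))  # ('UBERON', '0002048')
```

Rejecting a second colon in the code portion is what makes the split unambiguous; a code that natively contains a colon would have to be reformatted before it could be stored this way.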

A small number of vocabularies from the UMLS have default code formats that either include the name of the vocabulary in the code or contain a colon. These vocabularies and their formats include:

SAB | Default format
HCPCS (Level codes) | HCPCS Level n: Exxx-E-xxxx

Executing the generation script with the SAB argument of UMLS reformats codes from the UMLS to the format SAB:CODE.
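The generation script's actual logic is not described here. As a hedged sketch, assuming the rule is "keep only the text after the last colon, strip whitespace, then prefix the SAB," the reformatting could look like the following. The function name standardize_codeid and the rule itself are assumptions for illustration.

```python
def standardize_codeid(sab: str, raw_code: str) -> str:
    """Return a CodeID of the form SAB:CODE.

    Hypothetical rule: if the raw code contains a colon (for example,
    because it embeds the vocabulary name), keep only the text after the
    last colon and strip surrounding whitespace, then prefix the SAB.
    """
    code = raw_code.rsplit(":", 1)[-1].strip()
    if not code:
        raise ValueError(f"no code found in {raw_code!r}")
    return f"{sab}:{code}"


# The HCPCS default format from the table above, used literally as input.
print(standardize_codeid("HCPCS", "HCPCS Level n: Exxx-E-xxxx"))  # HCPCS:Exxx-E-xxxx
```

A side effect of this particular rule is that it is idempotent: a code already in SAB:CODE form for the same SAB, such as standardize_codeid("GO", "GO:0008150"), is returned unchanged.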

Non-UMLS source platforms

Sources of data for the UBKG (other than the UMLS base) include:

  1. OBO Foundry
  2. NCBO BioPortal
  3. Sites maintained by stewards, including GitHub repositories, FTP sites, etc.
  4. Globus (for sources specific to Data Distillery)
  5. Spreadsheets in the custom SimpleKnowledge Editor format, stored in Google Drive

UBKG base context (UMLS plus)

The UBKG base context appends assertions from a number of other standard biomedical ontologies and data sources to the UMLS context. It includes assertions from the following SABs:

SAB | Description | Type of data | Platform
UMLS | Standardizes the format of CodeIDs | ontology | DBMI Neptune
UBERON | Uber Anatomy Ontology | ontology | OBO Foundry
PATO | Phenotypic Quality Ontology | ontology | OBO Foundry
CL | Cell Ontology | ontology | OBO Foundry
DOID | Human Disease Ontology | ontology | OBO Foundry
OBI | Ontology for Biomedical Investigations | ontology | OBO Foundry
OBIB | Ontology for Biobanking | ontology | NCBO BioPortal
HSAPDV | Human Developmental Stages Ontology | ontology | OBO Foundry
SBO | Systems Biology Ontology | ontology | SBO
MI | Molecular Interactions | ontology | OBO Foundry
CHEBI | Chemical Entities of Biological Interest Ontology | ontology | OBO Foundry
MP | Mammalian Phenotype Ontology | ontology | OBO Foundry
ORDO | Orphanet Rare Disease Ontology | ontology | NCBO BioPortal
UNIPROTKB | Protein-gene relationships from UniProtKB | reference | UniProt
UO | Units of Measurement Ontology | ontology | NCBO BioPortal
MONDO | MONDO Disease Ontology | ontology | OBO Foundry
EFO | Experimental Factor Ontology | ontology | NCBO BioPortal
AZ | Azimuth cell annotations mapped to Cell Ontology terms | ontology | Google Drive
PGO | Pseudogene | ontology | NCBO BioPortal
GENCODE_ONT | GENCODE ontology support (valuesets) | ontology | SimpleKnowledge
GENCODE | GENCODE annotations and metadata | reference | custom (translation of FTP download)
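Appending SABs to the UMLS context can be pictured as building an ordered list in which each SAB appears once, consistent with each set of assertions being identified by a single SAB. This is a hypothetical sketch: the names extend_context, UMLS_CONTEXT, and BASE_ADDITIONS are assumptions, and whether the real pipeline depends on this ingestion order is not stated in this document.

```python
UMLS_CONTEXT = ["UMLS"]

# SABs that the base context appends, in the order listed in the table above.
BASE_ADDITIONS = [
    "UBERON", "PATO", "CL", "DOID", "OBI", "OBIB", "HSAPDV", "SBO", "MI",
    "CHEBI", "MP", "ORDO", "UNIPROTKB", "UO", "MONDO", "EFO", "AZ", "PGO",
    "GENCODE_ONT", "GENCODE",
]


def extend_context(base: list[str], additions: list[str]) -> list[str]:
    """Return a new context: the base SABs followed by the additions.

    Raises ValueError if an addition is already present, so that no SAB
    is appended twice. The input lists are not modified.
    """
    seen = set(base)
    result = list(base)
    for sab in additions:
        if sab in seen:
            raise ValueError(f"duplicate SAB: {sab}")
        seen.add(sab)
        result.append(sab)
    return result


base_context = extend_context(UMLS_CONTEXT, BASE_ADDITIONS)
print(base_context[0], len(base_context))  # UMLS 21
```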

RefSeq gene summaries

A separate script adds gene summaries from RefSeq to the UBKG. This import assumes that GENCODE has already been ingested.
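A dependency like this one can be enforced with a prerequisite check before an import runs. In the following minimal sketch, only the RefSeq-requires-GENCODE dependency comes from this document. The key "REFSEQ", the PREREQUISITES table, and the function check_prerequisites are hypothetical names, not part of the actual script.

```python
# Hypothetical prerequisite table: each import lists the SABs it expects
# to be present already. Only the RefSeq -> GENCODE dependency is documented.
PREREQUISITES: dict[str, set[str]] = {"REFSEQ": {"GENCODE"}}


def check_prerequisites(sab: str, ingested: set[str]) -> None:
    """Raise RuntimeError if any SAB required by `sab` has not been ingested."""
    missing = PREREQUISITES.get(sab, set()) - ingested
    if missing:
        raise RuntimeError(f"{sab} requires {sorted(missing)} to be ingested first")


check_prerequisites("REFSEQ", {"UMLS", "GENCODE_ONT", "GENCODE"})  # passes silently
```

Calling check_prerequisites("REFSEQ", {"UMLS"}) would instead raise RuntimeError with the message "REFSEQ requires ['GENCODE'] to be ingested first".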

HuBMAP/SenNet context

The HuBMAP/SenNet context appends assertions from the following SABs to the UBKG base context:

SAB | Description | Type of data | Platform
HRA | Human Reference Atlas | ontology | Globus
HRAVS | Human Reference Atlas Value Set | ontology | NCBO BioPortal
HUBMAP | Application ontology supporting the infrastructure of the HuBMAP Consortium | ontology | SimpleKnowledge
SENNET | Application ontology supporting the infrastructure of the SenNet Consortium | ontology | SimpleKnowledge
CEDAR | Custom HuBMAP/SenNet metadata templates built from CEDAR templates | reference | Globus

Data Distillery context

The Data Distillery context appends assertions to the UBKG base context to support participants in Data Distillery, a project of the National Institutes of Health (NIH) Common Fund Data Ecosystem (CFDE). Its SABs include:

Data Coordinating Center / Domain | SAB | Description | Type of data | Platform
Data Distillery Support Mappings | CLINVAR | NCBI ClinVar | reference | Globus
Data Distillery Support Mappings | CMAP | Connectivity Map | reference | Globus
Data Distillery Support Mappings | HPOMP | HPO-MP mapping | reference | Globus
Data Distillery Support Mappings | HGNCHPO | Human genotype-phenotype mapping | reference | Globus
Data Distillery Support Mappings | HCOPHGNC | Human-mouse orthologs | reference | Globus
Data Distillery Support Mappings | HCOPMP | Mouse genotype-phenotype mapping | reference | Globus
Data Distillery Support Mappings | RATHCOP | ENSEMBL human to ENSEMBL rat orthologs | reference | Globus
Data Distillery Support Mappings | MSIGDB | Molecular Signatures Database | reference | Globus
Data Distillery Support Mappings | HSCLO | Chromosome Location Ontology | ontology | Globus
Data Distillery Support Mappings | GENCODEHSCLO | GENCODE-HSCLO mapping | reference | Globus
4DNucleome | 4DN | 4D Nucleome | result | Globus
Extracellular RNA Communication Consortium (ERCC) | ERCCRBP | exRNA RNA Binding Proteins | result | Globus
Extracellular RNA Communication Consortium (ERCC) | ERCCREG | Regulatory Elements | result | Globus
GlyGen | FALDO | Feature Annotation Location Description Ontology | ontology | NCBO BioPortal
GlyGen | UNIPROT | Universal Protein Resource | ontology | UniProt FTP
GlyGen | GLYCORDF | Glycomics | ontology | GitHub
GlyGen | GLYCOCOO | Glycoconjugate | ontology | GitHub
GlyGen | GLYCANS | Glycans data | result | Globus
GlyGen | PROTEOFORM | Proteoform | result | Globus
Genotype-Tissue Expression (GTEx) | GTEXCOEXP | Co-expression | result | Globus
Genotype-Tissue Expression (GTEx) | GTEXEQTL | Expression quantitative trait loci (eQTL) | result | Globus
Genotype-Tissue Expression (GTEx) | GTEXEXP | Expression | result | Globus
Human BioMolecular Atlas Program (HuBMAP) | HMAZ | HuBMAP Azimuth Cell Expression Summary | result | Globus
Human BioMolecular Atlas Program (HuBMAP) | HMCELL | HuBMAP Cell Expression Summary | result | Globus
Illuminating the Druggable Genome (IDG) | IDGP | Compound-protein interactions | result | Globus
Illuminating the Druggable Genome (IDG) | IDGD | Compound-disease interactions | result | Globus
Gabriella Miller Kids First | KFPT | | result | Globus
Library of Integrated Network-Based Cellular Signatures (LINCS) | LINCS | | result | Globus
Molecular Transducers of Physical Activity Consortium (MoTrPAC) | MOTRPAC | | result | Globus
Metabolomics Workbench (MW) | MW | Cell-metabolite mappings | result | Globus
Stimulating Peripheral Activity to Relieve Conditions (SPARC) | NPO | Neuron Phenotype Ontology | ontology | NCBO BioPortal
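The Data Distillery table groups SABs by the Data Coordinating Center (DCC) or domain that contributed them. The following sketch holds that two-level grouping as a dictionary and inverts it, so that any SAB can be traced back to its source. It is a hypothetical illustration. The shortened DCC labels and the names SABS_BY_DCC and DCC_BY_SAB were chosen for this sketch, and the SAB lists are copied from the table above.

```python
# SABs in the Data Distillery context, grouped by the DCC or domain that
# contributed them. The DCC labels are shortened for this sketch.
SABS_BY_DCC: dict[str, list[str]] = {
    "Data Distillery support mappings": [
        "CLINVAR", "CMAP", "HPOMP", "HGNCHPO", "HCOPHGNC", "HCOPMP",
        "RATHCOP", "MSIGDB", "HSCLO", "GENCODEHSCLO",
    ],
    "4DN": ["4DN"],
    "ERCC": ["ERCCRBP", "ERCCREG"],
    "GlyGen": ["FALDO", "UNIPROT", "GLYCORDF", "GLYCOCOO", "GLYCANS", "PROTEOFORM"],
    "GTEx": ["GTEXCOEXP", "GTEXEQTL", "GTEXEXP"],
    "HuBMAP": ["HMAZ", "HMCELL"],
    "IDG": ["IDGP", "IDGD"],
    "Kids First": ["KFPT"],
    "LINCS": ["LINCS"],
    "MoTrPAC": ["MOTRPAC"],
    "MW": ["MW"],
    "SPARC": ["NPO"],
}

# Invert the grouping so that each SAB maps to the DCC that contributed it.
DCC_BY_SAB: dict[str, str] = {
    sab: dcc for dcc, sabs in SABS_BY_DCC.items() for sab in sabs
}

print(DCC_BY_SAB["GTEXEQTL"])  # GTEx
print(len(DCC_BY_SAB))         # 31
```

Because SABs are unique within a context, the inversion loses no information; a SAB listed under two DCCs would silently keep only the last one, so a real implementation might check for that.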