Unified Biomedical Knowledge Graph (UBKG)

Source Contexts


The source context for an instance of the UBKG (or UBKG context) describes a collection of sets of assertions, each of which is identified by a source abbreviation (SAB).

A SAB can correspond to one of the following types of sets of assertions:

  1. Ontology data, such as that extracted from OWL files published in sources such as the NCBO BioPortal or the OBO Foundry.
  2. Reference data extracted from public sources such as UniProtKB.
  3. Result data: summarized experimental results obtained from entities such as NIH-supported laboratories.

Citations

We have sought to identify citations for the sources used in UBKG contexts. The preferred form of citation is a PubMed ID.

The source of each citation reference is also listed. Sources are of the following types:

  1. OBO home page
  2. Web pages of source stewards
  3. GitHub repositories of source stewards

Licensing

We have sought to identify the licenses under which sources are distributed. The majority of the sources use Creative Commons licenses.

Adaptations

The following is intended to satisfy requirements for licenses that require descriptions of adaptation (e.g., CC BY-SA).

Adaptations to source data include:

  1. Use of subsets of data from the source (i.e., selected codes).
  2. Modifications of code IDs for the purpose of standardization. See CodeID Standardization below.
  3. Use of content from multiple sources to generate new “mappings” sources (e.g., between HGNC and HPO).
  4. Transformation of downloaded content into files in UBKG Edge/Node format (e.g., as done with UniProtKB and GENCODE data).

Citation or Licensing Questions

If the citation or licensing information for a source needs to be updated, please contact the UBKG steward.

Jonathan Silverstein, MD
Department of Biomedical Informatics
University of Pittsburgh

Contexts

UMLS source context (UMLS-Graph)

The UMLS context includes sets of assertions obtained from over 100 vocabularies organized by the UMLS.

All of these assertions are treated as ontology data.

CodeID standardization

In the UBKG, the colon is a reserved character, used to delimit SAB from code in a CodeID property of a Code node.
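Because the colon is reserved as the delimiter, a CodeID can be split into its SAB and code at the first colon. The following is a minimal sketch of that idea, not code from the UBKG itself: the names `CodeID` and `split_code_id` are hypothetical, and the example value is illustrative.

```python
from typing import NamedTuple


class CodeID(NamedTuple):
    """Hypothetical container for the two parts of a CodeID."""
    sab: str
    code: str


def split_code_id(code_id: str) -> CodeID:
    """Split a CodeID of the form SAB:CODE at the first colon.

    Raises ValueError if the delimiter, the SAB, or the code is missing.
    """
    sab, sep, code = code_id.partition(":")
    if not sep or not sab or not code:
        raise ValueError(f"not a SAB:CODE CodeID: {code_id!r}")
    return CodeID(sab, code)


if __name__ == "__main__":
    print(split_code_id("UBERON:0002113"))
    # CodeID(sab='UBERON', code='0002113')
```

Splitting at the first colon only is a choice made for this sketch; since the colon is reserved, a standardized code should not contain a second one.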

A small number of vocabularies from the UMLS have default code formats that either include the name of the vocabulary in the code or contain a colon. These vocabularies and their formats include:

| SAB | Default format |
| --- | --- |
| HGNC | HGNC HGNC:x |
| GO | GO GO:x |
| HPO | HPO HP:x |
| HCPCS (Level codes) | HCPCS Level n: Exxx-E-xxxx |

Running the generation script with UMLS as the SAB argument reformats codes from the UMLS to the format SAB:CODE. HPO codes are standardized to use HP as the SAB.
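The standardization to SAB:CODE can be illustrated with a short sketch. This is not the generation script's actual logic: the function name `standardize_code_id` and the `SAB_OVERRIDES` mapping are hypothetical, the sketch assumes the SAB and the default code arrive as separate strings, and it deliberately does not handle the HCPCS Level codes, whose treatment is not described here.

```python
# Hypothetical mapping: HPO codes are standardized to HP as the SAB.
SAB_OVERRIDES = {"HPO": "HP"}


def standardize_code_id(sab: str, code: str) -> str:
    """Return a CodeID in SAB:CODE form (illustrative sketch only)."""
    sab = SAB_OVERRIDES.get(sab, sab)
    prefix = sab + ":"
    # Drop a vocabulary prefix already embedded in the code, e.g. "HGNC:".
    if code.startswith(prefix):
        code = code[len(prefix):]
    # The colon is reserved, so any remaining colon is out of scope here.
    if ":" in code:
        raise ValueError(f"code still contains a colon: {code!r}")
    return f"{sab}:{code}"


if __name__ == "__main__":
    print(standardize_code_id("HGNC", "HGNC:1234"))    # HGNC:1234
    print(standardize_code_id("HPO", "HP:0000118"))    # HP:0000118
    print(standardize_code_id("SNOMEDCT_US", "1234"))  # SNOMEDCT_US:1234
```

The code values above are placeholders chosen for illustration. Raising an error on a leftover colon, rather than guessing at a replacement character, keeps the sketch from inventing behavior for formats such as the HCPCS Level codes.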

Non-UMLS Source Platforms

Sources of data for the UBKG (other than the UMLS base) include:

  1. OBO Foundry
  2. NCBO BioPortal
  3. Sites maintained by stewards, including GitHub repositories, FTP sites, etc.
  4. Globus (for sources specific to Data Distillery)
  5. Spreadsheets in the custom SimpleKnowledge Editor format, stored in Google Drive
  6. Custom ETL scripts

UBKG base context (UMLS plus)

The UBKG base context adds assertions from a number of other standard biomedical ontologies and data sources to the UMLS context. The UBKG base context includes assertions from the following SABs:

| SAB | Description | Type of data | Platform | Citations | Citation Reference | License | Licensing Reference |
| --- | --- | --- | --- | --- | --- | --- | --- |
| UMLS | Unified Medical Language System content, with standardized CodeID | ontology | DBMI Neptune | PMID:14681409 | Reference | The UBKG steward has a distributor license. A UBKG consumer must have a user UMLS license. | Reference |
| UBERON | Uber Anatomy Ontology | ontology | OBO Foundry | PMID:22293552, PMID:25009735 | Reference | CC BY 3.0 | Reference |
| PATO | Phenotypic Quality Ontology | ontology | OBO Foundry | PMID:28387809, PMID:15642100 | Reference | CC BY 3.0 | Reference |
| CL | Cell Ontology | ontology | OBO Foundry | PMID:27377652, PMID:21208450, PMID:15693950 | Reference | CC BY 4.0 | Reference |
| DOID | Human Disease Ontology | ontology | OBO Foundry | PMID:30407550 | Reference | CC0 | Reference |
| OBI | Ontology for Biomedical Investigations | ontology | OBO Foundry | PMID:27128319 | Reference | CC BY 4.0 | Reference |
| OBIB | Ontology for Biobanking | ontology | OBO Foundry | PMID:27148435 | Reference | CC BY 4.0 | Reference |
| EDAM | EDAM | ontology | EDAM | PMID:23479348 | Reference | CC BY-SA 4.0 | Reference |
| HSAPDV | Human Developmental Stages Ontology | ontology | OBO Foundry | | | CC BY 3.0 | Reference |
| SBO | Systems Biology Ontology | ontology | SBO | | | Artistic License 2.0 | Reference |
| MI | Molecular Interactions | ontology | OBO Foundry | | | CC BY 4.0 | Reference |
| CHEBI | Chemical Entities of Biological Interest Ontology | ontology | OBO Foundry | PMID:26467479 | Reference | CC BY 4.0 | Reference |
| MP | Mammalian Phenotype Ontology | ontology | OBO Foundry | PMID:26467479 | Reference | CC BY 4.0 | Reference |
| ORDO | Orphanet Rare Disease Ontology | ontology | NCBO BioPortal | | | CC BY 4.0 | Reference |
| UNIPROTKB | Protein-gene relationships from UniProtKB | reference | custom (translation of UniProt download) | PMID:36408920 | Reference | CC BY 3.0 | Reference |
| UO | Units of Measurement Ontology | ontology | OBO Foundry | PMID:23060432 | Reference | CC BY 3.0 | Reference |
| MONDO | MONDO Disease Ontology | ontology | OBO Foundry | medRxiv | Reference | CC BY 4.0 | Reference |
| EFO | Experimental Factor Ontology | ontology | NCBO BioPortal | PMID:20200009 | Reference | EMBL-EBI | Reference |
| PGO | Pseudogene | ontology | NCBO BioPortal | | | | |
| GENCODE_VS | GENCODE ontology support (valuesets) | ontology | SimpleKnowledge | | | | |
| GENCODE | GENCODE annotations and metadata | reference | custom (translation of FTP download) | PMID:33270111 | Reference | open access | Reference |

RefSeq gene summaries

The base context includes summaries of genes from RefSeq.

A separate script (refseq.py) adds RefSeq summaries to the UBKG. This import assumes that GENCODE has already been ingested.

Azimuth-Cell Ontology mappings

The HuBMAP project is building a mapping between tissue codings from Azimuth and Cell Ontology (CL). The AZ-CL mappings will be added to UBKG contexts as needed.

| SAB | Description | Type of data | Platform | Citations | Citation Reference | License |
| --- | --- | --- | --- | --- | --- | --- |
| AZ | Azimuth cell annotations mapped to Cell Ontology terms | ontology | SimpleKnowledge | | | |

HuBMAP/SenNet context

The HuBMAP/SenNet context adds assertions from the following SABs to the UBKG base context:

| SAB | Description | Type of data | Platform | Citations | Citation Reference | License | License Reference |
| --- | --- | --- | --- | --- | --- | --- | --- |
| HRA | Human Reference Atlas | ontology | Globus | PMID:34750582 | | Data Use Agreement | Reference |
| HRAVS | Human Reference Atlas Value Set | ontology | NCBO BioPortal | | | | |
| HUBMAP | the application ontology supporting the infrastructure of the HuBMAP Consortium | ontology | SimpleKnowledge | | | | |
| SENNET | the application ontology supporting the infrastructure of the SenNet Consortium | ontology | SimpleKnowledge | | | | |
| CEDAR | Custom HuBMAP/SenNet metadata templates built from CEDAR templates | reference | Globus | PMID:26112029 | Reference | | |
| HMFIELD | Custom legacy HuBMAP metadata | reference | custom | | | | |
| CEDAR-ENTITY | Custom mappings between CEDAR templates and HuBMAP and SenNet provenance entities | reference | custom | | | | |
| PCL | Provisional Cell Ontology | ontology | OBO Foundry | | | CC BY 4.0 | Reference |

Data Distillery context

The Data Distillery context adds assertions to the UBKG base context to support participants in Data Distillery, a project of the National Institutes of Health’s Common Fund Data Ecosystem (CFDE). SABs include:

| Data Coordinating Center / Domain | SAB | Description | Type of data | Platform | Citations | Citation Reference | License | Licensing Reference |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Data Distillery Support Mappings | CLINVAR | NCBI ClinVar - associations between genes and diseases or phenotypes | reference | Globus | PMID:31777943, PMID:30311387, PMID:29165669, PMID:26582918, PMID:24234437, NBK174587 | Reference | Public Domain | Reference |
| | CMAP | Broad Institute’s Connectivity Map - associations between genes and chemicals or drugs | reference | Globus | PMID:29195078 | Reference | Public access | Reference |
| | HPOMP | HPO-MP mapping | reference | Globus | MP: PMID:26467479; HPO: PMID:37953324 | MP; HPO | MP: CC BY 4.0; HPO: UMLS restriction level 0 | MP |
| | HGNCHPO | human genotype - phenotype mapping | reference | Globus | HGNC: PMID:36243972; HPO: PMID:37953324 | HGNC; HPO | HGNC: UMLS restriction level 0; HPO: UMLS restriction level 0 | |
| | HCOP (formerly HCOPHGNC) | human - mouse orthologs | reference | Globus | MGD: PMID:33231642; HGNC: PMID:36243972 | MGD; HGNC | MGD: CC BY 4.0; HGNC: UMLS restriction level 0 | MGD |
| | MPMGI (formerly HCOPMP) | mouse genotype-phenotype mapping | reference | Globus | MGD: PMID:33231642; MP: PMID:26467479 | MGD; MP | MGD: CC BY 4.0; MP: CC BY 4.0 | MGD; MP |
| | RATHCOP | ENSEMBL Human to ENSEMBL Rat ortholog | reference | Globus | PMID:36318249 | Reference | Apache 2.0 | Reference |
| | MSIGDB | Molecular Signatures Database | reference | Globus | PMID:16199517, PMID:21546393, PMID:26771021, PMID:37704782 | Reference | CC BY | Reference |
| | HSCLO | Chromosome Location Ontology | ontology | Globus | bioRxiv | | CC BY NC ND 4.0 | Reference |
| | GENCODEHSCLO | GENCODE-HSCLO mapping | reference | Globus | GENCODE: PMID:33270111; HSCLO: bioRxiv | GENCODE | GENCODE: open access; HSCLO: CC BY NC ND 4.0 | HSCLO |
| | WP | WikiPathways gene-gene interactions | reference | Globus | PMID:37941138, PMID:18651794 | Reference | CC0 | Reference |
| | CLINGEN | Clinical Genome selected datasets | reference | Globus | PMID:26014595 | Reference | CC0 | Reference |
| | STRING | StringDB Protein-Protein Interaction Network | reference | Globus | PMID:36370105 | Reference | CC BY 4.0 | Reference |
| 4DNucleome | 4DN | 4D Nucleome | result | Globus | PMID:28905911, PMID:35501320 | Reference | Freely available | Reference |
| Extracellular RNA Communication Consortium (ERCC) | ERCCRBP | exRNA RNA Binding Proteins | result | Globus | PMID:26320938 | Reference | Pre-publication data sharing | Reference |
| | ERCCREG | Regulatory Elements | result | Globus | PMID:26320938 | Reference | Pre-publication data sharing | Reference |
| GlyGen | FALDO | Feature Annotation Location Description Ontology | ontology | NCBO BioPortal | | | | |
| | UNIPROT | Universal Protein Resource | ontology | UniProt FTP | | | CC BY 4.0 | Reference |
| | GLYCORDF | Glycomics | ontology | GitHub | PMID:24280648 | Reference | | |
| | GLYCOCOO | Glycoconjugate | ontology | GitHub | | | | |
| | GLYCANS | Glycans data | result | Globus | PMID:31616925 | Reference | CC BY 4.0 | Reference |
| | PROTEOFORM | Proteoform | result | Globus | PMID:31616925 | Reference | CC BY 4.0 | Reference |
| Genotype Tissue Expression (GTEx) | GTEXCOEXP | Co-expression | result | Globus | PMID:23715323 | Reference | Public | Reference |
| | GTEXEQTL | Expression quantitative trait loci (eQTL) | result | Globus | PMID:23715323 | Reference | Public | Reference |
| | GTEXEXP | Expression | result | Globus | PMID:23715323 | Reference | Public | Reference |
| Human BioMolecular Atlas Program (HuBMAP) | HMAZ | HuBMAP Azimuth Cell Expression Summary | result | Globus | PMID:31597973 | Reference | Data Use Agreement | Reference |
| Illuminating the Druggable Genome (IDG) | IDGP | Compound-protein interactions | result | Globus | | | Open | Reference |
| | IDGD | Compound-disease interactions | result | Globus | | | Open | Reference |
| Gabriella Miller Kids First | KF | | result | Globus | | | | |
| Library of Integrated Network-Based Cellular Signatures (LINCS) | LINCS | | result | Globus | | | freely available | Reference |
| Molecular Transducers of Physical Activity Consortium (MoTrPAC) | MOTRPAC | | result | Globus | | | | |
| Metabolomics Workbench (MW) | MW | Cell-metabolite mappings | result | Globus | PMID:26467476 | Reference | public domain | Reference |
| Stimulating Peripheral Activity to Relieve Conditions (SPARC) | NPO | Neuron Phenotype Ontology | ontology | NCBO BioPortal | | | | |
| | NPOSCKAN | NPOSCKAN | ontology | GitHub | | | | |
| Reactome | REACTOME | Pathways | reference | Globus | PMID:37941124 | Reference | CC0, CC BY 4.0 | Reference |
| DisGeNET | DGN | Disease-gene associations | reference | Globus | PMID:31680165 | Reference | CC-BY-SA 4.0 | Reference |
| Biomarker Partnership Project | BIOMARKER | Biomarkers | reference | Globus | PMID:35925813, PMID:34015823, PMID:32142370 | Reference | CC BY 4.0 | Reference |
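
The HuBMAP/SenNet and Data Distillery contexts are each described as the UBKG base context plus additional SABs. That layering can be sketched as set unions. The sketch below is illustrative only: the constant and function names are hypothetical, and each set holds a small subset of the SABs listed in the tables of this section, which remain the authoritative lists.

```python
# Illustrative subsets only; see the tables in this section for full lists.
BASE = frozenset({"UMLS", "UBERON", "PATO", "CL", "GENCODE"})
HUBMAP_SENNET_ADDITIONS = frozenset({"HRA", "HRAVS", "HUBMAP", "SENNET", "PCL"})
DATA_DISTILLERY_ADDITIONS = frozenset({"CLINVAR", "CMAP", "HPOMP", "4DN", "GTEXEXP"})

# Each non-base context is the base context plus its own additions.
CONTEXTS = {
    "UBKG base": BASE,
    "HuBMAP/SenNet": BASE | HUBMAP_SENNET_ADDITIONS,
    "Data Distillery": BASE | DATA_DISTILLERY_ADDITIONS,
}


def contexts_containing(sab: str) -> list[str]:
    """Return the names of the contexts whose SAB set includes `sab`."""
    return [name for name, sabs in CONTEXTS.items() if sab in sabs]


if __name__ == "__main__":
    print(contexts_containing("CL"))   # ['UBKG base', 'HuBMAP/SenNet', 'Data Distillery']
    print(contexts_containing("HRA"))  # ['HuBMAP/SenNet']
```

Modeling each context as a union keeps base SABs defined once, so a SAB from the base context appears in every context built on it.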