Unified Biomedical Knowledge Graph (UBKG)
Source Contexts
The source context for an instance of the UBKG (or UBKG context) describes a collection of sets of assertions, each of which is identified by a source abbreviation (SAB).
A SAB can correspond to one of the following types of sets of assertions:
- Ontology data, such as that extracted from OWL files published in sources such as the NCBO BioPortal or the OBO Foundry.
- Reference data extracted from public sources such as UniProtKB.
- Result data, i.e., summarized experimental results obtained from entities such as NIH-supported laboratories.
Citations
We have sought to identify citations for the sources used in UBKG contexts. The preferred form of citation is a PubMed ID.
Each citation is accompanied by a reference to the place where it was found. These references are of the following types:
- OBO home page
- Web pages of source stewards
- GitHub repositories of source stewards
Licensing
We have sought to identify the licenses under which sources are distributed. The majority of the sources use Creative Commons licenses.
Adaptations
The following is intended to satisfy requirements for licenses that require descriptions of adaptation (e.g., CC BY-SA).
Adaptations to source data include:
- Use of subsets of data from the source, i.e., selected codes.
- Modifications of code IDs for the purpose of standardization. See CodeID Standardization below.
- Use of content from multiple sources to generate new “mappings” sources, e.g., between HGNC and HPO.
- Transformation of downloaded content into files in UBKG Edge/Node format, e.g., as done with UniProtKB and GENCODE data.
Citation or Licensing Questions
If the citation or licensing information for a source needs to be updated, please contact the UBKG steward.
Jonathan Silverstein, MD
Department of Biomedical Informatics
University of Pittsburgh
Contexts
UMLS source context (UMLS-Graph)
The UMLS context includes sets of assertions obtained from over 100 vocabularies organized by the UMLS, including
- SNOMEDCT_US
- ICD10
- LOINC
- HGNC
All of these assertions are treated as ontology data.
CodeID standardization
In the UBKG, the colon is a reserved character, used to delimit SAB from code in a CodeID property of a Code node.
A small number of vocabularies from the UMLS have default code formats that either include the name of the vocabulary in the code or contain a colon. These vocabularies and their formats include:
SAB | default format |
---|---|
HGNC | HGNC HGNC:x |
GO | GO GO:x |
HPO | HPO HP:x |
HCPCS (Level codes) | HCPCS Level n: Exxx-E-xxxx |
Executing the generation script with the SAB argument of UMLS reformats codes from the UMLS to the format SAB:CODE. HPO codes are standardized to HP as the SAB.
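The reformatting described above can be illustrated with a minimal sketch. It is not the generation script itself: the function name `standardize_code_id`, the `SAB_RENAMES` table, and the assumption that the UMLS default form is a SAB and a code separated by a single space (as the table suggests) are all hypothetical. Because the text does not say how HCPCS Level codes are rewritten, the sketch rejects any code that still contains the reserved colon rather than guessing.

```python
# Hypothetical mapping: HPO codes are standardized to HP as the SAB.
SAB_RENAMES = {"HPO": "HP"}


def standardize_code_id(umls_code_id: str) -> str:
    """Convert an assumed 'SAB CODE' string into the 'SAB:CODE' format."""
    # Split on the first space only; the remainder is the code.
    sab, _, code = umls_code_id.partition(" ")
    if not sab or not code:
        raise ValueError(f"expected 'SAB CODE', got {umls_code_id!r}")

    # Apply SAB renames such as HPO -> HP.
    sab = SAB_RENAMES.get(sab, sab)

    # Drop a vocabulary prefix embedded in the code, e.g. 'HGNC:5' -> '5'.
    prefix = sab + ":"
    if code.startswith(prefix):
        code = code[len(prefix):]

    # The colon is reserved as the SAB/code delimiter, so refuse
    # anything that still contains one (e.g. HCPCS Level codes).
    if ":" in code:
        raise ValueError(f"code still contains a colon: {code!r}")

    return f"{sab}:{code}"


if __name__ == "__main__":
    for raw in ("HGNC HGNC:5", "GO GO:0008150", "HPO HP:0001250"):
        print(raw, "->", standardize_code_id(raw))
```

Under these assumptions, `HGNC HGNC:5` becomes `HGNC:5` and `HPO HP:0001250` becomes `HP:0001250`, while an HCPCS Level code raises `ValueError` so that it can be handled by whatever rule the real script applies.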
non-UMLS Source Platforms
Sources of data for the UBKG (other than the UMLS base) include:
- OBO Foundry
- NCBO BioPortal
- Sites maintained by stewards, including GitHub repositories, FTP sites, etc.
- Globus (for sources specific to Data Distillery)
- Spreadsheets in the custom SimpleKnowledge Editor format, stored in Google Drive
- Custom ETL scripts
UBKG base context (UMLS plus)
The UBKG base context appends to the UMLS context assertions from a number of other standard biomedical ontologies and data sources. The UBKG base context includes assertions from the following SABs:
SAB | Description | Type of data | platform | Citations | Citation Reference | License | Licensing Reference |
---|---|---|---|---|---|---|---|
UMLS | Unified Medical Language System content, with standardized CodeID | ontology | DBMI Neptune | PMID:14681409 | Reference | The UBKG steward has a distributor license. A UBKG consumer must have a user UMLS license. | Reference |
UBERON | Uber Anatomy Ontology | ontology | OBO Foundry | PMID:22293552,PMID:25009735 | Reference | CC BY 3.0 | Reference |
PATO | Phenotypic Quality Ontology | ontology | OBO Foundry | PMID:28387809, PMID:15642100 | Reference | CC BY 3.0 | Reference |
CL | Cell Ontology | ontology | OBO Foundry | PMID:27377652,PMID:21208450,PMID:15693950 | Reference | CC BY 4.0 | Reference |
DOID | Human Disease Ontology | ontology | OBO Foundry | PMID:30407550 | Reference | CC0 | Reference |
OBI | Ontology for Biomedical Investigations | ontology | OBO Foundry | PMID:27128319 | Reference | CC BY 4.0 | Reference |
OBIB | Ontology for Biobanking | ontology | OBO Foundry | PMID:27148435 | Reference | CC BY 4.0 | Reference |
EDAM | EDAM | ontology | EDAM | PMID:23479348 | Reference | CC BY-SA 4.0 | Reference |
HSAPDV | Human Developmental Stages Ontology | ontology | OBO Foundry | | | CC BY 3.0 | Reference |
SBO | Systems Biology Ontology | ontology | SBO | | | Artistic License 2.0 | Reference |
MI | Molecular Interactions | ontology | OBO Foundry | | | CC BY 4.0 | Reference |
CHEBI | Chemical Entities of Biological Interest Ontology | ontology | OBO Foundry | PMID:26467479 | Reference | CC BY 4.0 | Reference |
MP | Mammalian Phenotype Ontology | ontology | OBO Foundry | PMID:26467479 | Reference | CC BY 4.0 | Reference |
ORDO | Orphanet Rare Disease Ontology | ontology | NCBO BioPortal | | | CC BY 4.0 | Reference |
UNIPROTKB | Protein-gene relationships from UniProtKB | reference | custom (translation of UniProt download) | PMID:36408920 | Reference | CC BY 3.0 | Reference |
UO | Units of Measurement Ontology | ontology | OBO Foundry | PMID:23060432 | Reference | CC BY 3.0 | Reference |
MONDO | MONDO Disease Ontology | ontology | OBO Foundry | MedRxiv | Reference | CC BY 4.0 | Reference |
EFO | Experimental Factor Ontology | ontology | NCBO BioPortal | PMID:20200009 | Reference | EMBL-EBI | Reference |
PGO | Pseudogene | ontology | NCBO BioPortal | ||||
GENCODE_VS | GENCODE ontology support (valuesets) | ontology | SimpleKnowledge | ||||
GENCODE | GENCODE annotations and metadata | reference | custom (translation of FTP download) | PMID:33270111 | Reference | open access | Reference |
RefSeq gene summaries
The base context includes summaries of genes from RefSeq.
- Citation: NCBI Handbook (Reference)
- Licensing: Public Domain
A separate script (refseq.py) adds RefSeq summaries to the UBKG. This import assumes that GENCODE has already been ingested.
Azimuth-Cell Ontology mappings
The HuBMAP project is building a mapping between tissue codings from Azimuth and Cell Ontology (CL). The AZ-CL mappings will be added to UBKG contexts as needed.
SAB | Description | Type of data | platform | Citations | Citation Reference | License |
---|---|---|---|---|---|---|
AZ | Azimuth cell annotations mapped to Cell Ontology terms | ontology | SimpleKnowledge |
HuBMAP/SenNet context
The HuBMAP/SenNet context appends to the UBKG base set assertions from the following SABs:
SAB | Description | Type of data | platform | Citations | Citation Reference | License | License Reference |
---|---|---|---|---|---|---|---|
HRA | Human Reference Atlas | ontology | Globus | PMID:34750582 | | Data Use Agreement | Reference |
HRAVS | Human Reference Atlas Value Set | ontology | NCBO BioPortal | ||||
HUBMAP | the application ontology supporting the infrastructure of the HuBMAP Consortium | ontology | SimpleKnowledge | ||||
SENNET | the application ontology supporting the infrastructure of the SenNet Consortium | ontology | SimpleKnowledge | ||||
CEDAR | Custom HuBMAP/SenNet metadata templates built from CEDAR templates | reference | Globus | PMID:26112029 | Reference | ||
HMFIELD | Custom legacy HuBMAP metadata | reference | custom | ||||
CEDAR-ENTITY | Custom mappings between CEDAR templates and HuBMAP and SenNet provenance entities | reference | custom | ||||
PCL | Provisional Cell Ontology | ontology | OBO Foundry | | | CC BY 4.0 | Reference |
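The way contexts build on one another (the base context appends to the UMLS context, and the HuBMAP/SenNet context appends to the base) can be pictured as a chain of SAB lists. The following is only an illustrative sketch: the `Context` class and its `all_sabs` method are hypothetical names, and the SAB lists are abbreviated to a few entries from the tables above.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Context:
    """A UBKG context: its own SABs plus those of the context it extends."""
    name: str
    sabs: Tuple[str, ...]
    extends: Optional["Context"] = None

    def all_sabs(self) -> List[str]:
        # Inherited SABs come first, then this context's additions,
        # skipping any SAB that was already inherited.
        inherited = self.extends.all_sabs() if self.extends else []
        return inherited + [s for s in self.sabs if s not in inherited]


# Abbreviated SAB lists, for illustration only.
umls = Context("UMLS", ("SNOMEDCT_US", "ICD10", "LOINC", "HGNC"))
base = Context("UBKG base", ("UBERON", "PATO", "CL"), extends=umls)
hubmap_sennet = Context(
    "HuBMAP/SenNet", ("HRA", "HRAVS", "HUBMAP", "SENNET"), extends=base
)

if __name__ == "__main__":
    print(hubmap_sennet.all_sabs())
```

Modeling each context as an extension of another keeps the list of added SABs separate from the inherited ones, which mirrors how the tables in this document are organized.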
Data Distillery context
The Data Distillery context appends to the UBKG base set assertions to support participants in Data Distillery, a project of the National Institutes of Health’s Common Fund Data Ecosystem (CFDE). SABs include:
Data Coordinating Center / Domain | SAB | Description | Type of data | platform | Citations | Citation Reference | License | Licensing Reference |
---|---|---|---|---|---|---|---|---|
Data Distillery Support Mappings | CLINVAR | NCBI ClinVar - associations between genes and diseases or phenotypes | reference | Globus | PMID:31777943,PMID:30311387,PMID:29165669,PMID:26582918,PMID:24234437,NBK174587 | Reference | Public Domain | Reference |
Data Distillery Support Mappings | CMAP | Broad Institute’s Connectivity Map - associations between genes and chemicals or drugs | reference | Globus | PMID:29195078 | Reference | Public access | Reference |
Data Distillery Support Mappings | HPOMP | HPO-MP mapping | reference | Globus | MP: PMID:26467479; HPO: PMID:37953324 | MP, HPO | MP: CC BY 4.0; HPO: UMLS restriction level 0 | MP |
Data Distillery Support Mappings | HGNCHPO | human genotype - phenotype mapping | reference | Globus | HGNC: PMID:36243972; HPO: PMID:37953324 | HGNC, HPO | HGNC: UMLS restriction level 0; HPO: UMLS restriction level 0 | |
Data Distillery Support Mappings | HCOP (formerly HCOPHGNC) | human - mouse orthologs | reference | Globus | MGD: PMID:33231642; HGNC: PMID:36243972 | MGD, HGNC | MGD: CC BY 4.0; HGNC: UMLS restriction level 0 | MGD |
Data Distillery Support Mappings | MPMGI (formerly HCOPMP) | mouse genotype-phenotype mapping | reference | Globus | MGD: PMID:33231642; MP: PMID:26467479 | MGD, MP | MGD: CC BY 4.0; MP: CC BY 4.0 | MGD, MP |
Data Distillery Support Mappings | RATHCOP | ENSEMBL Human to ENSEMBL Rat ortholog | reference | Globus | PMID:36318249 | Reference | Apache 2.0 | Reference |
Data Distillery Support Mappings | MSIGDB | Molecular Signatures Database | reference | Globus | PMID:16199517,PMID:21546393,PMID:26771021,PMID:37704782 | Reference | CC BY | Reference |
Data Distillery Support Mappings | HSCLO | Chromosome Location Ontology | ontology | Globus | bioRxiv | | CC BY-NC-ND 4.0 | Reference |
Data Distillery Support Mappings | GENCODEHSCLO | GENCODE-HSCLO mapping | reference | Globus | GENCODE: PMID:33270111; HSCLO: bioRxiv | GENCODE | GENCODE: open access; HSCLO: CC BY-NC-ND 4.0 | HSCLO |
Data Distillery Support Mappings | WP | WikiPathways gene-gene interactions | reference | Globus | PMID:37941138,PMID:18651794 | Reference | CC0 | Reference |
Data Distillery Support Mappings | CLINGEN | Clinical Genome selected datasets | reference | Globus | PMID:26014595 | Reference | CC0 | Reference |
Data Distillery Support Mappings | STRING | StringDB Protein-Protein Interaction Network | reference | Globus | PMID:36370105 | Reference | CC BY 4.0 | Reference |
4DNucleome | 4DN | 4D Nucleome | result | Globus | PMID:28905911,PMID:35501320 | Reference | Freely available | Reference |
Extracellular RNA Communication Consortium (ERCC) | ERCCRBP | exRNA RNA Binding Proteins | result | Globus | PMID:26320938 | Reference | Pre-publication data sharing | Reference |
Extracellular RNA Communication Consortium (ERCC) | ERCCREG | Regulatory Elements | result | Globus | PMID:26320938 | Reference | Pre-publication data sharing | Reference |
GlyGen | FALDO | Feature Annotation Description Ontology | ontology | NCBO BioPortal | | | | |
GlyGen | UNIPROT | Universal Protein Resource | ontology | UniProt FTP | | | CC BY 4.0 | Reference |
GlyGen | GLYCORDF | Glycomics | ontology | GitHub | PMID:24280648 | Reference | | |
GlyGen | GLYCOCOO | Glycoconjugate | ontology | GitHub | | | | |
GlyGen | GLYCANS | Glycans data | result | Globus | PMID:31616925 | Reference | CC BY 4.0 | Reference |
GlyGen | PROTEOFORM | Proteoform | result | Globus | PMID:31616925 | Reference | CC BY 4.0 | Reference |
Genotype Tissue Expression (GTEx) | GTEXCOEXP | Co-expression | result | Globus | PMID:23715323 | Reference | Public | Reference |
Genotype Tissue Expression (GTEx) | GTEXEQTL | Expression quantitative trait loci (eQTL) | result | Globus | PMID:23715323 | Reference | Public | Reference |
Genotype Tissue Expression (GTEx) | GTEXEXP | Expression | result | Globus | PMID:23715323 | Reference | Public | Reference |
Human BioMolecular Atlas Program (HuBMAP) | HMAZ | HuBMAP Azimuth Cell Expression Summary | result | Globus | PMID:31597973 | Reference | Data Use Agreement | Reference |
Illuminating the Druggable Genome (IDG) | IDGP | Compound-protein interactions | result | Globus | | | Open | Reference |
Illuminating the Druggable Genome (IDG) | IDGD | Compound-disease interactions | result | Globus | | | Open | Reference |
Gabriella Miller Kids First | KF | | result | Globus | | | | |
Library of Integrated Network-Based Cellular Signatures (LINCS) | LINCS | | result | Globus | | | freely available | Reference |
Molecular Transducers of Physical Activity Consortium (MoTrPAC) | MOTRPAC | | result | Globus | | | | |
Metabolomics Workbench (MW) | MW | Cell-metabolite mappings | result | Globus | PMID:26467476 | Reference | public domain | Reference |
Stimulating Peripheral Activity to Relieve Conditions (SPARC) | NPO | Neuron Phenotype Ontology | ontology | NCBO BioPortal | | | | |
Stimulating Peripheral Activity to Relieve Conditions (SPARC) | NPOSCKAN | NPOSCKAN | ontology | GitHub | | | | |
Reactome | REACTOME | Pathways | reference | Globus | PMID:37941124 | Reference | CC0,CC BY 4.0 | Reference |
DisGeNET | DGN | Disease-gene associations | reference | Globus | PMID:31680165 | Reference | CC BY-SA 4.0 | Reference |
Biomarker Partnership Project | BIOMARKER | Biomarkers | reference | Globus | PMID:35925813,PMID:34015823,PMID:32142370 | Reference | CC BY 4.0 | Reference |