Unified Biomedical Knowledge Graph
This is a working glossary, and not a formal or exhaustive terminology. When the discussion of a term in the glossary refers to another term in the glossary, the other term will be in bold italic.
An assertion establishes a relationship between two entities.
A repository of biomedical ontologies maintained by NCBO.
Although many of the ontologies published in BioPortal follow OBO principles, compliance is optional.
The identifier for a concept in a vocabulary or ontology in the UMLS. A code is unique to the ontology. For example, the SNOMED_CT and NCI Thesaurus vocabularies use different codes to represent the concept of “kidney”; however, because both codes are cross-referenced to the same UMLS concept, they share a CUI.
An entity in the UBKG. A concept is represented with a code in a source ontology and a CUI in the overall UBKG.
Concept Unique Identifier (CUI)
A unique identifier for a concept in the UBKG. A CUI can be cross-referenced by codes from many ontologies, allowing for associations between entities in different ontologies.
For example, the CUI for the concept of methanol in UMLS is cross-referenced by codes in a number of ontologies and vocabularies, such as SNOMED_CT and NCI. The use of the CUI allows for questions such as “How many ways do all the ontologies in the UBKG refer to methanol?” As the following illustration shows, a knowledge graph can reveal that terms from different ontologies include “methanol”, “Methyl Alcohol”, and “METHYL ALCOHOL”.
Figure: How do all the ontologies in the UBKG refer to “methanol”?
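The methanol question can be sketched in a few lines of Python. This is only an illustration of the idea, not UBKG code: the row layout, the code strings, the `OTHER` source, and the CUI values are hypothetical placeholders. Only the term spellings come from the text above.

```python
from collections import defaultdict

# Hypothetical rows of (SAB, code, CUI, term). The codes and CUIs are
# placeholders, not real UMLS identifiers.
ROWS = [
    ("SNOMED_CT", "code-1", "CUI-METHANOL", "methanol"),
    ("NCI", "code-2", "CUI-METHANOL", "Methyl Alcohol"),
    ("OTHER", "code-3", "CUI-METHANOL", "METHYL ALCOHOL"),
    ("SNOMED_CT", "code-4", "CUI-KIDNEY", "kidney"),
]


def terms_by_cui(rows):
    """Group the terms of every code under the CUI that the code cross-references."""
    grouped = defaultdict(set)
    for _sab, _code, cui, term in rows:
        grouped[cui].add(term)
    return grouped


# All the ways the (hypothetical) sources refer to the methanol concept.
methanol_terms = sorted(terms_by_cui(ROWS)["CUI-METHANOL"])
print(methanol_terms)
```

Because every code points at the same CUI, the grouping needs no knowledge of how the individual ontologies are organized.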
A link between a concept in one ontology and a concept in another. Cross-references can be described in one of two ways:
- Between a code and a code in another ontology–e.g., HUBMAP C000007 (Imaging Assay) cross-references OBI 0000185 (Imaging Assay)
- Between a code and a UMLS CUI–e.g., HUBMAP C000007 (Assay) cross-references UMLS C1510438 (Assay)
Because UMLS CUIs can be linked to codes in many ontologies, a cross-reference to a UMLS CUI is likely to be more useful than a cross-reference to a single code in another ontology.
A synonym for a relationship, used primarily for knowledge graphs.
To represent a concept in an ontology with a **code**. When possible, concepts should be encoded using codes from a standard, published source such as a vocabulary, a public database, or an **OWL** file.
For example, in the Protein Ontology, the entity 5’-AMP-activated protein kinase subunit gamma-1 is encoded with code 000013225.
An entity represents a member of an ontology. Entities associate with other entities in an ontology via relationships.
An example of an entity is 5’-AMP-activated protein kinase subunit gamma-1, which is a protein in the Protein Ontology (PR), encoded with code 000013225.
In a knowledge graph, an entity is represented by a node.
A cross-reference between a concept in one ontology and a concept in another ontology. The idea of “equivalence class” is used in OWL.
A set of files that describe the entities and relationships of an ontology that is to be integrated into the UBKG.
A relationship in an ontology has a direction: it starts with one node and “goes toward” another–e.g.,
(5’-AMP-activated protein kinase subunit gamma-1)→isa→(protein)
A relationship is the inverse of another relationship if it links the same two nodes in the opposite direction. For example, the inverse_isa relationship of a concept identifies the concepts that have an isa relationship with that concept.
(protein)←inverse_isa←(5’-AMP-activated protein kinase subunit gamma-1)
The name of an inverse relationship cannot always be derived from the name of the original relationship: for example, the inverse of has_gene_product is not inverse_has_gene_product, but gene_product_of. To obtain the inverses of relationships that are named in this way, the UBKG refers to the RO.
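One way to picture this lookup is sketched below. It assumes a small table of named inverse pairs standing in for RO, with a fallback to the inverse_ prefix seen in inverse_isa. The table, the function name, and the fallback rule are illustrative assumptions, not the UBKG's actual implementation.

```python
# Hypothetical excerpt of inverse pairs, standing in for a lookup against RO.
NAMED_INVERSES = {"has_gene_product": "gene_product_of"}
INVERSE_PREFIX = "inverse_"


def inverse_of(relationship, named_inverses=NAMED_INVERSES):
    """Return the name of the inverse of a relationship."""
    # 1. Prefer an explicitly named inverse, looked up in either direction.
    if relationship in named_inverses:
        return named_inverses[relationship]
    for forward, inverse in named_inverses.items():
        if inverse == relationship:
            return forward
    # 2. Otherwise fall back to adding or removing the "inverse_" prefix.
    if relationship.startswith(INVERSE_PREFIX):
        return relationship[len(INVERSE_PREFIX):]
    return INVERSE_PREFIX + relationship


print(inverse_of("has_gene_product"))  # gene_product_of
print(inverse_of("isa"))               # inverse_isa
```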
In an OWL file, an entity or a relationship can be described with an Internationalized Resource Identifier (IRI). For OBO ontologies, the IRI is a PURL (Persistent Uniform Resource Locator) that refers to a published online resource.
UBKG recognizes entity IRIs that consist of the OBO PURL base, a slash, the ontology identifier, an underscore, and the code in the ontology. For example, the IRI for a protein entity in PR is http://purl.obolibrary.org/obo/PR_Q68D20.
UBKG recognizes relationship IRIs formatted in the same manner as for entities, provided that the OBO PURL is from the Relation Ontology (RO)–e.g., http://purl.obolibrary.org/obo/RO_0002160.
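As a rough sketch of the IRI format, the function below splits an OBO PURL into its ontology identifier and code. The function name and the rule of splitting at the first underscore are assumptions made for illustration. How the UBKG itself parses IRIs is not described here.

```python
OBO_PURL_PREFIX = "http://purl.obolibrary.org/obo/"


def parse_obo_iri(iri):
    """Return (ontology_id, code) for an OBO PURL, or None if the IRI does not fit."""
    if not iri.startswith(OBO_PURL_PREFIX):
        return None
    local_part = iri[len(OBO_PURL_PREFIX):]
    # Assumption: the first underscore separates the ontology identifier
    # from the code in that ontology.
    ontology_id, separator, code = local_part.partition("_")
    if not separator or not ontology_id or not code:
        return None
    return ontology_id, code


print(parse_obo_iri("http://purl.obolibrary.org/obo/PR_Q68D20"))   # ('PR', 'Q68D20')
print(parse_obo_iri("http://purl.obolibrary.org/obo/RO_0002160"))  # ('RO', '0002160')
```

A relationship IRI would be accepted under the rule above only when the returned ontology identifier is RO.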
Knowledge Graph (database)
A knowledge graph is a representation of information characterized by the relationships between a set of entities. Whereas relational databases organize information into tables, rows, and fields, knowledge graph databases organize information into nodes (representing entities) and edges (representing relationships).
The National Center for Biomedical Ontology (NCBO) maintains BioPortal, a repository of biomedical ontologies.
The UBKG database is deployed in instances of the neo4j graph data platform. “A neo4j” or “the neo4j” is equivalent to “an instance of the UBKG database hosted in a neo4j server”.
A synonym for an entity, used primarily in knowledge graphs.
For the purposes of the UBKG, an ontology is the representation of a set of entities and the relationships between them.
The Ontology Lookup Service (OLS) is a repository of biomedical ontologies. The OLS allows for searches of relationship properties (relationships), including those from the RO.
The Open Biological and Biomedical Ontology (OBO) Foundry is a community that maintains a set of “interoperable ontologies for the biomedical sciences”, as well as a set of principles–i.e., best practices and standards for representing ontologies in OWL.
Many of the ontologies published in BioPortal follow OBO principles.
The Web Ontology Language (OWL) is a language for representing ontologies in a standard format that can be interpreted by software applications. Biomedical ontologies can be published as OWL files in reference sites such as the NCBO BioPortal and the OBO Foundry.
A relationship is an association between two entities in an ontology.
Types of relationships
Most of the relationships in biomedical ontologies can be characterized with one of two types:
- hierarchical (e.g., 5’-AMP-activated protein kinase subunit gamma-1 isa protein)
- non-hierarchical (e.g., protein PMS2CL (human) gene_product_of PMS2CL gene)
In a knowledge graph, a relationship corresponds to an edge between two nodes.
Relation Ontology (RO)
The Relation Ontology (RO) is an ontology of the relationships that are used in other ontologies–i.e., it describes how relationships themselves relate to one another.
A relationship between relationships that is important in the UBKG is the inverse relationship.
Relationships in RO can be reviewed in a number of ways, including:
- By searching the OLS
- By reviewing the JSON file published for RO
The UBKG adopts the UMLS practice of identifying source ontologies with a Source Abbreviation (SAB). Examples of UMLS SABs include SNOMED_CT and UBERON. UBKG uses published acronyms for ontologies when possible–e.g., PATO.
Term (preferred, synonym)
A usually short text identifier for a code in an **ontology**. For example, a term for code 64033007 in SNOMED_CT is “kidney”.
A term can be a preferred term or a synonym.
A triple asserts a **relationship** between two **entities** (nodes) in an ontology.
A triple has the format subject predicate object.
subject and object are both entities (nodes). predicate is a relationship (edge).
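The subject, predicate, object format maps naturally onto a small data structure. The sketch below is illustrative only: the `Triple` class and the `objects_of` helper are hypothetical names, and the two triples reuse the examples given elsewhere in this glossary.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str    # an entity (node)
    predicate: str  # a relationship (edge)
    object: str     # an entity (node)


TRIPLES = [
    Triple("5’-AMP-activated protein kinase subunit gamma-1", "isa", "protein"),
    Triple("protein PMS2CL (human)", "gene_product_of", "PMS2CL gene"),
]


def objects_of(triples, subject, predicate):
    """Follow edges of one relationship type outward from a subject node."""
    return [t.object for t in triples if t.subject == subject and t.predicate == predicate]


print(objects_of(TRIPLES, "protein PMS2CL (human)", "gene_product_of"))  # ['PMS2CL gene']
```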
For the purposes of the UBKG, a vocabulary is similar to an ontology. Some vocabularies (e.g., SNOMED_CT) are also ontologies.
The Unified Medical Language System (UMLS) consolidates a number of different biomedical ontologies, vocabularies, and standards.
The UBKG represents UMLS and other concept data with a set of nodes and edges, including nodes for
- the code for the concept in its originating ontology
- the CUI for the code in the UBKG
- terms for the concept, both for preferred terms and synonyms
- definitions of the concept
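The node types listed above can be pictured with a small sketch that builds the nodes and edges for a single concept. The node labels, edge names, CUI value, and definition text are hypothetical placeholders, and treating “kidney” as the preferred term is an assumption. This is not the UBKG's actual schema.

```python
def concept_subgraph(sab, code, cui, preferred_term, synonyms=(), definitions=()):
    """Build (nodes, edges) for one concept; labels and edge names are illustrative."""
    code_id = f"{sab} {code}"
    nodes = [("Concept", cui), ("Code", code_id), ("Term", preferred_term)]
    edges = [
        (cui, "HAS_CODE", code_id),
        (code_id, "HAS_PREFERRED_TERM", preferred_term),
    ]
    for synonym in synonyms:
        nodes.append(("Term", synonym))
        edges.append((code_id, "HAS_SYNONYM", synonym))
    for definition in definitions:
        nodes.append(("Definition", definition))
        edges.append((cui, "HAS_DEFINITION", definition))
    return nodes, edges


nodes, edges = concept_subgraph(
    sab="SNOMED_CT",
    code="64033007",
    cui="CUI-PLACEHOLDER",             # placeholder, not a real UMLS CUI
    preferred_term="kidney",
    definitions=["<definition text>"],  # placeholder definition
)
for edge in edges:
    print(edge)
```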