Unified Biomedical Knowledge Graph (UBKG)

Deployment

Scope

These instructions are intended for developers who wish to generate a UBKG instance from source.

To install a turnkey, fully indexed, containerized version of a UBKG instance, refer to the instructions in the Docker tab.

A UBKG infrastructure involves…

the application of a methodology…
that uses a set of tools…
and sets of assertion data…
to be deployed to an environment…
and abstracted by means of an API.

Tools

The tools used to build and deploy a UBKG instance are described on the home page, and include:

The source framework
The generation framework
The neo4j Docker framework

Assertion data

The sets of assertion data that are required to populate a deployment of a UBKG depend on the UBKG context.

There are three types of assertion data:

Ontology assertions, taken directly from published ontology files in an OWL serialization, such as those described in the NCBO BioPortal.
Reference assertions, built from sources that include public databases, such as UniProtKB or GenCode.
Experimental assertions, based on the results of scientific experiments.

API

An optional API server abstracts Cypher queries that obtain information from the UBKG neo4j instance.
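As a rough illustration of what abstracting a Cypher query can look like, the sketch below hides a parameterized query behind a plain Python function. The node labels and property names (Concept, Code, CodeID, CUI), the function concepts_for_code, and the run_query callable are all assumptions made for this example; they are not taken from the ubkg-api code. In a real deployment, run_query would hand the query to a neo4j driver session.

```python
from typing import Any, Callable, Dict, List

# Hypothetical Cypher query. The labels and property names are illustrative
# assumptions, not a statement of the actual UBKG schema.
CONCEPTS_FOR_CODE = (
    "MATCH (c:Concept)-[:CODE]->(code:Code {CodeID: $code_id}) "
    "RETURN c.CUI AS cui"
)

# A query runner takes a Cypher string and parameters and returns rows as dicts.
QueryRunner = Callable[[str, Dict[str, Any]], List[Dict[str, Any]]]


def concepts_for_code(run_query: QueryRunner, code_id: str) -> List[str]:
    """Return the concept identifiers linked to a code, hiding the Cypher."""
    rows = run_query(CONCEPTS_FOR_CODE, {"code_id": code_id})
    return [row["cui"] for row in rows]


def fake_runner(query: str, params: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Stand-in for a database call so that the sketch runs without neo4j."""
    if params["code_id"] == "EXAMPLE:123":
        return [{"cui": "C0000001"}]
    return []


print(concepts_for_code(fake_runner, "EXAMPLE:123"))
```

A caller of concepts_for_code never sees the query text, which is the sense in which an API server can shield clients from Cypher.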

Environment

The Docker container that contains the neo4j instance of UBKG can reside in different types of environments. For example, a developer can run a UBKG Docker container on a local machine. The container can also be hosted on a virtual machine.

Methodology

Obtain UMLS CSVs

The UBKG ontology CSV files contain licensed content that is extracted from the Unified Medical Language System (UMLS) by the source_framework scripts in the ubkg-etl repository.

Because of this licensing, the ontology CSV files cannot be published to public repositories, such as GitHub or Docker Hub. Prebuilt ontology CSVs are available for download, but access requires authorization.

For assistance with obtaining prebuilt ontology CSVs, contact the UBKG steward:

Jonathan Silverstein, MD
Department of Biomedical Informatics
University of Pittsburgh

Identify and collect data sources for UBKG Context

A UBKG context corresponds to the assertion data that is appended to the UMLS CSVs. A context is a collection of sets of assertion files for a group of SABs (source abbreviations, the identifiers of the sources that contribute assertions). Because the assertions associated with a particular SAB can refer to entities from other SABs, a context has dependencies: the sets of assertion files usually need to be ingested in a particular order, so that each SAB is ingested after the SABs that it refers to.

The contexts page lists the sets of SABs for the known UBKG contexts.
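One way to reason about ingestion order is to treat the SAB dependencies as a directed graph and sort it topologically. The following is a minimal sketch under that assumption; the SAB names and the dependency map are hypothetical placeholders, and the generation framework may determine order differently. It uses graphlib from the Python standard library (Python 3.9 or later).

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set

# Hypothetical dependency map: each key is an SAB, and its value is the set
# of SABs whose entities it refers to (and that must be ingested first).
dependencies: Dict[str, Set[str]] = {
    "SAB_A": set(),
    "SAB_B": {"SAB_A"},
    "SAB_C": {"SAB_A", "SAB_B"},
}


def ingestion_order(deps: Dict[str, Set[str]]) -> List[str]:
    """Return SABs ordered so that each one follows the SABs it depends on."""
    # static_order() raises graphlib.CycleError if the dependencies are circular.
    return list(TopologicalSorter(deps).static_order())


order = ingestion_order(dependencies)
print(order)  # ['SAB_A', 'SAB_B', 'SAB_C']
```

A circular dependency between SABs would surface as a CycleError, which signals that no valid ingestion order exists for that dependency map.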

Generate the ontology CSVs

Use the scripts in the generation framework of the ubkg-etl repository to build a set of ontology CSVs by appending SAB data to the UMLS CSVs.

Instantiate Docker

Use the scripts in the ubkg-neo4j repository to instantiate a Docker container for a neo4j instance of UBKG that is populated by the ontology CSVs.

Instantiate API server

Use the scripts in the ubkg-api repository to start an instance of a UBKG API. Contact the UBKG steward (Jonathan Silverstein) for assistance.