Unified Biomedical Knowledge Graph (UBKG)
Deploying a Turn-key Docker neo4j Distribution
Scope
Instances of UBKG with different source contexts (content) are available as Docker distributions.
These distributions are designed for turn-key deployment of a complete, fully-indexed, standalone UBKG instance running in neo4j, hosted in a Docker container, running on a local or networked machine.
Installation of a distribution is minimal–in particular, it does not require separate installation of neo4j locally. The local machine only needs to have Docker installed.
Downloading from the UBKG Download site
Distributions of the UBKG are available at the UBKG download site.
UMLS API requirements
A UBKG instance includes content from the Unified Medical Language System (UMLS), a repository of biomedical vocabularies maintained by the National Library of Medicine. The use of content from the UMLS is governed by the UMLS License Agreement.
Use of the UMLS content in the UBKG requires two licenses:
- The University of Pittsburgh distributes content originating from the UMLS by means of a distributor license.
- Consumers of the UBKG have access to UMLS content through the license that is part of their UMLS Technology Services (UTS) accounts.
The UBKG download site combines the API key from the University of Pittsburgh with the API key from a user to authenticate a user’s request to download the UBKG. The user must provide the API for their UTS account to the UBKG Download site.
To obtain a UTS API key:
- Create a UTS user profile.
- Generate an API key.
Downloading distribution Zip files
After providing a API key, users can download distributions of the UBKG. Distributions are available as Zip archive files.
Host machine configuration
- Install Docker on the host machine.
- The host machine will require considerable disk space to accommodate the unzipped distribution, depending on the distribution. As of December 2023, distribution sizes were around:
- base context: 9GB
- HuBMAP/SenNet: 9GB
- Data Distillery: 20GB
Simple Deployment
This deployment uses default settings. These instructions assume that the operating system for the host machine is either Linux or MacOs.
- Expand the Zip archive. The expanded distribution directory will contain:
- a directory named Data. This directory contains the UBKG neo4j database files.
- a script named build_container.sh. This script builds a Docker container hosting the UBKG instance of neo4j.
- container.cfg.example. This is an annotated example of the configuration file required by build_container.sh.
- The distribution scripts require a file named container.cfg. Create your own version of a container.cfg file by copy container.cfg.example to a file named container.cfg.
- The container.cfg.example file assigns default configuration values to the distribution, including the password for the neo4j user account.
- It is recommended that you change the password for the neo4j user account in container.cfg.
- Passwords must have at least 8 letters and have at least one alphabetic and one numeric character.
- Start Docker Desktop.
- Open a Terminal session.
- Move to the distribution directory (where the Zip was expanded).
- Execute
./build_container.sh
. - The build_container.sh will run for a short time (1-2 minutes), and will be finished when it displays a message similar to
[main] INFO org.eclipse.jetty.server.Server - Started Server@16fa5e34{STARTING}[10.0.15,sto=0] @11686ms
The build_container.sh will create a Docker container with properties that it obtains from container.cfg. Following are the default properties:
Property | Value |
---|---|
container name | ubkg-neo4j-<version> |
image name | hubmap/ubkg-neo4j |
image tag | current-release |
ports | 4000:7474 4500:7687 |
read-write | read-only |
- Open a browser window. Enter
http://localhost:<port>/browser/
, whereis the neo4j port specified in the configuration. The default value from **container.cfg** is **4000**. - The neo4j browser window will appear. Enter connection information:
Property | Value |
---|---|
Connect URL | bolt://localhost: |
Database name | blank (the default). Note that this field may not appear on the page. |
Authentication Type | Username/Password |
Username | neo4j |
Password | password from container.cfg |
- Select Connect.
- You should be able to start running Cypher commands in the neo4j browser.
Custom Deployment
Changes to Docker configuration
To modify the Docker configuration, change values in the container.cfg file. Keeping the value commented results in the script using a default value.
Value | Purpose | Recommendation |
---|---|---|
container_name | Name of the Docker container | accept default |
docker_tag | Tag for the Docker container | accept default |
neo4j_password | Password for the neo4j user | minimum of 8 characters, including at least one letter and one number |
ui_port | Port used by the neo4j browser | number other than 7474 to prevent possible conflicts with local installations of neo4j |
bolt_port | Port used by neo4j bolt (Cypher) | number other than 7687 to prevent possible conflicts with local installations of neo4j |
read_mode | Whether the neo4j database is read-only or read-write | accept default (read-only) |
db_mount_dir | Path to the external neo4j database | accept default (/data) |
all others | Not used for deployment; values will be ignored | accept default |
Rename configuration file
To specify another configuration file, execute the command /.build_container.sh external -c <your configuration file>