DD KG for addition of Biomarker edges #131

jeet-vora · 2025-03-05T16:10:01Z

Recommended development process:

1. Revise the edge and node files for BIOMARKER.
2. Optional: Upload the edge and node files to the Globus folder.
3. Take a copy of the latest set of Data Distillery ontology CSVs that excludes the Biomarker data (DD-no-BIOMARKER) and add it to your ETL environment.
4. Add your new edge and node files to the folder that corresponds to the download folder of your Globus Connect Personal setup. Your copy of edges_nodes.ini should point to this folder. For example, I download everything from Globus to a subfolder of my Documents folder on my macOS machine, so my ini file looks like:

   [Paths]
   # Local paths containing ingestion files
   ...
   BIOMARKER=/Users/jas971/documents/globus/Import/BIOMARKER

5. Run the ingestion script (./build_csv.sh -v BIOMARKER) to generate a new set of ontology CSVs that integrates your version of BIOMARKER with DD-no-BIOMARKER.
6. Using the ontology CSVs generated in step 5, execute the workflow described in ubkg-neo4j to build a Docker container. As you've probably experienced, the longest waits are the import of the CSVs and the creation of the relationship indexes. (Pro [or maybe jaded amateur] tip: if the import is taking forever, especially for relationships, you're probably running into memory issues. Reboot and start over.)
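
Before kicking off the ingestion script, it can save a failed run to confirm that the BIOMARKER entry in edges_nodes.ini really points at a folder holding your new files. The following is only a rough sketch, not part of the UBKG tooling: the function name check_sab_folder is hypothetical, and the file names edges.tsv and nodes.tsv are assumptions, so swap in whatever your edge and node files are actually called.

```python
import configparser
from pathlib import Path


def check_sab_folder(ini_path, sab="BIOMARKER", expected=("edges.tsv", "nodes.tsv")):
    """Return the expected files that are missing from the folder configured for `sab`.

    The default file names are assumptions for illustration only.
    """
    # Disable interpolation so a literal '%' in a path is not misread.
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # preserve key case, e.g. BIOMARKER
    if not parser.read(ini_path):
        raise FileNotFoundError(f"Could not read {ini_path}")
    # Look up the SAB's folder in the [Paths] section of the ini file.
    folder = Path(parser["Paths"][sab]).expanduser()
    return [name for name in expected if not (folder / name).is_file()]
```

Calling check_sab_folder("edges_nodes.ini") returns an empty list when both files are present, and otherwise lists the ones that are missing.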

A zip of the ontology CSVs for the January 3, 2025 Data Distillery, excluding BIOMARKER, is available at https://ubkg-downloads.xconsortia.org/.

The file name is DD_no_BIOMARKER03Jan2025.zip.
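
Once the zip has been downloaded by hand, unpacking it into the ETL environment could look roughly like the sketch below. The function name extract_ontology_csvs and the destination folder are hypothetical; the internal layout of the zip is not described here, so the code simply reports whatever CSV files it finds after extraction.

```python
import zipfile
from pathlib import Path


def extract_ontology_csvs(zip_path, dest_dir):
    """Unpack the zip into dest_dir and return the sorted names of the CSV files found there."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest)
    # Search recursively in case the archive nests the CSVs in a subfolder.
    return sorted(p.name for p in dest.rglob("*.csv"))
```

For example, extract_ontology_csvs("DD_no_BIOMARKER03Jan2025.zip", "path/to/your/csv/folder") would unpack the archive, where the second argument is a placeholder for wherever your ETL setup expects the ontology CSVs.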

jeet-vora assigned seankim658 Mar 5, 2025