Skip to content

Latest commit

 

History

History
430 lines (297 loc) · 10.9 KB

File metadata and controls

430 lines (297 loc) · 10.9 KB

Import required libraries

import requests, json, jwt
import FAIR

Get Authorized

If testing out the Fairscape cluster on Openstack visis HERE tp get your token and paste it below. Otherwise if running locally generate a fake token by running the command below.

token = jwt.encode({'name': 'Admin','role':'admin','sub':'admin-id','groups':['test'],'aud':'https://fairscape.org'}, 'test secret', algorithm='HS256')

Step 2: Upload Data/Script

Using Transfer Service upload data with metadata
Upload a file to minio via the upload_file function.

upload_file takes: - file_path: path to local file - metadata: describes the file to be uploaded returns: - PID representing object upload_file then calls the transfer service which uploads the file to minio and metadata to mongoDB and Stardog

dataset_meta = {
    "@context":{
        "@vocab":"http://schema.org/"
    },
    "@type":"Dataset",
    "name":"Raw Data",
    "description":"Heart Rate Measures from patient from admission to discharge.",
    "author":[
        {
            "name":"Justin Niestroy",
            "@id": "https://orcid.org/0000-0002-1103-3882",
            "affiliation":"University of Virginia"
        }
    ],
}
data_id = FAIR.upload_file('UVA_7129_HR2.csv',dataset_meta,token = token)
data_id
'ark:99999/bd3d2e2b-3bc1-49b4-8b2e-fc2c0000b715'

Upload Spark Script to run on newly uploaded data

spark_code_meta = {
    "@context":{
        "@vocab":"http://schema.org/"
    },
    "@type":"SoftwareSourceCode",
    "name":"Processing  Script",
    "description":"Sample Source Code for HCTSA test on spark",
    "author":[
        {
            "name":"Justin Niestroy",
            "@id": "https://orcid.org/0000-0002-1103-3882",
            "affiliation":"University of Virginia"
      }
    ]
}
software_id = FAIR.upload_file('run_algos.py',spark_code_meta,token = token)
software_id
  'ark:99999/34d26e0d-48e4-422b-b21f e178cdeae71e'

Step 3: Retrieve Metadata

Check MDS to make sure metadata was uploaded correctly
retrieve_metadata calls MDS to get the metadata for the newly minted id.

retrieve_metadata takes: - PID: identifier of interest returns: - json-ld metadata of given PID

FAIR.retrieve_metadata(data_id,token = token)
{'@context': {'@vocab': 'http://schema.org/'},
 '@id': 'ark:99999/bd3d2e2b-3bc1-49b4-8b2e-fc2c0000b715',
 '@type': 'Dataset',
 'author': [{'@id': 'https://orcid.org/0000-0002-1103-3882',
   'affiliation': 'University of Virginia',
   'name': 'Justin Niestroy'}],
 'description': 'Heart Rate Measures from patient from admission to discharge.',
 'distribution': [{'@id': 'ark:99999/556c9199-396b-488c-9144-793448f50dc2'}],
 'name': 'Raw Data',
 'sdPublicationDate': '2020-11-19T21:42:44.907022589Z',
 'sdPublisher': {'@id': 'c5c0c8d8-656b-4189-9967-44c45624d283',
  'name': 'Justin Admin'},
 'url': 'http://ors.uvadcos.io/ark:99999/bd3d2e2b-3bc1-49b4-8b2e-fc2c0000b715'}

Step 4: Launch Spark Job

Submit Spark Job Using Compute Service
Run a the uploaded script on the uploaded data by calling compute

FAIR.compute takes: - data pid - software pid - type of job ie spark,nipype,custom compute then launches kubernetes pods to run the given script on the provided data using specialized pods

compute returns: - job_PID

job_id = FAIR.compute(data_id,software_id,'spark',token = token)
job_id
'ark:99999/cd6b791f-b256-44aa-836a-53f988bfdf2d'

Watch Running Job
list_running_jobs takes:

list_running_jobs returns: - list of running job PIDs

FAIR.list_running_jobs(token = token)
['sparkjob-cd6b791f-b256-44aa-836a-53f988bfdf2d']

Do not move on until job is completed and running job is completed

Step 5: Checkout Job Outputs

Get Job outputs from job metadata
Upon completion of a job the compute service updates the job id metadata to include properties: - evi:supports: list of output PIDs - logs: logs from job

job_metadata = FAIR.retrieve_metadata(job_id,token = token)
output_ids = job_metadata.get('evi:supports')
output_ids
[{'@id': 'ark:99999/6adb1f78-37f5-4a8e-a080-8a8b12b168b6'},
 {'@id': 'ark:99999/7a7b5fd4-ddd0-49bd-bb00-7aaed255ba6c'}]

The computation created 2 outputs
Metadata for the two created outputs is shown below

output_metadata = [
   FAIR.retrieve_metadata(out['@id'],token = token) for out in output_ids
]
output_metadata
[{'@context': {'@vocab': 'http://schema.org/'},
  '@id': 'ark:99999/6adb1f78-37f5-4a8e-a080-8a8b12b168b6',
  '@type': 'Dataset',
  'distribution': [{'@id': 'ark:99999/b7f1a6d3-27f3-4ae3-a2f4-3610c547eb13',
    '@type': 'DataDownload',
    'contentUrl': 'minio:9000/breakfast/cd6b791f-b256-44aa-836a-53f988bfdf2d/_SUCCESS',
    'fileFormat': '_SUCCESS',
    'name': '_SUCCESS'}],
  'evi:generatedBy': {'@id': 'ark:99999/cd6b791f-b256-44aa-836a-53f988bfdf2d'},
  'name': '_SUCCESS',
  'sdPublicationDate': '2020-11-20T14:53:50.01144214Z',
  'sdPublisher': {'@id': 'c5c0c8d8-656b-4189-9967-44c45624d283',
   'name': 'Justin Admin'},
  'url': 'http://ors.uvadcos.io/ark:99999/6adb1f78-37f5-4a8e-a080-8a8b12b168b6'},
 {'@context': {'@vocab': 'http://schema.org/'},
  '@id': 'ark:99999/7a7b5fd4-ddd0-49bd-bb00-7aaed255ba6c',
  '@type': 'Dataset',
  'distribution': [{'@id': 'ark:99999/7bcab29b-b8df-44fc-adba-cd1d97c8f4f7',
    '@type': 'DataDownload',
    'contentUrl': 'minio:9000/breakfast/cd6b791f-b256-44aa-836a-53f988bfdf2d/part-00000-bd1082b3-441e-4b9e-a786-e4a2f224f913-c000.csv',
    'fileFormat': 'csv',
    'name': 'part-00000-bd1082b3-441e-4b9e-a786-e4a2f224f913-c000.csv'}],
  'evi:generatedBy': {'@id': 'ark:99999/cd6b791f-b256-44aa-836a-53f988bfdf2d'},
  'name': 'part-00000-bd1082b3-441e-4b9e-a786-e4a2f224f913-c000.csv',
  'sdPublicationDate': '2020-11-20T14:53:50.308526947Z',
  'sdPublisher': {'@id': 'c5c0c8d8-656b-4189-9967-44c45624d283',
   'name': 'Justin Admin'},
  'url': 'http://ors.uvadcos.io/ark:99999/7a7b5fd4-ddd0-49bd-bb00-7aaed255ba6c'}]

Step 6: Create Visual from Job Outputs

Upload Image Code

spark_graphic_meta = {
    "@context":{
        "@vocab":"http://schema.org/"
    },
    "@type":"SoftwareSourceCode",
    "name":"Image Script",
    "description":"Creates heatmap of patients stay",
    "author":[
        {
            "name":"Justin Niestroy",
            "@id": "https://orcid.org/0000-0002-1103-3882",
            "affiliation":"University of Virginia"
      }
    ]
}
graphic_code_id = FAIR.upload_file('make_graphic.py',spark_graphic_meta,token = token)

Run Spark Job to create Image

image_job_id = FAIR.compute(output_ids[1]['@id'],graphic_code_id,'spark',token = token)
image_job_id
'ark:99999/8e8c4cf7-7c5b-44ef-8297-7c05b63a5589'

Check on running jobs

FAIR.list_running_jobs(token = token)
[]

Get image ID from Updated Job ID

image_job_meta = FAIR.retrieve_metadata(image_job_id,token = token)
image_id = image_job_meta['evi:supports'][0]['@id']

Get the image metadata

FAIR.retrieve_metadata(image_id,token = token)
{'@context': {'@vocab': 'http://schema.org/'},
 '@id': 'ark:99999/cba40fcd-c191-4183-b036-41d63f61d911',
 '@type': 'Dataset',
 'distribution': [{'@id': 'ark:99999/dc50156b-4ed7-46ac-afff-8c7a31521a51',
   '@type': 'DataDownload',
   'contentUrl': 'minio:9000/breakfast/8e8c4cf7-7c5b-44ef-8297-7c05b63a5589/Histogram_Heatmap.png',
   'fileFormat': 'png',
   'name': 'Histogram_Heatmap.png'}],
 'evi:generatedBy': {'@id': 'ark:99999/8e8c4cf7-7c5b-44ef-8297-7c05b63a5589'},
 'name': 'Histogram_Heatmap.png',
 'sdPublicationDate': '2020-11-20T15:04:06.112556684Z',
 'sdPublisher': {'@id': 'c5c0c8d8-656b-4189-9967-44c45624d283',
  'name': 'Justin Admin'},
 'url': 'http://ors.uvadcos.io/ark:99999/cba40fcd-c191-4183-b036-41d63f61d911'}

Step 7: View Evidence Graph

Build Evidence graph of created image using the evidence graph service

Evidence Graph Service builds an json-ld evidence graph representing the provenance (datasets, software, and computations that support the PID)

evidence_graph takes: - PID evidence_graph returns: - json-ld evidence graph for given PID

FAIR.evidence_graph(image_id,token = token)
{'@context': {'@vocab': 'http://schema.org/', 'evi': 'http://w3id.org/EVI#'},
 '@id': 'ark:99999/cba40fcd-c191-4183-b036-41d63f61d911',
 '@type': 'Dataset',
 'evi:generatedBy': {'@id': 'ark:99999/8e8c4cf7-7c5b-44ef-8297-7c05b63a5589',
  '@type': 'evi:Computation',
  'evi:usedDataset': {'@id': 'ark:99999/7a7b5fd4-ddd0-49bd-bb00-7aaed255ba6c',
   '@type': 'Dataset',
   'evi:generatedBy': {'@id': 'ark:99999/cd6b791f-b256-44aa-836a-53f988bfdf2d',
    '@type': 'evi:Computation',
    'evi:usedDataset': {'@id': 'ark:99999/bd3d2e2b-3bc1-49b4-8b2e-fc2c0000b715',
     '@type': 'Dataset',
     'author': {'@id': 'https://orcid.org/0000-0002-1103-3882',
      '@type': 'Person',
      'name': 'Justin Niestroy'},
     'name': 'Raw Data'},
    'evi:usedSoftware': {'@id': 'ark:99999/34d26e0d-48e4-422b-b21f-e178cdeae71e',
     '@type': 'SoftwareSourceCode',
     'author': {'@id': 'https://orcid.org/0000-0002-1103-3882',
      '@type': 'Person',
      'name': 'Justin Niestroy'},
     'name': 'Processing  Script'},
    'name': 'Computation'},
   'name': 'part-00000-bd1082b3-441e-4b9e-a786-e4a2f224f913-c000.csv'},
  'evi:usedSoftware': {'@id': 'ark:99999/b9739256-81eb-4159-acd6-8ddd21c97d95',
   '@type': 'SoftwareSourceCode',
   'author': {'@id': 'https://orcid.org/0000-0002-1103-3882',
    '@type': 'Person',
    'name': 'Justin Niestroy'},
   'name': 'Image Script'},
  'name': 'Computation'},
 'name': 'Histogram_Heatmap.png'}

View Visualization of the Evidence Graph

'https://clarklab.uvarc.io/viz/' + image_id
'https://clarklab.uvarc.io/viz/ark:99999/cba40fcd-c191-4183-b036-41d63f61d911'

Download Created Image

Download output files from computations using download_file

download_file takes: - PID: id representing object to be downloaded - name: file name for downloaded object
download_file returns:
- bool: True if download if succesful
- file saved location given

from IPython.display import Image
FAIR.download_file(image_id,'Created Image.png',token = token)
Image(filename='Created Image.png')

png

r = requests.post('https://clarklab.uvarc.io/auth/logout',headers = {"Authorization":token })
r.content.decode()