The GWCloud Job Server is a C++ project that manages the server side of the Job Controller Server/Client architecture.
The job server has three distinct components:

- The C++ server code, which is a CMake project
- The schema/DDL Python project, which maintains the database schema and migration state used by the C++ component
- The keyserver Python project, which the C++ component calls to start remote clients via SSH
Several system libraries are required for local development. Package names for Ubuntu 22.04 can be found in docker/gwcloud_job_server.Dockerfile, but at the time of writing, that list looks like:
```
python3 python3-venv gcovr mariadb-client libunwind-dev libdw-dev libgtest-dev libmysqlclient-dev build-essential cmake libboost-dev libgoogle-glog-dev libboost-test-dev libboost-system-dev libboost-thread-dev libboost-coroutine-dev libboost-context-dev libssl-dev libboost-filesystem-dev libboost-program-options-dev libboost-regex-dev libevent-dev libfmt-dev libdouble-conversion-dev libcurl4-openssl-dev git libjemalloc-dev libzstd-dev liblz4-dev libsnappy-dev libbz2-dev valgrind libdwarf-dev libfast-float-dev clang-tidy ninja-build libcpp-jwt-dev libhowardhinnant-date-dev nlohmann-json3-dev
```
Note: The project now uses C++20 modules and requires the ninja-build package for optimal build performance.
If this is a freshly checked out repository, you'll need to initialize and update the Git submodules:
```
git submodule update --init --recursive
```

This will pull in the required third-party dependencies:
- Simple-Web-Server
- Simple-WebSocket-Server
- folly
- sqlpp11
This project makes heavy use of Docker for testing and building in a controlled environment. You will need MySQL running on the local host if you wish to run the tests locally, with a user configured. See Settings.ixx for the expected user details; they can be overridden using environment variables.
The project uses C++20 modules and requires the Ninja build system. The main targets are adacs_job_controller (runtime binary) and Boost_Tests_run (test suite).
```
cd src
mkdir build
cd build
cmake -G Ninja -DCMAKE_BUILD_TYPE=Debug ..
ninja
```

Static Analysis: By default, clang-tidy static analysis is enabled. To disable it for faster builds:

```
cmake -G Ninja -DCMAKE_BUILD_TYPE=Debug -DENABLE_CLANG_TIDY=OFF ..
```

You can run the generated Boost_Tests_run target, which will execute the full test suite:

```
./Boost_Tests_run
```

For local development with test coverage reporting, use the test-coverage.sh script from the src/build/ directory:

```
cd src/build
../../scripts/test-coverage.sh
```

A comprehensive test suite (used by the CI) can be run with bash scripts/test.sh from the repository root. This will report any test failures and also generate a code coverage report. To run valgrind on the project, another script exists: bash scripts/valgrind.sh. You can expect this to take some time to run.
Finally, to build the production Docker asset, run bash scripts/build.sh, then push the Docker image.
The Job Controller requires two JSON configuration files to be provided at startup:
- Cluster Configuration (`clusters.json`) - Configures the remote clients and their SSH connection details
- Access Secret Configuration (`access_secrets.json`) - Configures HTTP access and restricts applications to specific cluster(s)

The paths to these files are specified via environment variables:

- `CLUSTER_CONFIG_FILE` - Path to the clusters configuration file (default: `config/clusters.json`)
- `ACCESS_SECRET_CONFIG_FILE` - Path to the access secrets configuration file (default: `config/access_secrets.json`)

Template files are provided in the `config/` directory:

- `config/clusters.json.template`
- `config/access_secrets.json.template`
To set up your configuration:
- Copy the template files:

  ```
  cp config/clusters.json.template config/clusters.json
  cp config/access_secrets.json.template config/access_secrets.json
  ```

- Edit the files with your actual configuration values

- The actual config files (`clusters.json` and `access_secrets.json`) are git-ignored and should never be committed
The clusters.json file has the following format:
```
[
  {
    "name": "ozstar",                            # The cluster name - must be unique
    "host": "ozstar.swin.edu.au",                # The SSH host name
    "username": "bilby",                         # The SSH username
    "path": "/fred/oz988/gwcloud_job_client/",   # The remote path to the job controller client
    "key": "-----BEGIN RSA PRIVATE KEY-----..."  # The SSH *RSA* private key used to connect
  },
  ...
]
```
The access_secrets.json file has the following format:
```
[
  {
    "name": "bilbyui",          # The application name
    "secret": "super_secret",   # A very long and complex JWT secret key (ideally a 128 character string with symbols and numbers)
    "applications": [           # A list of other applications, if any, that this application can access (ie read job information)
      "gwlab"
    ],
    "clusters": [               # A list of clusters that this application can access (ie to submit jobs to)
      "ozstar",
      "cit"
    ]
  },
  ...
]
```
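Since the secret should be long and random, you could generate one with something like the following Python sketch. The 128-character length and the mixed alphabet follow the recommendation above; the server does not enforce any particular format.

```python
import secrets
import string

# Candidate characters: letters, digits, and symbols, per the recommendation above.
alphabet = string.ascii_letters + string.digits + string.punctuation

# Build a 128-character secret using a cryptographically secure RNG.
secret = "".join(secrets.choice(alphabet) for _ in range(128))
print(secret)
```

If the generated string contains quote or backslash characters, remember to JSON-escape them when pasting the value into access_secrets.json.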
The typical process to add a new application would look something like the following:

1. Create or gain access to the remote SSH user who will be running the job controller client. Typically this should be a system user.
2. Install and configure the job controller client on that remote machine. (Refer to https://github.com/gravitationalwavedc/gwcloud_job_client)
3. Create a new RSA SSH key pair and add the public key to the remote SSH user. (Note: OPENSSH-format keys won't work. Add the option `-m PEM` to your ssh-keygen command; for example, `ssh-keygen -m PEM -t rsa -b 4096 -C "[email protected]"` forces ssh-keygen to export in PEM format.)
4. Edit `config/clusters.json` and add a new entry with the cluster name and SSH details, as sketched below.
5. Edit `config/access_secrets.json` and add a new entry with the application name, JWT secret, and the cluster name from step 4 in the `clusters` list.
6. Restart the job controller to apply the new configuration:

   ```
   docker-compose -f docker/docker-compose.yaml restart web
   ```
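For step 4, the "key" field must contain the full PEM private key as a single JSON string, which means its newlines need escaping. A minimal Python sketch for producing the entry - the file name, cluster name, and other details here are placeholders, not values from the project:

```python
import json

# Hypothetical key file and cluster details - substitute your own.
with open("id_rsa_jobcontroller") as key_file:
    pem_key = key_file.read()

entry = {
    "name": "mycluster",
    "host": "mycluster.example.edu.au",
    "username": "jobrunner",
    "path": "/home/jobrunner/gwcloud_job_client/",
    "key": pem_key,  # json.dumps escapes the embedded newlines for us
}
print(json.dumps([entry], indent=2))
```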
For production deployment using docker-compose:

- Set up the `.env` file:

  ```
  cp .env.template .env
  # Edit .env with your database passwords and configuration
  ```

- Set up the configuration files as described above

- Start the services:

  ```
  bash scripts/run.sh
  ```
The configuration files are mounted as read-only volumes in the container at /app/config/.
The job controller server exposes a RESTful API that uses JWT authentication. There are two main objects that can be operated on: jobs and files. The Job API is under the URL path /job/apiv1/job/, and the File API is under /job/apiv1/file/. Most (but not all) API requests require a JWT Authorization header to be sent with the request; refer to https://jwt.io/introduction for more details.
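Purely as an illustration, a client might build the Authorization header with PyJWT as below. The exact claims the server expects are not documented in this section, so the expiry claim and the HS256 algorithm are assumptions to verify against the server's JWT handling.

```python
import datetime
import jwt  # PyJWT

# Must match the "secret" entry for your application in access_secrets.json.
SECRET = "super_secret"

token = jwt.encode(
    {"exp": datetime.datetime.now(datetime.timezone.utc) + datetime.timedelta(minutes=5)},
    SECRET,
    algorithm="HS256",  # assumed - check the server's JWT handling for the real algorithm
)
headers = {"Authorization": token}  # whether a "Bearer " prefix is expected is not documented here
```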
Fetch the status of and/or filter for job(s).
Query parameters (all optional):

- jobIds: fetch an array of jobs by ID (CSV separated)
- startTimeGt: start time greater than this (newer than this) (integer epoch time)
- startTimeLt: start time less than this (older than this) (integer epoch time)
- endTimeGt: end time greater than this (newer than this) (integer epoch time)
- endTimeLt: end time less than this (older than this) (integer epoch time)

Job step filtering (must include a job step ID and at least one filter parameter) filters on the provided job step's MOST RECENT state:

- jobSteps: CSV list of:
  - jobStepId: the name of the job step to filter on
  - state: the state of the job step

Job steps are combined using OR.
So a job filter might look like:

```
/job/apiv1/job/?jobIds=50,51,52&startTimeLt=1589838778&endTimeGt=1589836778&jobSteps=jid0,500,jid1,500
```

This will return any jobs with ID 50, 51, or 52, with a start time less than 1589838778 and an end time greater than 1589836778, which have job steps with jid0 = 500 or jid1 = 500.
The return JSON object:

```
[
  {
    "id": 5,                    # Job ID
    "user": 32,                 # ID of the user who submitted the job
    "parameters": "whatever",   # The parameter payload used to launch the job
    "cluster": "ozstar",        # The cluster the job was/will be submitted to
    "bundle": "whatever",       # The bundle hash of the bundle that has/will handle the job
    "history": [                # A list of Job History objects for this job
      {
        "jobId": 5,             # ID of the Job this Job History object is for
        "timestamp": 34233,     # The timestamp when this Job History object was created
        "what": "jid0",         # The job step this Job History is for. Can be anything - usually defined by the bundle implementation. May be "system" or "_job_completion_" for the final job outcome.
        "state": 500            # The state for this job step
      },
      ...
    ]
  },
  ...
]
```
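Putting the above together, a job status query might look like this sketch - the base URL is a placeholder, and the token is built as in the earlier JWT sketch:

```python
import requests

BASE_URL = "http://localhost:8000"          # placeholder - use your deployment's address
headers = {"Authorization": "<JWT token>"}  # built as in the JWT sketch above

params = {
    "jobIds": "50,51,52",
    "startTimeLt": 1589838778,
    "endTimeGt": 1589836778,
    "jobSteps": "jid0,500,jid1,500",
}
response = requests.get(f"{BASE_URL}/job/apiv1/job/", headers=headers, params=params)
response.raise_for_status()
for job in response.json():
    print(job["id"], job["cluster"])
```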
Create and submit a new job. If submission fails, a Bad Request response will be sent.
POST payload:

```
{
  "cluster": "ozstar",        # The name of the cluster to submit the job to (must be defined in ACCESS_SECRET_CONFIG for the JWT secret making the request)
  "userId": 32,               # The ID of the user who submitted the job (this is not enforced and can be anything)
  "parameters": "whatever",   # The parameter payload sent to the bundle to submit the job
  "bundle": "whatever"        # The SHA1 hash of the client bundle to handle the job
}
```

The return JSON object:

```
{
  "jobId": 56                 # The ID of the submitted job
}
```
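A minimal submission sketch, reusing the placeholder base URL and token from the earlier examples:

```python
import requests

BASE_URL = "http://localhost:8000"          # placeholder
headers = {"Authorization": "<JWT token>"}  # as in the JWT sketch above

payload = {
    "cluster": "ozstar",
    "userId": 32,
    "parameters": "whatever",  # the parameter payload your bundle expects
    "bundle": "whatever",      # placeholder bundle hash
}
response = requests.post(f"{BASE_URL}/job/apiv1/job/", headers=headers, json=payload)
response.raise_for_status()
job_id = response.json()["jobId"]
```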
Cancel a job. If cancellation fails, a Bad Request response will be sent.

POST payload:

```
{
  "jobId": 56                 # The ID of the job to cancel
}
```

The return JSON object:

```
{
  "cancelled": 56             # The ID of the job that was cancelled
}
```
Delete a job. If deletion fails, a Bad Request response will be sent. A job must not be in a running state to be deleted.

POST payload:

```
{
  "jobId": 56                 # The ID of the job to delete
}
```

The return JSON object:

```
{
  "cancelled": 56             # The ID of the job that was deleted
}
```
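Both payloads above share the same shape. Purely as an illustration - this section gives the payloads but not the routes or HTTP verbs, so the PATCH verb and path below are assumptions to verify against the server's route definitions:

```python
import requests

BASE_URL = "http://localhost:8000"          # placeholder
headers = {"Authorization": "<JWT token>"}  # as in the JWT sketch above

# Assumed verb/route for cancellation - verify against the server's routes.
response = requests.patch(f"{BASE_URL}/job/apiv1/job/", headers=headers, json={"jobId": 56})
response.raise_for_status()
print(response.json()["cancelled"])
```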
Download a file. This request does not require JWT authorization; it relies instead on the passed file download ID. If a file download cannot be initiated due to an error or an invalid file download ID, a Bad Request response will be sent. If a client is not online, or fails to respond during the file download process, a Service Unavailable response will be sent.

Query parameters:

- fileId: the file download ID generated from a POST verb
- forceDownload: if the response should trigger an attachment download or not

The returned response will be a streaming file download with the following headers:

```
Content-Type: application/octet-stream
Content-Length: remote file size

if forceDownload == True:
    Content-Disposition: attachment; filename="remote file name"
else:
    Content-Disposition: filename="remote file name"
```
Creates new file download ID(s). If an issue occurs, a Bad Request response will be sent. Download IDs are to be used with the GET verb to actually download the file. If no paths are provided, an empty array of IDs is generated.

POST payload:

A. Providing only one path

```
{
  "jobId": 56,                # The job ID to generate the file download ID for
  "path": "whatever"          # Path relative to the root of the remote job to generate a file download ID for
}
```

B. Providing a list of paths

```
{
  "jobId": 56,                # The job ID to generate the file download IDs for
  "paths": [                  # A list of paths to generate download IDs for
    "path1",
    "path2",
    ...
  ]
}
```

The return JSON object:

A. Providing only one path

```
{
  "fileId": "some uuid"       # The generated file download ID
}
```

B. Providing a list of paths

```
{
  "fileIds": [                # A list of generated file download IDs. These are guaranteed to be in the same order as the provided path list.
    "uuid1",
    "uuid2"
  ]
}
```
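A sketch of the full flow - create a download ID, then stream the file. The POST path is an assumption based on the File API prefix given above, and the base URL, token, and file path are placeholders:

```python
import requests

BASE_URL = "http://localhost:8000"          # placeholder
headers = {"Authorization": "<JWT token>"}  # as in the JWT sketch above

# 1. Create a download ID for a single file (assumed to POST to the File API prefix).
response = requests.post(
    f"{BASE_URL}/job/apiv1/file/",
    headers=headers,
    json={"jobId": 56, "path": "output/results.json"},  # placeholder path
)
response.raise_for_status()
file_id = response.json()["fileId"]

# 2. Download the file - note that this GET does not need the JWT header.
with requests.get(
    f"{BASE_URL}/job/apiv1/file/",
    params={"fileId": file_id, "forceDownload": "true"},
    stream=True,
) as download:
    download.raise_for_status()
    with open("results.json", "wb") as out:
        for chunk in download.iter_content(chunk_size=65536):
            out.write(chunk)
```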
Get a remote file list for a job. If fetching the file list fails, a Bad Request response will be sent.

POST payload:

```
{
  "jobId": 56,                # The ID of the job to fetch the file list for
  "recursive": true,          # If the result should be a recursive file list, or a file list just at the provided path
  "path": "/my/path/"         # The path, relative to the root of the job, that the file list should be returned for
}
```

The return JSON object:

```
{
  "files": [                  # The list of files returned
    {
      "path": "/file/path",   # The path to the file
      "isDir": false,         # If the file is a directory or not
      "fileSize": 345652,     # The file size in bytes
      "permissions": null     # The permissions mask of the file (currently not implemented)
    }
  ]
}
```
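A hypothetical client call for the file list. This section gives the payload but not the exact path or verb, so the PATCH verb against the File API prefix below is an assumption to verify against the route definitions:

```python
import requests

BASE_URL = "http://localhost:8000"          # placeholder
headers = {"Authorization": "<JWT token>"}  # as in the JWT sketch above

payload = {"jobId": 56, "recursive": True, "path": "/"}

# Assumed verb/route for the file list - verify against the server's routes.
response = requests.patch(f"{BASE_URL}/job/apiv1/file/", headers=headers, json=payload)
response.raise_for_status()
for entry in response.json()["files"]:
    kind = "dir " if entry["isDir"] else "file"
    print(kind, entry["path"], entry["fileSize"])
```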