The GWCloud Job Server is a C++ project that manages the server side of the Job Controller Server/Client architecture.
The job server has three distinct components:

- The C++ server code, which is a CMake project
- The schema/DDL Python project, which maintains the database schema and migration state used by the C++ component
- The keyserver Python project, which the C++ component calls to start remote clients via SSH
Several system libraries are required for local development. Package names for Ubuntu 22.04 can be found in docker/gwcloud_job_server.Dockerfile, but at the time of writing, that list looks like:
```
python3 python3-venv gcovr mariadb-client libunwind-dev libdw-dev libgtest-dev libmysqlclient-dev build-essential cmake libboost-dev libgoogle-glog-dev libboost-test-dev libboost-system-dev libboost-thread-dev libboost-coroutine-dev libboost-context-dev libssl-dev libboost-filesystem-dev libboost-program-options-dev libboost-regex-dev libevent-dev libfmt-dev libdouble-conversion-dev libcurl4-openssl-dev git libjemalloc-dev libzstd-dev liblz4-dev libsnappy-dev libbz2-dev valgrind libdwarf-dev libfast-float-dev clang-tidy ninja-build libcpp-jwt-dev libhowardhinnant-date-dev nlohmann-json3-dev
```
Note: The project now uses C++20 modules and requires the ninja-build package for optimal build performance.
If this is a freshly checked out repository, you'll need to initialize and update the Git submodules:
```
git submodule update --init --recursive
```

This will pull in the required third-party dependencies:
- Simple-Web-Server
- Simple-WebSocket-Server
- folly
- sqlpp11
This project makes heavy use of Docker for testing and building in a controlled environment. You will need MySQL running on the local host if you wish to run the tests locally, with a user configured. See Settings.ixx for the expected user details; they can be overridden using environment variables.
The project uses C++20 modules and requires the Ninja build system. The main targets are adacs_job_controller (runtime binary) and Boost_Tests_run (test suite).
```
cd src
mkdir build
cd build
cmake -G Ninja -DCMAKE_BUILD_TYPE=Debug ..
ninja
```

Static Analysis: By default, clang-tidy static analysis is enabled. To disable it for faster builds:

```
cmake -G Ninja -DCMAKE_BUILD_TYPE=Debug -DENABLE_CLANG_TIDY=OFF ..
```

You can run the generated Boost_Tests_run target, which will execute the full test suite:

```
./Boost_Tests_run
```

For local development with test coverage reporting, use the test-coverage.sh script from the src/build/ directory:

```
cd src/build
../../scripts/test-coverage.sh
```

A comprehensive test suite (used by the CI) can be run with bash scripts/test.sh from the repository root. This will report any test failures and also generate a code coverage report. To run valgrind on the project, another script exists: bash scripts/valgrind.sh. You can expect this to take some time to run.
Finally, to build the production Docker asset, run bash scripts/build.sh, then push the Docker image.
The Job Controller requires two JSON configuration files to be provided at startup:
- Cluster Configuration (`clusters.json`) - Configures the remote clients and their SSH connection details
- Access Secret Configuration (`access_secrets.json`) - Configures HTTP access and restricts applications to specific cluster(s)

The paths to these files are specified via environment variables:

- `CLUSTER_CONFIG_FILE` - Path to the clusters configuration file (default: `config/clusters.json`)
- `ACCESS_SECRET_CONFIG_FILE` - Path to the access secrets configuration file (default: `config/access_secrets.json`)

Template files are provided in the `config/` directory:

- `config/clusters.json.template`
- `config/access_secrets.json.template`
To set up your configuration:
- Copy the template files:

  ```
  cp config/clusters.json.template config/clusters.json
  cp config/access_secrets.json.template config/access_secrets.json
  ```

- Edit the files with your actual configuration values

- The actual config files (`clusters.json` and `access_secrets.json`) are git-ignored and should never be committed
The clusters.json file has the following format:
```
[
  {
    "name": "ozstar",                            # The cluster name - must be unique
    "host": "ozstar.swin.edu.au",                # The SSH host name
    "username": "bilby",                         # The SSH username
    "path": "/fred/oz988/gwcloud_job_client/",   # The remote path to the job controller client
    "key": "-----BEGIN RSA PRIVATE KEY-----..."  # The SSH *RSA* private key used to connect
  },
  ...
]
```
The access_secrets.json file has the following format:
```
[
  {
    "name": "bilbyui",          # The application name
    "secret": "super_secret",   # A very long and complex JWT secret key (ideally a 128 character string with symbols and numbers)
    "applications": [           # A list of other applications, if any, that this application can access (ie read job information)
      "gwlab"
    ],
    "clusters": [               # A list of clusters that this application can access (ie to submit jobs to)
      "ozstar",
      "cit"
    ]
  },
  ...
]
```
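Since the secret should be long and random, you could generate one with something like the following Python sketch. The 128-character length and the mixed alphabet follow the recommendation above; the server does not enforce any particular format.

```python
import secrets
import string

# Candidate characters: letters, digits, and symbols, per the recommendation above.
alphabet = string.ascii_letters + string.digits + string.punctuation

# Build a 128-character secret using a cryptographically secure RNG.
secret = "".join(secrets.choice(alphabet) for _ in range(128))
print(secret)
```

If the generated string contains quote or backslash characters, remember to JSON-escape them when pasting the value into access_secrets.json.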
The typical process to add a new application would look something like the following:

1. Create or gain access to the remote SSH user who will be running the job controller client. Typically this should be a system user.
2. Install and configure the job controller client on that remote machine. (Refer to https://github.com/gravitationalwavedc/gwcloud_job_client)
3. Create a new RSA SSH key pair and add the public key to the remote SSH user. (Note: OPENSSH-format keys won't work. Add the option `-m PEM` to your ssh-keygen command; for example, `ssh-keygen -m PEM -t rsa -b 4096 -C "[email protected]"` forces ssh-keygen to export in PEM format.)
4. Edit `config/clusters.json` and add a new entry with the cluster name and SSH details, as sketched below.
5. Edit `config/access_secrets.json` and add a new entry with the application name, JWT secret, and the cluster name from step 4 in the `clusters` list.
6. Restart the job controller to apply the new configuration:

   ```
   docker-compose -f docker/docker-compose.yaml restart web
   ```
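For step 4, the "key" field must contain the full PEM private key as a single JSON string, which means its newlines need escaping. A minimal Python sketch for producing the entry - the file name, cluster name, and other details here are placeholders, not values from the project:

```python
import json

# Hypothetical key file and cluster details - substitute your own.
with open("id_rsa_jobcontroller") as key_file:
    pem_key = key_file.read()

entry = {
    "name": "mycluster",
    "host": "mycluster.example.edu.au",
    "username": "jobrunner",
    "path": "/home/jobrunner/gwcloud_job_client/",
    "key": pem_key,  # json.dumps escapes the embedded newlines for us
}
print(json.dumps([entry], indent=2))
```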
For production deployment using docker-compose:

- Set up the `.env` file:

  ```
  cp .env.template .env
  # Edit .env with your database passwords and configuration
  ```

- Set up the configuration files as described above

- Start the services:

  ```
  bash scripts/run.sh
  ```
The configuration files are mounted as read-only volumes in the container at /app/config/.
The job controller server exposes a RESTful API that uses JWT authentication. There are two main objects that can be operated on: jobs and files. The Job API is under the URL path /job/apiv1/job/, and the File API is under /job/apiv1/file/. Most (but not all) API requests require a JWT Authorization header to be sent with the request; refer to https://jwt.io/introduction for more details.
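Purely as an illustration, a client might build the Authorization header with PyJWT as below. The exact claims the server expects are not documented in this section, so the expiry claim and the HS256 algorithm are assumptions to verify against the server's JWT handling.

```python
import datetime
import jwt  # PyJWT

# Must match the "secret" entry for your application in access_secrets.json.
SECRET = "super_secret"

token = jwt.encode(
    {"exp": datetime.datetime.now(datetime.timezone.utc) + datetime.timedelta(minutes=5)},
    SECRET,
    algorithm="HS256",  # assumed - check the server's JWT handling for the real algorithm
)
headers = {"Authorization": token}  # whether a "Bearer " prefix is expected is not documented here
```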
Fetch the status of and/or filter for job(s).
Query parameters (all optional):

- jobIds: fetch an array of jobs by ID (CSV separated)
- startTimeGt: start time greater than this (newer than this) (integer epoch time)
- startTimeLt: start time less than this (older than this) (integer epoch time)
- endTimeGt: end time greater than this (newer than this) (integer epoch time)
- endTimeLt: end time less than this (older than this) (integer epoch time)

Job step filtering (must include a job step ID and at least one filter parameter) filters on the provided job step's MOST RECENT state:

- jobSteps: CSV list of:
  - jobStepId: the name of the job step to filter on
  - state: the state of the job step

Job steps are combined using OR.
So a job filter might look like:

```
/job/apiv1/job/?jobIds=50,51,52&startTimeLt=1589838778&endTimeGt=1589836778&jobSteps=jid0,500,jid1,500
```

This will return any jobs with ID 50, 51, or 52, with a start time less than 1589838778 and an end time greater than 1589836778, which have job steps with jid0 = 500 or jid1 = 500.
The return JSON object:

```
[
  {
    "id": 5,                    # Job ID
    "user": 32,                 # ID of the user who submitted the job
    "parameters": "whatever",   # The parameter payload used to launch the job
    "cluster": "ozstar",        # The cluster the job was/will be submitted to
    "bundle": "whatever",       # The bundle hash of the bundle that has/will handle the job
    "history": [                # A list of Job History objects for this job
      {
        "jobId": 5,             # ID of the Job this Job History object is for
        "timestamp": 34233,     # The timestamp when this Job History object was created
        "what": "jid0",         # The job step this Job History is for. Can be anything - usually defined by the bundle implementation. May be "system" or "_job_completion_" for the final job outcome.
        "state": 500            # The state for this job step
      },
      ...
    ]
  },
  ...
]
```
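Putting the above together, a job status query might look like this sketch - the base URL is a placeholder, and the token is built as in the earlier JWT sketch:

```python
import requests

BASE_URL = "http://localhost:8000"          # placeholder - use your deployment's address
headers = {"Authorization": "<JWT token>"}  # built as in the JWT sketch above

params = {
    "jobIds": "50,51,52",
    "startTimeLt": 1589838778,
    "endTimeGt": 1589836778,
    "jobSteps": "jid0,500,jid1,500",
}
response = requests.get(f"{BASE_URL}/job/apiv1/job/", headers=headers, params=params)
response.raise_for_status()
for job in response.json():
    print(job["id"], job["cluster"])
```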
Create and submit a new job. If submission fails, a Bad Request response will be sent.
POST payload:

```
{
  "cluster": "ozstar",        # The name of the cluster to submit the job to (must be defined in ACCESS_SECRET_CONFIG for the JWT secret making the request)
  "userId": 32,               # The ID of the user who submitted the job (this is not enforced and can be anything)
  "parameters": "whatever",   # The parameter payload sent to the bundle to submit the job
  "bundle": "whatever"        # The SHA1 hash of the client bundle to handle the job
}
```

The return JSON object:

```
{
  "jobId": 56                 # The ID of the submitted job
}
```
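A minimal submission sketch, reusing the placeholder base URL and token from the earlier examples:

```python
import requests

BASE_URL = "http://localhost:8000"          # placeholder
headers = {"Authorization": "<JWT token>"}  # as in the JWT sketch above

payload = {
    "cluster": "ozstar",
    "userId": 32,
    "parameters": "whatever",  # the parameter payload your bundle expects
    "bundle": "whatever",      # placeholder bundle hash
}
response = requests.post(f"{BASE_URL}/job/apiv1/job/", headers=headers, json=payload)
response.raise_for_status()
job_id = response.json()["jobId"]
```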
Cancel a job. If cancellation fails, a Bad Request response will be sent.

POST payload:

```
{
  "jobId": 56                 # The ID of the job to cancel
}
```

The return JSON object:

```
{
  "cancelled": 56             # The ID of the job that was cancelled
}
```
Delete a job. If deletion fails, a Bad Request response will be sent. A job must not be in a running state to be deleted.

POST payload:

```
{
  "jobId": 56                 # The ID of the job to delete
}
```

The return JSON object:

```
{
  "cancelled": 56             # The ID of the job that was deleted
}
```
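Both payloads above share the same shape. Purely as an illustration - this section gives the payloads but not the routes or HTTP verbs, so the PATCH verb and path below are assumptions to verify against the server's route definitions:

```python
import requests

BASE_URL = "http://localhost:8000"          # placeholder
headers = {"Authorization": "<JWT token>"}  # as in the JWT sketch above

# Assumed verb/route for cancellation - verify against the server's routes.
response = requests.patch(f"{BASE_URL}/job/apiv1/job/", headers=headers, json={"jobId": 56})
response.raise_for_status()
print(response.json()["cancelled"])
```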
Download a file. This request does not require JWT authorization; it relies instead on the passed file download ID. If a file download cannot be initiated due to an error or an invalid file download ID, a Bad Request response will be sent. If a client is not online, or fails to respond during the file download process, a Service Unavailable response will be sent.

Query parameters:

- fileId: the file download ID generated from a POST verb
- forceDownload: if the response should trigger an attachment download or not

The returned response will be a streaming file download with the following headers:

```
Content-Type: application/octet-stream
Content-Length: remote file size

if forceDownload == True:
    Content-Disposition: attachment; filename="remote file name"
else:
    Content-Disposition: filename="remote file name"
```
Creates new file download ID(s). If an issue occurs, a Bad Request response will be sent. Download IDs are to be used with the GET verb to actually download the file. If no paths are provided, an empty array of IDs is generated.

POST payload:

A. Providing only one path

```
{
  "jobId": 56,                # The job ID to generate the file download ID for
  "path": "whatever"          # Path relative to the root of the remote job to generate a file download ID for
}
```

B. Providing a list of paths

```
{
  "jobId": 56,                # The job ID to generate the file download IDs for
  "paths": [                  # A list of paths to generate download IDs for
    "path1",
    "path2",
    ...
  ]
}
```

The return JSON object:

A. Providing only one path

```
{
  "fileId": "some uuid"       # The generated file download ID
}
```

B. Providing a list of paths

```
{
  "fileIds": [                # A list of generated file download IDs. These are guaranteed to be in the same order as the provided path list.
    "uuid1",
    "uuid2"
  ]
}
```
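A sketch of the full flow - create a download ID, then stream the file. The POST path is an assumption based on the File API prefix given above, and the base URL, token, and file path are placeholders:

```python
import requests

BASE_URL = "http://localhost:8000"          # placeholder
headers = {"Authorization": "<JWT token>"}  # as in the JWT sketch above

# 1. Create a download ID for a single file (assumed to POST to the File API prefix).
response = requests.post(
    f"{BASE_URL}/job/apiv1/file/",
    headers=headers,
    json={"jobId": 56, "path": "output/results.json"},  # placeholder path
)
response.raise_for_status()
file_id = response.json()["fileId"]

# 2. Download the file - note that this GET does not need the JWT header.
with requests.get(
    f"{BASE_URL}/job/apiv1/file/",
    params={"fileId": file_id, "forceDownload": "true"},
    stream=True,
) as download:
    download.raise_for_status()
    with open("results.json", "wb") as out:
        for chunk in download.iter_content(chunk_size=65536):
            out.write(chunk)
```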
Get a remote file list for a job. If fetching the file list fails, a Bad Request response will be sent.

POST payload:

```
{
  "jobId": 56,                # The ID of the job to fetch the file list for
  "recursive": true,          # If the result should be a recursive file list, or a file list just at the provided path
  "path": "/my/path/"         # The path, relative to the root of the job, that the file list should be returned for
}
```

The return JSON object:

```
{
  "files": [                  # The list of files returned
    {
      "path": "/file/path",   # The path to the file
      "isDir": false,         # If the file is a directory or not
      "fileSize": 345652,     # The file size in bytes
      "permissions": null     # The permissions mask of the file (currently not implemented)
    }
  ]
}
```
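A hypothetical client call for the file list. This section gives the payload but not the exact path or verb, so the PATCH verb against the File API prefix below is an assumption to verify against the route definitions:

```python
import requests

BASE_URL = "http://localhost:8000"          # placeholder
headers = {"Authorization": "<JWT token>"}  # as in the JWT sketch above

payload = {"jobId": 56, "recursive": True, "path": "/"}

# Assumed verb/route for the file list - verify against the server's routes.
response = requests.patch(f"{BASE_URL}/job/apiv1/file/", headers=headers, json=payload)
response.raise_for_status()
for entry in response.json()["files"]:
    kind = "dir " if entry["isDir"] else "file"
    print(kind, entry["path"], entry["fileSize"])
```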