arch.adoc
1. Introduction and Goals

The following describes the architecture of the eFlows4HPC Data Catalog. The service provides information about the data sets used in the project: the catalog stores their locations, schemas, and additional metadata.

Main features:

  • keep track of data sources

  • enable registration of new data sources

  • provide a user view as well as a simple API to access the information

1.1. Requirements Overview

ID | Requirement | Explanation
R1 | View data sources | View the list of data sources and details of a particular one (web page + API)
R2 | Register data sets | Authenticated users should be able to register and change data sets with additional metadata
R3 | No metadata schema | No schema is imposed on the metadata (it is not yet known what will be relevant)
R4 | Documented API | The API is documented with Swagger/OpenAPI

1.2. Quality Goals

ID | Prio | Quality | Explanation
Q1 | 1 | Extensibility | Possibility to add new metadata to existing rows
Q2 | 2 | Interoperability | The service should work with Data Logistics
Q3 | 2 | Deployability | Quick, automatic deployment

2. Architecture Constraints

Constraint | Explanation
Authentication | OAuth-based authentication for admin users
Deployment | We shall use CI/CD; the project also serves as a testbed to set this up before the Data Logistics
Docker-based deployment | Docker will be used in the project anyway

3. System Scope and Context

3.1. Business Context

Business view (diagram omitted)

3.2. Technical Context

Mapping Input/Output to Channels

User → Data Catalog: simple (static?) web page view

Data Logistics → Data Catalog: read-only HTTP API

Admin → Data Catalog: either a web page or CLI

4. Solution Strategy

4.1. Speed and flexibility

This product is not mission critical and we want to keep it simple; a solution even without a backend database would be possible. The API is documented with Swagger/OpenAPI (e.g. FastAPI), and the front-end is a static page with JavaScript calls to the API.

4.2. Automatic Deployment

  1. Code in Gitlab

  2. Resources on HDF Cloud

  3. Automatic deployment with Docker + docker-compose, OpenStack API

We use the Docker image registry in GitLab to build and store new images.
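A deployment along these lines could be described with a compose file similar to the following sketch; the service name, image path, ports, and volume are illustrative assumptions, not the project's actual configuration.

```yaml
version: "3"
services:
  datacatalog:
    # image built by the GitLab CI pipeline and pulled from the project registry
    # (registry path is a placeholder)
    image: registry.gitlab.example.org/eflows4hpc/datacatalog:latest
    ports:
      - "80:8000"           # expose the API and static front-end
    volumes:
      - catalog-data:/data  # persistent volume for the file-based backend
    restart: unless-stopped
volumes:
  catalog-data:
```

The CI pipeline can then redeploy by pulling the new image and restarting the service on the HDF Cloud resources.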

4.3. Structure

The main data model is JSON-based and uses pydantic. Resources in the catalog fall into one of two storage classes (sources and targets); the number of classes may change in the future.
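A minimal pydantic sketch of such a model could look as follows; the class and field names are assumptions for illustration, not taken from the actual code.

```python
from typing import Dict, Literal

from pydantic import BaseModel

# The two storage classes currently foreseen; the set may grow later.
StorageClass = Literal["source", "target"]


class CatalogEntry(BaseModel):
    name: str
    url: str
    storage_class: StorageClass
    # Free-form key/value metadata: no schema is imposed (requirement R3).
    metadata: Dict[str, str] = {}
```

Because the model is pydantic-based, validation and JSON (de)serialization come for free, and new metadata keys can be added to existing entries without any schema migration (quality goal Q1).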

The actual storage of the catalog information happens behind an abstract interface: the first implementation stores the data in a file, and other backends can be added later. The API uses this backend abstraction to manage the information.
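One way to sketch that abstraction is shown below; the class and method names are assumed for illustration and do not come from the actual code.

```python
import json
from abc import ABC, abstractmethod
from pathlib import Path
from typing import Dict, List, Optional


class StorageBackend(ABC):
    """Abstract interface the API talks to; concrete backends can vary."""

    @abstractmethod
    def get(self, name: str) -> Optional[dict]: ...

    @abstractmethod
    def put(self, name: str, entry: dict) -> None: ...

    @abstractmethod
    def list_names(self) -> List[str]: ...


class JsonFileBackend(StorageBackend):
    """First, simple backend: keep the whole catalog in one JSON file."""

    def __init__(self, path: Path):
        self.path = Path(path)
        if not self.path.exists():
            self.path.write_text("{}")

    def _load(self) -> Dict[str, dict]:
        return json.loads(self.path.read_text())

    def get(self, name: str) -> Optional[dict]:
        return self._load().get(name)

    def put(self, name: str, entry: dict) -> None:
        data = self._load()
        data[name] = entry
        self.path.write_text(json.dumps(data, indent=2))

    def list_names(self) -> List[str]:
        return sorted(self._load())
```

A database-backed implementation would only have to provide the same three methods; the API code stays unchanged.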

The web front-end consists of static HTML files generated from templates. This gives a lot of flexibility and allows for easy scaling if required.
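Generating the static pages can be as simple as filling string templates; below is a stdlib-only sketch (the template text and field names are illustrative, and the real front-end may well use a proper template engine).

```python
from string import Template

# Simplified page and item templates for the static index page.
PAGE = Template(
    "<html><body>\n"
    "<h1>Data Catalog</h1>\n"
    "<ul>\n$items</ul>\n"
    "</body></html>\n"
)
ITEM = Template('<li><a href="$url">$name</a></li>\n')


def render_index(entries):
    """Render the static index page from a list of catalog entries."""
    items = "".join(
        ITEM.substitute(name=e["name"], url=e["url"]) for e in entries
    )
    return PAGE.substitute(items=items)
```

Regenerating the pages whenever the catalog changes keeps the served files static, so the front-end can be hosted and scaled like any plain-HTML site.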