
Commit ed80e76

library.storage: implement Directory, S3, Segment, CRC, CRCMutual
and update README & example workers to match Issue: AAP-54345
1 parent c79c337 commit ed80e76

19 files changed: +702 additions, -110 deletions

metrics_utility/library/README.md

Lines changed: 94 additions & 12 deletions
@@ -4,6 +4,7 @@ This is a Python library for metrics-utility, exposing all the functionality in
 
 It provides an abstraction over collectors, packaging and storage, extraction, rollups, dataframes and reports, as well as helper functions for tempdirs, locking, and datetime handling.
 
+
 ### Abstractions
 
 #### Collector
@@ -23,6 +24,7 @@ should raise an exception when passed invalid values or a bad DB connection, but
 
 Files created by collectors are only cleaned up when the collectors are called by Package; otherwise they rely on having been created inside a per-job tempdir (see helpers), which then gets cleaned up.
 
+
 #### Package
 
 When multiple collectors are called, or the same collector is called multiple times, they are independent of each other.
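The cleanup rule above (collector output either goes through Package or lives in a per-job tempdir that is removed afterwards) can be sketched with the standard library's `tempfile.TemporaryDirectory`. `collect_example` and `run_job` are hypothetical stand-ins, not the library's collectors or its tempdir helper, whose code is not part of this commit view.

```python
import csv
import os
import tempfile


def collect_example(tempdir):
    """Hypothetical collector: writes one CSV file into tempdir and returns its paths."""
    path = os.path.join(tempdir, 'example.csv')
    with open(path, 'w', newline='') as f:
        writer = csv.writer(f)
        writer.writerow(['id', 'name'])
        writer.writerow([1, 'job'])
    return [path]


def run_job():
    """Hypothetical job: run collectors inside a per-job tempdir that is removed on exit."""
    with tempfile.TemporaryDirectory(prefix='metrics-job-') as tempdir:
        files = collect_example(tempdir)
        # The files are only usable inside this block (e.g. to package or upload them).
        existed = all(os.path.exists(p) for p in files)
    # Leaving the with-block deletes the directory together with everything in it.
    return files, existed


files, existed = run_job()
```

Because nothing outside the `with` block references the files, no collector needs its own cleanup logic in this arrangement.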
@@ -37,28 +39,112 @@ For grouping things together, we have a Package class, which takes a list of ini
 
 Such a tarball can then be passed to a Storage class, and is cleaned up afterwards.
 
+
 #### Storage
 
 Storage objects provide a shared interface for various storage modes. Each can be initialized with an appropriate configuration, and can retrieve or save objects from/to long-term storage.
 
 Mainly S3 and local directories are supported,
 but the Storage mechanism can also be used to push the data to cloud APIs or to save it in a local DB.
 
-* `StorageDirectory(base_path='./')`
-* `StorageS3(...bucket, auth, server...)`
-* `StorageCRC(...server, auth...)`
+Common API:
+
+* `storage.put(name, ...)` - should upload to storage, and retry/raise on failure.
+* `storage.put(name, dict=data)` - uploads a dict, typically serialized as JSON (e.g. as a .json file)
+* `storage.put(name, filename=path)` - uploads a local file (by name)
+* `storage.put(name, fileobj=handle)` - uploads an opened local file or a compatible object (by a file-like handle)
+* `storage.get(name)` - (context manager) should download from storage into a temporary file, yield the temporary filename, and remove the file again.
+
+Also supported - `exists(name) -> bool`, `remove(name)`, `glob(pattern) -> [filenames]`.
+
+Implemented storage classes:
+
+```
+# StorageDirectory - local directory structure under base_path
+#
+# base_path = METRICS_UTILITY_SHIP_PATH
+
+StorageDirectory(
+    base_path='./',
+)
+```
+
+```
+# StorageS3 - S3 or MinIO
+#
+# bucket = METRICS_UTILITY_BUCKET_NAME
+# endpoint = METRICS_UTILITY_BUCKET_ENDPOINT
+# region = METRICS_UTILITY_BUCKET_REGION
+# access_key = METRICS_UTILITY_BUCKET_ACCESS_KEY
+# secret_key = METRICS_UTILITY_BUCKET_SECRET_KEY
+
+StorageS3(
+    bucket='name',
+    endpoint='http://localhost:9000', # or 'https://s3.us-east.example.com'
+    region='us-east-1', # optional
+    access_key='...',
+    secret_key='...',
+)
+```
+
+```
+# StorageSegment - Segment analytics (put-only)
+#
+# debug = bool
+# user_id = string, passed to analytics.track
+# write_key = https://segment.com/docs/connections/sources/catalog/libraries/server/python/#getting-started
+
+StorageSegment(
+    debug=False,
+    user_id='unknown',
+    write_key='...',
+)
+```
+
+```
+# StorageCRC - console.redhat.com, using service accounts (put-only)
+#
+# client_id = METRICS_UTILITY_SERVICE_ACCOUNT_ID
+# client_secret = METRICS_UTILITY_SERVICE_ACCOUNT_SECRET
+# ingress_url = METRICS_UTILITY_CRC_INGRESS_URL
+# proxy_url = METRICS_UTILITY_PROXY_URL
+# sso_url = METRICS_UTILITY_CRC_SSO_URL
+# verify_cert_path = '/etc/pki/ca-trust/extracted/pem/tls-ca-bundle.pem'
+
+StorageCRC(
+    client_id='00000000-0000-0000-0000-000000000000',
+    client_secret='...',
+    ingress_url='https://console.redhat.com/api/ingress/v1/upload',
+    proxy_url=None,
+    sso_url='https://sso.redhat.com/auth/realms/redhat-external/protocol/openid-connect/token',
+    verify_cert_path='/etc/pki/ca-trust/extracted/pem/tls-ca-bundle.pem'
+)
+```
+
+```
+# StorageCRCMutual - console.redhat.com, using mutual tls (put-only)
+#
+# ingress_url = METRICS_UTILITY_CRC_INGRESS_URL
+# proxy_url = METRICS_UTILITY_PROXY_URL
+# session_cert = ('/etc/pki/consumer/cert.pem', '/etc/pki/consumer/key.pem')
+# verify_cert_path = '/etc/pki/ca-trust/extracted/pem/tls-ca-bundle.pem'
+
+StorageCRCMutual(
+    ingress_url='https://console.redhat.com/api/ingress/v1/upload',
+    proxy_url=None,
+    session_cert=('/etc/pki/consumer/cert.pem', '/etc/pki/consumer/key.pem'),
+    verify_cert_path='/etc/pki/ca-trust/extracted/pem/tls-ca-bundle.pem'
+)
+```
 
-`storage.put(name, data)` - should upload to storage, and retry/raise on failure.
-`storage.get(name)` - (context manager) should download from storage into a temporary file, yield the temporary filename, and remove the file again.
-
-The supported `data` formats would still be a local file, an array of local files (crc), or JSON/dict data (crc).
 
 #### Extractors
 
 The opposite of `Package`, an extractor can take a set of files (obtained from storage.get), and read a set of dataframes from them, optionally filtered to select a subset of dataframes to load.
 
 The returned dataframes are raw, but compatible with the `add_*` methods of our named Dataframe classes.
 
+
 #### Dataframes
 
 A pandas dataframe object with extras - a dataframe always knows about its fields and indexes even when empty,
@@ -68,13 +154,9 @@ and a `to_csv` / `to_parquet` / `to_json` set of methods to convert to storable
 
 A rollup is the process of building a dataframe from raw csv files, and saving the grouped/aggregated result back into a parquet file.
 
+
 #### Reports
 
 Reports are predefined classes which take a set of dataframes, along with additional config, and create an XLSX file with a specific report. ReportCCSP, ReportCCSPv2 and ReportRenewalGuidance are implemented.
 
 The xlsx file can again be passed to storage.
-
-
-### helpers
-
-TODO lock temp date
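The common storage API added to the README can be illustrated with a small directory-backed class. `SketchDirectoryStorage` is hypothetical: the source of the real `StorageDirectory` is not shown in this commit view, so the sketch only demonstrates the documented contract (`put` taking one of `filename=`, `fileobj=` or `dict=`; `get` as a context manager that yields a temporary filename and removes it again; plus `exists`, `remove` and `glob`). Serializing `dict=` as JSON is an assumption based on the README wording.

```python
import contextlib
import glob as globlib
import json
import os
import shutil
import tempfile


class SketchDirectoryStorage:
    """Hypothetical directory-backed storage following the README's common API."""

    def __init__(self, base_path='./'):
        self.base_path = base_path

    def _path(self, name):
        return os.path.join(self.base_path, name)

    def put(self, name, *, filename=None, fileobj=None, dict=None):
        # `dict` mirrors the README's keyword name (it shadows the builtin inside put()).
        target = self._path(name)
        os.makedirs(os.path.dirname(target) or '.', exist_ok=True)
        if filename is not None:
            shutil.copyfile(filename, target)
        elif fileobj is not None:
            with open(target, 'wb') as out:
                shutil.copyfileobj(fileobj, out)
        elif dict is not None:
            with open(target, 'w') as out:
                json.dump(dict, out)
        else:
            raise ValueError('put() needs filename=, fileobj= or dict=')

    @contextlib.contextmanager
    def get(self, name):
        # Copy the stored object into a temporary file, yield its name, then delete it.
        fd, tmp = tempfile.mkstemp()
        os.close(fd)
        try:
            shutil.copyfile(self._path(name), tmp)
            yield tmp
        finally:
            os.remove(tmp)

    def exists(self, name):
        return os.path.exists(self._path(name))

    def remove(self, name):
        os.remove(self._path(name))

    def glob(self, pattern):
        # Return names relative to base_path, i.e. the same form put() accepts.
        matches = globlib.glob(self._path(pattern))
        return sorted(os.path.relpath(p, self.base_path) for p in matches)


# Usage: round-trip a dict through the storage.
storage = SketchDirectoryStorage(base_path=tempfile.mkdtemp())
storage.put('data/report.json', dict={'hosts': 3})
with storage.get('data/report.json') as tmp_name:
    with open(tmp_name) as f:
        loaded = json.load(f)
```

Yielding a temporary copy from `get()`, rather than the stored path itself, keeps the contract the same for remote backends such as S3, where a local copy has to be made anyway.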

metrics_utility/library/storage.py

Lines changed: 0 additions & 83 deletions
This file was deleted.
Lines changed: 13 additions & 0 deletions
@@ -0,0 +1,13 @@
+from .crc import StorageCRC, StorageCRCMutual
+from .directory import StorageDirectory
+from .s3 import StorageS3
+from .segment import StorageSegment
+
+
+__all__ = [
+    'StorageCRC',
+    'StorageCRCMutual',
+    'StorageDirectory',
+    'StorageS3',
+    'StorageSegment',
+]
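The README's configuration blocks pair each constructor argument of these classes with an environment variable (for example `bucket = METRICS_UTILITY_BUCKET_NAME`). This commit view does not show where those variables are read, so the helper below is a hypothetical sketch: `settings_from_env`, `S3_ENV` and `CRC_ENV` are illustrative names that build keyword arguments from an environment mapping and leave out unset variables.

```python
import os

# Argument -> environment variable pairs, copied from the README comments for StorageS3.
S3_ENV = {
    'bucket': 'METRICS_UTILITY_BUCKET_NAME',
    'endpoint': 'METRICS_UTILITY_BUCKET_ENDPOINT',
    'region': 'METRICS_UTILITY_BUCKET_REGION',
    'access_key': 'METRICS_UTILITY_BUCKET_ACCESS_KEY',
    'secret_key': 'METRICS_UTILITY_BUCKET_SECRET_KEY',
}

# The same pairs for StorageCRC (service accounts).
CRC_ENV = {
    'client_id': 'METRICS_UTILITY_SERVICE_ACCOUNT_ID',
    'client_secret': 'METRICS_UTILITY_SERVICE_ACCOUNT_SECRET',
    'ingress_url': 'METRICS_UTILITY_CRC_INGRESS_URL',
    'proxy_url': 'METRICS_UTILITY_PROXY_URL',
    'sso_url': 'METRICS_UTILITY_CRC_SSO_URL',
}


def settings_from_env(mapping, environ=None):
    """Hypothetical helper: build constructor kwargs from environment variables.

    Unset or empty variables are omitted, so the class keeps its own default.
    """
    environ = os.environ if environ is None else environ
    return {arg: environ[var] for arg, var in mapping.items() if environ.get(var)}


example_env = {
    'METRICS_UTILITY_SERVICE_ACCOUNT_ID': '00000000-0000-0000-0000-000000000000',
    'METRICS_UTILITY_SERVICE_ACCOUNT_SECRET': 'secret',
    'METRICS_UTILITY_PROXY_URL': '',
}
crc_kwargs = settings_from_env(CRC_ENV, example_env)
# These kwargs would then be passed on as StorageCRC(**crc_kwargs).
```

Omitting unset keys matters for the CRC classes: they read settings with `settings.get(key, default)`, so passing `ingress_url=None` explicitly would replace the built-in default URL with `None`, while leaving the key out keeps the default.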
Lines changed: 118 additions & 0 deletions
@@ -0,0 +1,118 @@
+import json
+
+from importlib.metadata import version
+
+import requests
+
+
+class Base:
+    def __init__(self, **settings):
+        self.ingress_url = settings.get('ingress_url', 'https://console.redhat.com/api/ingress/v1/upload')
+        self.proxy_url = settings.get('proxy_url')
+        self.verify_cert_path = settings.get('verify_cert_path', '/etc/pki/ca-trust/extracted/pem/tls-ca-bundle.pem')
+
+    def _session(self):
+        session = requests.Session()
+        session.headers = {
+            'User-Agent': f'metrics-utility {version("metrics-utility")}',
+        }
+
+        session.verify = self.verify_cert_path
+        session.timeout = (31, 31)
+
+        return session
+
+    def _proxies(self):
+        if not self.proxy_url:
+            return {}
+
+        return {'https': self.proxy_url}
+
+    def put(self, artifact_name, *, filename=None, fileobj=None, dict=None):
+        # FIXME: only for .tar.gz
+        tgz_content_type = 'application/vnd.redhat.aap-billing-controller.aap_billing_controller_payload+tgz'
+
+        if filename:
+            with open(filename, 'rb') as f:
+                self._put((artifact_name, f, tgz_content_type))
+
+        if fileobj:
+            self._put((artifact_name, fileobj, tgz_content_type))
+
+        if dict:
+            self._put((artifact_name, json.dumps(dict)))
+
+    def _put(self, file_tuple):
+        response = self._request({'file': file_tuple})
+
+        # Accept 2XX status_codes
+        if response.status_code >= 300:
+            raise Exception(f'{self.__class__.__name__}: Upload failed with status {response.status_code}: {response.text}')
+
+
+class StorageCRC(Base):
+    def __init__(self, **settings):
+        super().__init__(**settings)
+
+        self.sso_url = settings.get('sso_url', 'https://sso.redhat.com/auth/realms/redhat-external/protocol/openid-connect/token')
+        self.client_id = settings.get('client_id')
+        self.client_secret = settings.get('client_secret')
+
+        if not self.client_id:
+            raise Exception('StorageCRC: client_id not set')
+
+        if not self.client_secret:
+            raise Exception('StorageCRC: client_secret not set')
+
+    def _bearer(self):
+        response = requests.post(
+            self.sso_url,
+            data={
+                'client_id': self.client_id,
+                'client_secret': self.client_secret,
+                'grant_type': 'client_credentials',
+            },
+            headers={'Content-Type': 'application/x-www-form-urlencoded'},
+            timeout=(31, 31),
+            verify=self.verify_cert_path,
+        )
+
+        return json.loads(response.content)['access_token']
+
+    def _request(self, files):
+        session = self._session()
+
+        access_token = self._bearer()
+        session.headers['authorization'] = f'Bearer {access_token}'
+
+        return session.post(
+            self.ingress_url,
+            files=files,
+            proxies=self._proxies(),
+        )
+
+
+class StorageCRCMutual(Base):
+    def __init__(self, **settings):
+        super().__init__(**settings)
+
+        self.session_cert = settings.get(
+            'session_cert',
+            (
+                '/etc/pki/consumer/cert.pem',
+                '/etc/pki/consumer/key.pem',
+            ),
+        )
+
+    def _request(self, files):
+        session = self._session()
+
+        # a single file (containing the private key and the certificate)
+        # or a tuple of both file paths (cert_file, keyfile)
+        session.cert = self.session_cert
+
+        return session.post(
+            self.ingress_url,
+            files=files,
+            proxies=self._proxies(),
+        )
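To see what `StorageCRC` sends without contacting console.redhat.com, both requests can be built offline with `requests.Request(...).prepare()`. The sketch below mirrors `_bearer()` (an OAuth 2.0 client-credentials grant posted as a form) and `_put()`/`_request()` (a multipart upload with a single `file` field carrying the `(artifact_name, fileobj, content_type)` tuple); the IDs, token and payload are placeholders. It also points at one detail of `_session()`: requests does not apply a `timeout` attribute set on a `Session`, so `session.timeout = (31, 31)` does not bound the upload, and a timeout has to be passed on each call, as `_bearer()` already does.

```python
import io

import requests

TGZ_CONTENT_TYPE = 'application/vnd.redhat.aap-billing-controller.aap_billing_controller_payload+tgz'
INGRESS_URL = 'https://console.redhat.com/api/ingress/v1/upload'
SSO_URL = 'https://sso.redhat.com/auth/realms/redhat-external/protocol/openid-connect/token'

# 1. Token request, shaped like StorageCRC._bearer(): a form-encoded client_credentials grant.
token_req = requests.Request(
    'POST',
    SSO_URL,
    data={
        'client_id': 'example-id',
        'client_secret': 'example-secret',
        'grant_type': 'client_credentials',
    },
).prepare()

# 2. Upload request, shaped like Base._put() + _request(): one multipart field named 'file'
#    whose value is the (artifact_name, fileobj, content_type) tuple built in put().
payload = io.BytesIO(b'fake tarball bytes')
upload_req = requests.Request(
    'POST',
    INGRESS_URL,
    files={'file': ('example.tar.gz', payload, TGZ_CONTENT_TYPE)},
    headers={'Authorization': 'Bearer example-token'},
).prepare()

# Nothing is sent here. A real send would pass the timeout per call, e.g.
#   session.post(url, files=files, proxies=proxies, timeout=(31, 31))
```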
