metrics_utility/library/README.md
This is a Python library for metrics-utility. It provides an abstraction over collectors, packaging and storage, extraction, rollups, dataframes and reports, as well as helper functions for tempdirs, locking, and datetime handling.
### Abstractions
#### Collector
Files created by collectors are only cleaned up when the collector is called by Package; otherwise they rely on having been created inside a per-job tempdir (see helpers), which then gets cleaned up.
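The per-job tempdir pattern can be sketched like this (a minimal illustration with a made-up `collect_hosts` collector, not the library's actual API):

```python
import csv
import tempfile
from pathlib import Path

def collect_hosts(job_dir: Path) -> Path:
    """Hypothetical collector: writes its output into the per-job tempdir
    and leaves cleanup to whoever owns that directory."""
    out = job_dir / "hosts.csv"
    with out.open("w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["host_name", "job_count"])
        writer.writerow(["node1.example.com", 3])
    return out

# The per-job tempdir owns the lifetime of every collector output file.
with tempfile.TemporaryDirectory(prefix="metrics-job-") as tmp:
    path = collect_hosts(Path(tmp))
    collected = path.exists()
# Leaving the context deletes the tempdir and everything the collectors wrote.
```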
#### Package
When multiple collectors are called, or the same collector is called multiple times, they are independent of each other.
For grouping things together, we have a Package class, which takes a list of collectors and packages their output into a tarball. Such a tarball can then be passed to a Storage class, and gets cleaned up afterwards.
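A rough sketch of the packaging step, using a hypothetical `package` helper rather than the library's actual Package class:

```python
import tarfile
import tempfile
from pathlib import Path

def package(files, tarball_path):
    """Bundle independent collector output files into one gzipped tarball,
    then clean up the inputs (mirroring the behavior described above)."""
    with tarfile.open(tarball_path, "w:gz") as tar:
        for f in files:
            tar.add(f, arcname=Path(f).name)
    for f in files:
        Path(f).unlink()
    return tarball_path

with tempfile.TemporaryDirectory() as tmp:
    a = Path(tmp) / "config.json"
    b = Path(tmp) / "hosts.csv"
    a.write_text("{}")
    b.write_text("host_name\n")
    tarball = package([a, b], Path(tmp) / "bundle.tar.gz")
    with tarfile.open(tarball) as tar:
        names = sorted(tar.getnames())
print(names)  # ['config.json', 'hosts.csv']
```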
#### Storage
Storage objects serve to provide a shared interface for various storage modes. Each can be initialized with an appropriate configuration, and can retrieve or save objects from/to long-term storage.
Mainly S3 and local directories are supported, but the Storage mechanism can also be used to push the data to cloud APIs or to save it in a local DB.
Common API:

* `storage.put(name, ...)` - should upload to storage, and retry/raise on failure.
* `storage.put(name, dict=data)` - uploads a dict, likely as json data, or a .json file
* `storage.put(name, filename=path)` - uploads a local file (by name)
* `storage.put(name, fileobj=handle)` - uploads an opened local file or a compatible object (by a file-like handle)
* `storage.get(name)` - (context manager) should download from storage into a temporary file, yield the temporary filename, and remove the file again.

Also supported - `exists(name) -> Bool`, `remove(name)`, `glob(pattern) -> [filenames]`.
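To make the contract above concrete, here is a toy directory-backed storage honoring it (a simplified illustration, not the library's actual StorageDirectory code):

```python
import json
import shutil
import tempfile
from contextlib import contextmanager
from pathlib import Path

class ToyDirectoryStorage:
    """Minimal storage implementing the put/get/exists/remove/glob contract."""

    def __init__(self, base_path="./"):
        self.base = Path(base_path)
        self.base.mkdir(parents=True, exist_ok=True)

    def put(self, name, dict=None, filename=None, fileobj=None):
        dest = self.base / name
        dest.parent.mkdir(parents=True, exist_ok=True)
        if dict is not None:
            dest.write_text(json.dumps(dict))  # dict -> json file
        elif filename is not None:
            shutil.copy(filename, dest)  # local file by name
        elif fileobj is not None:
            with dest.open("wb") as out:
                shutil.copyfileobj(fileobj, out)  # file-like handle
        else:
            raise ValueError("need dict=, filename= or fileobj=")

    @contextmanager
    def get(self, name):
        # Yield a temporary copy, remove it when the caller is done.
        with tempfile.NamedTemporaryFile(delete=False) as tmp:
            pass
        shutil.copy(self.base / name, tmp.name)
        try:
            yield tmp.name
        finally:
            Path(tmp.name).unlink()

    def exists(self, name):
        return (self.base / name).exists()

    def remove(self, name):
        (self.base / name).unlink()

    def glob(self, pattern):
        return sorted(str(p.relative_to(self.base)) for p in self.base.glob(pattern))

store = ToyDirectoryStorage(tempfile.mkdtemp())
store.put("report.json", dict={"hosts": 2})
with store.get("report.json") as path:
    data = json.loads(Path(path).read_text())
print(data)  # {'hosts': 2}
```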
Implemented storage classes:

```
# StorageDirectory - local directory structure under base_path
#
# base_path = METRICS_UTILITY_SHIP_PATH

StorageDirectory(
    base_path='./',
)
```

```
# StorageS3 - S3 or minio
#
# bucket = METRICS_UTILITY_BUCKET_NAME
# endpoint = METRICS_UTILITY_BUCKET_ENDPOINT
# region = METRICS_UTILITY_BUCKET_REGION
# access_key = METRICS_UTILITY_BUCKET_ACCESS_KEY
# secret_key = METRICS_UTILITY_BUCKET_SECRET_KEY

StorageS3(
    bucket='name',
    endpoint='http://localhost:9000',  # or 'https://s3.us-east.example.com'
    ...
)
```
#### Extractors
The opposite of `Package`, an extractor can take a set of files (obtained from storage.get), and read a set of dataframes from them, optionally filtered to select a subset of dataframes to load.
The returned dataframes are raw, but compatible with the `add_*` methods of our named Dataframe classes.
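The extraction step, in miniature (plain CSV parsing instead of the library's Extractor classes; names are illustrative):

```python
import csv
import io
import tarfile
import tempfile
from pathlib import Path

def extract_tables(tarball_path, only=None):
    """Read the CSV members of a collected tarball into {name: rows},
    optionally filtered to a subset of table names."""
    tables = {}
    with tarfile.open(tarball_path) as tar:
        for member in tar.getmembers():
            if not member.name.endswith(".csv"):
                continue
            name = member.name[:-len(".csv")]
            if only is not None and name not in only:
                continue
            data = tar.extractfile(member).read().decode()
            tables[name] = list(csv.DictReader(io.StringIO(data)))
    return tables

with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "hosts.csv"
    src.write_text("host_name,job_count\nnode1,3\n")
    tarball = Path(tmp) / "bundle.tar.gz"
    with tarfile.open(tarball, "w:gz") as tar:
        tar.add(src, arcname="hosts.csv")
    tables = extract_tables(tarball, only={"hosts"})
print(tables)  # {'hosts': [{'host_name': 'node1', 'job_count': '3'}]}
```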
#### Dataframes
A pandas dataframe object with extras - a dataframe always knows about its fields and indexes even when empty, and has a `to_csv` / `to_parquet` / `to_json` set of methods to convert to storable formats.
A rollup is the process of building a dataframe from raw csv files, and saving the grouped/aggregated result back into a parquet file.
#### Reports
Reports are predefined classes which take a set of dataframes, along with additional config, and create an XLSX file with a specific report. ReportCCSP, ReportCCSPv2 and ReportRenewalGuidance are implemented.