ajelenak/ros3vfd-log-info

ROS3 Log Information Dashboard

notebook app

The HDF5 library (libhdf5) has a virtual file driver for the AWS Simple Storage Service (S3) or any S3-compatible storage system. The driver, called Read-Only S3 (ROS3), reads data from an HDF5 file stored as an S3 object. Since libhdf5 version 1.14.1, it can log various information about its S3 operations. This information can be very helpful when deciding which HDF5 file or library features to use for improved data access performance.

This repository contains a simple dashboard for the driver's log data about HTTP range GET requests. These requests represent individual libhdf5 data read operations and directly affect performance. The dashboard is implemented as a Panel web app in a Jupyter notebook. It takes a ROS3 log file and displays statistics and two plots about the HTTP requests performed to read the data. Only log files up to 10 megabytes are accepted because of a current Panel limitation. The easiest way to use the dashboard is via the Binder service links above.
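Because larger uploads are rejected, it can help to check a log's size before uploading it. A minimal sketch follows; it assumes "10 megabytes" means 10 MiB (use 10_000_000 instead if the limit is decimal), and check_log_size is a hypothetical helper, not part of the dashboard.

```python
import os

# Assumption: the upload limit is 10 MiB; adjust if Panel's limit is defined differently.
MAX_LOG_BYTES = 10 * 1024 * 1024


def check_log_size(path, limit=MAX_LOG_BYTES):
    """Return the log file size in bytes, or raise ValueError if it exceeds limit."""
    size = os.path.getsize(path)
    if size > limit:
        raise ValueError(f"{path} is {size} bytes; the dashboard accepts at most {limit} bytes")
    return size
```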

How to Generate ROS3 Logs

The latest library release is always strongly recommended, but since ROS3 logging has been available in several releases, instructions for different versions are given below. The end goal is the same: capture the logging output in a file and upload it to the dashboard for analysis.

For libhdf5-2.0 or Later

Setting the environment variable HDF5_ROS3_VFD_DEBUG to any value other than 0, off, or false enables logging to stderr. The library must be built with support for the ROS3 driver.
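A minimal Python sketch of capturing that output in a file, assuming the data reads happen in a separate program (read_data.py in the usage line is a hypothetical placeholder for your own h5py or libhdf5 code). The variable is set only for the child process, and the chosen output stream is redirected into the log file. The stream parameter is an addition of this sketch: "stderr" matches libhdf5-2.0 and later, while older builds patched as described below print to stdout.

```python
import os
import subprocess


def run_with_ros3_logging(cmd, log_path, stream="stderr"):
    """Run cmd with ROS3 logging enabled, writing the given stream to log_path."""
    # Any value other than 0, off, or false enables logging in libhdf5-2.0+
    env = dict(os.environ, HDF5_ROS3_VFD_DEBUG="1")
    with open(log_path, "w") as log:
        # Pre-2.0 builds print the log to stdout, 2.0+ to stderr
        redirect = {"stdout": log} if stream == "stdout" else {"stderr": log}
        completed = subprocess.run(cmd, env=env, check=False, **redirect)
    return completed.returncode
```

Usage, with a hypothetical script: `run_with_ros3_logging(["python", "read_data.py"], "ros3.log")`, then upload ros3.log to the dashboard.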

Before libhdf5-2.0

Producing ROS3 logs requires building libhdf5 with the ROS3 driver. Download the libhdf5 source from its GitHub repository and follow the instructions on how to build it with the ROS3 driver.

Enabling ROS3 logging requires modifying a single line of the H5FDs3comms.c file. For libhdf5 versions before 1.14.4, change #define S3COMMS_DEBUG 0 to #define S3COMMS_DEBUG 1. For version 1.14.4 and later, change #define S3COMMS_DEBUG 0 to #define S3COMMS_DEBUG 4. After saving the change, build the library with the ROS3 driver according to the instructions. ROS3 driver logging information will be printed to stdout.
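The one-line edit can also be scripted. This is a sketch under two assumptions: the file lives at src/H5FDs3comms.c in the source tree, and the line appears exactly once as #define S3COMMS_DEBUG 0. The helper names are hypothetical. It picks 1 or 4 from the version rule above and refuses to write anything if the expected line is not found.

```python
import re
from pathlib import Path

# Matches the define line; group 1 keeps everything before the value.
_DEBUG_LINE = re.compile(r"^([ \t]*#define[ \t]+S3COMMS_DEBUG[ \t]+)0\b", re.MULTILINE)


def s3comms_debug_level(version):
    """Level per the rule above: 1 before libhdf5 1.14.4, 4 from 1.14.4 on."""
    return 1 if tuple(version) < (1, 14, 4) else 4


def enable_s3comms_debug(source_text, version):
    """Return source_text with S3COMMS_DEBUG switched on for the given version."""
    level = s3comms_debug_level(version)
    new_text, count = _DEBUG_LINE.subn(rf"\g<1>{level}", source_text)
    if count != 1:
        raise ValueError(f"expected one '#define S3COMMS_DEBUG 0' line, found {count}")
    return new_text


def patch_source_tree(src_dir, version):
    """Patch H5FDs3comms.c in place (assumed to be under src/ in the source tree)."""
    path = Path(src_dir) / "src" / "H5FDs3comms.c"
    path.write_text(enable_s3comms_debug(path.read_text(), version))
```

For example, `patch_source_tree("hdf5", (1, 14, 3))` would set the level to 1 before building.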

Sample ROS3 Logs

Five ROS3 driver log files are in this repository:

The original log files contain S3 requests for an HDF5 file created with default (typical) settings. The optimized log files show the effects of copying the original file into a new one created with the paged aggregation file space strategy and then reading data through the library's page buffer cache. Note the significantly reduced number of S3 requests in the optimized log files, which translates directly into much faster performance.

Since libhdf5-1.14.3, the ROS3 driver reads and caches the first 16 MiB of a file when opening it, and the log files from these versions show this feature. It helps to reduce S3 requests even further for certain use cases. Download these files if you want to use the dashboard without collecting ROS3 logs yourself.

Experimental fsspec Logs

Another way to access HDF5 files in the cloud is through fsspec, a Python package that emulates many POSIX operations on remote files. fsspec logs can also be saved and analyzed with the dashboard.
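A minimal sketch of saving such a log with Python's standard logging module. It assumes fsspec emits its per-read messages at DEBUG level through the logger named "fsspec" (worth confirming for your fsspec version). The filter keeps only lines containing " read: ", so unrelated debug messages stay out of the file. capture_fsspec_log and stop_fsspec_log are hypothetical helper names.

```python
import logging


def capture_fsspec_log(log_path):
    """Write fsspec read messages to log_path; returns the handler for cleanup."""
    logger = logging.getLogger("fsspec")  # assumed name of fsspec's logger
    handler = logging.FileHandler(log_path, mode="w")
    handler.setFormatter(logging.Formatter("%(message)s"))  # bare message, no prefix
    # Keep only the per-read lines; callable filters are accepted by handlers
    handler.addFilter(lambda record: " read: " in record.getMessage())
    logger.addHandler(handler)
    logger.setLevel(logging.DEBUG)
    return handler


def stop_fsspec_log(handler):
    """Detach the handler and flush/close the log file."""
    logging.getLogger("fsspec").removeHandler(handler)
    handler.close()
```

Typical use: call capture_fsspec_log("fsspec.log"), open the remote file with fsspec and read it with h5py, then call stop_fsspec_log on the returned handler.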

A log from fsspec looks like:

<File-like object S3FileSystem, URL> read: 0 - 8
<File-like object S3FileSystem, URL> read: 8 - 16
...

To make such a log compatible with the dashboard's reader, append the file size as a final line:

<File-like object S3FileSystem, URL> read: 0 - 8
<File-like object S3FileSystem, URL> read: 8 - 16
<File-like object S3FileSystem, URL> read: 16 - 32
...
FileSize: 736000000
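A small helper for that step, sketched under the assumption that the FileSize line simply goes at the end, as in the example above. The size itself could come from fsspec, for example fs.size(path) on the filesystem object. append_file_size is a hypothetical name.

```python
def append_file_size(log_text: str, file_size: int) -> str:
    """Return the fsspec log text with a trailing 'FileSize: N' line."""
    # Drop any existing FileSize line so the helper can be rerun safely
    lines = [ln for ln in log_text.splitlines() if not ln.startswith("FileSize:")]
    lines.append(f"FileSize: {file_size}")
    return "\n".join(lines) + "\n"
```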

Note: fsspec logs do not distinguish cache hits from actual requests, so the true number of S3 requests is likely lower than the total shown.
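Because of this caveat, totals computed from an fsspec log are best read as an upper bound. A minimal sketch of such a summary, assuming the "read: START - END" and "FileSize: N" line formats shown above; summarize_fsspec_log is a hypothetical helper, not the dashboard's own code.

```python
import re

READ_RE = re.compile(r" read: (\d+) - (\d+)\s*$")
SIZE_RE = re.compile(r"^FileSize: (\d+)\s*$")


def summarize_fsspec_log(log_text):
    """Count logged reads and bytes; reads may include cache hits (upper bound)."""
    reads = []
    file_size = None
    for line in log_text.splitlines():
        m = READ_RE.search(line)
        if m:
            reads.append((int(m.group(1)), int(m.group(2))))
            continue
        m = SIZE_RE.match(line)
        if m:
            file_size = int(m.group(1))
    total = sum(end - start for start, end in reads)
    return {"reads": len(reads), "bytes_requested": total, "file_size": file_size}
```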

Two sample fsspec logs are available in this repository:

Two Ways to Run the Dashboard

The recommended way to run the dashboard is with either the conda or mamba package manager. This repository includes a configuration file for installing all the required Python packages:

conda env create --file environment.yml --name VENV_NAME

or

mamba env create --file environment.yml --name VENV_NAME

The dashboard can be run as a typical Jupyter notebook or as a standalone app in a browser with this command:

panel serve ros3vfd-log-info.ipynb --show

About

Simple dashboard for HDF5 library ROS3 virtual file driver log data
