Profiler

Please see the set of transform project conventions for details on general project conventions, transform configuration, testing, and IDE setup.

Summary

This project wraps the profiler transform with a Ray runtime.

Transform runtime

The transform runtime is responsible for creating the cache actors and sending their handles to the transforms themselves. Additionally, it writes the computed word counts to data storage (as .csv files) and enhances the statistics with information about cache size and utilization.
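The flow described above can be sketched without Ray. This is only an illustration under stated assumptions, not the project's implementation: the `Aggregator` class, the `route` and `profile` helpers, the CRC-based sharding, and the two-column CSV layout are all hypothetical stand-ins for the real actors and output format.

```python
import csv
import io
import zlib
from collections import Counter


class Aggregator:
    """Hypothetical stand-in for one actor: holds counts for its shard of words."""

    def __init__(self):
        self.counts = Counter()

    def add_words(self, words):
        self.counts.update(words)


def route(word, num_aggregators):
    # Assumption: a stable hash decides which aggregator owns a word,
    # so the same word always lands on the same aggregator.
    return zlib.crc32(word.encode("utf-8")) % num_aggregators


def profile(docs, aggregators):
    """Split each document into words and send them to the owning aggregator."""
    for doc in docs:
        buckets = {}
        for word in doc.split():
            buckets.setdefault(route(word, len(aggregators)), []).append(word)
        for idx, words in buckets.items():
            aggregators[idx].add_words(words)


def write_counts_csv(aggregator, stream):
    """Write one aggregator's word counts as CSV rows of (word, count)."""
    writer = csv.writer(stream)
    for word, count in sorted(aggregator.counts.items()):
        writer.writerow([word, count])


# Two in-process "actors" and two tiny documents, purely for illustration.
aggregators = [Aggregator() for _ in range(2)]
profile(["the cat sat", "the dog sat down"], aggregators)

# Merging the shards gives the overall word counts.
total = sum((a.counts for a in aggregators), Counter())
```

Because each word is routed to exactly one aggregator, the shards can be written out independently and merged later without double counting.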

Configuration and command line Options

In addition to the configuration parameters defined here, the Ray version adds the following parameters:

aggregator_cpu - specifies the number of CPUs per aggregator actor
num_aggregators - specifies the number of aggregator actors

Running

Launched Command Line Options

When running the transform with the Ray launcher (i.e. TransformLauncher), the following command line arguments are available in addition to the options provided by the launcher.

  --profiler_aggregator_cpu PROFILER_AGGREGATOR_CPU
                        number of CPUs per aggregator
  --profiler_num_aggregators PROFILER_NUM_AGGREGATORS
                        number of aggregator actors to use
  --profiler_doc_column PROFILER_DOC_COLUMN
                        key for accessing data

These correspond to the configuration keys described above.
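As a rough illustration of that correspondence, the sketch below declares the three flags with plain `argparse` and strips the `profiler_` prefix to recover the configuration keys. It does not use the launcher's own classes. The argument types, the `to_config` helper, and the sample values are assumptions made for the example.

```python
import argparse

PREFIX = "profiler_"  # prefix seen on the flags listed above


def build_parser():
    """Declare the three flags; the types are assumptions for this sketch."""
    parser = argparse.ArgumentParser(description="illustrative profiler options")
    parser.add_argument(f"--{PREFIX}aggregator_cpu", type=float,
                        help="number of CPUs per aggregator")
    parser.add_argument(f"--{PREFIX}num_aggregators", type=int,
                        help="number of aggregator actors to use")
    parser.add_argument(f"--{PREFIX}doc_column", type=str,
                        help="key for accessing data")
    return parser


def to_config(args):
    """Hypothetical helper: strip the prefix to recover the configuration keys."""
    return {
        name[len(PREFIX):]: value
        for name, value in vars(args).items()
        if name.startswith(PREFIX) and value is not None
    }


# Sample values are illustrative only, not documented defaults.
args = build_parser().parse_args([
    "--profiler_aggregator_cpu", "0.5",
    "--profiler_num_aggregators", "4",
    "--profiler_doc_column", "contents",
])
config = to_config(args)
```

Here `config` holds the keys `aggregator_cpu`, `num_aggregators`, and `doc_column`, matching the parameter names used in the configuration section.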

Running the samples

To run the samples, use the following make targets:

run-cli-sample - runs src/ededup_transform_ray.py using command line args
run-local-sample - runs src/ededup_local_ray.py
run-s3-sample - runs src/ededup_s3_ray.py
- Requires prior installation of minio, depending on your platform (e.g., from here and here), and invocation of make minio-start to load data into local minio for S3 access.

These targets will activate the virtual environment and set up any configuration needed. Use the -n option of make to see the details of what is done to run the sample.

For example,

make run-cli-sample
...

Then

ls output

to see the results of the transform.

Transforming data using the transform image

To use the transform image to transform your data, please refer to the running images quickstart, substituting the name of this transform image and runtime as appropriate.
