A C++ template library that enables functional-style data processing through composable pipelines. The library emphasizes lazy evaluation and memory efficiency, processing data elements on-demand without intermediate allocations wherever possible.


Pipeline Adapters

A header-only C++ library designed to simplify algorithms, container manipulation, and file processing through a unified, pipeable interface.

It lets you express complex data transformations as a linear sequence of operations. Instead of deeply nested function calls, FlowPipes chains adapters with the | operator, resulting in readable, declarative code. Most adapters are lazy: they generate values only when iterated over, keeping memory usage constant ( $O(1)$ ) for streams of arbitrary size.

Key Features

  • Declarative Syntax: chain adapters with the | operator instead of nesting function calls.
  • Lazy Evaluation: most adapters generate values only when the pipeline is iterated.
  • Memory Efficient: constant ( $O(1)$ ) memory usage for streams of arbitrary size.
  • Range-Based Compatibility: pipelines integrate with standard STL containers and range-based iteration.
  • Zero-Copy Design: adapters act as non-owning views and avoid copying the data they process.

Usage Example

The following example recursively scans a directory and counts word frequencies across all .txt files it contains.

#include <algorithm>
#include <cctype>
#include <filesystem>
#include <format>
#include <iostream>
#include <string>
#include <utility>

#include "processing.hpp"

int main(int argc, char* argv[]) {
    using namespace FlowPipes;

    std::string path = (argc > 1) ? argv[1] : ".";
    bool recursive = true;

    Dir(path, recursive)
        // 1. Filter only .txt files
        | Filter([](const std::filesystem::path& p){ return p.extension() == ".txt"; })

        // 2. Open files and stream content
        | OpenFiles()

        // 3. Tokenize by newlines, spaces, and punctuation
        | Split("\n ,.;")

        // 4. Normalize to lowercase
        | Transform([](std::string token) {
            std::transform(token.begin(), token.end(), token.begin(),
                           [](unsigned char c){ return std::tolower(c); });
            return token;
        })

        // 5. Count occurrences (Eager operation)
        | AggregateByKey(
            size_t{0},                                      // Initial value
            [](const std::string&, size_t& count) { ++count; }, // Aggregator
            [](const std::string& token) { return token; }      // Key selector
          )

        // 6. Format output
        | Transform([](const std::pair<std::string, size_t>& stat) {
            return std::format("{} - {}", stat.first, stat.second);
        })

        // 7. Write to stdout
        | Out(std::cout);

    return 0;
}

Library Components

Generators & Sources

  • AsDataFlow: Converts a standard STL container into a data stream.
  • Dir: Recursively iterates through files in a directory.

Stream Adapters (Lazy)

  • Transform: Applies a function to every element in the stream.
  • Filter: Passes only elements that satisfy a predicate.
  • Split: Tokenizes string inputs based on a set of delimiters.
  • OpenFiles: Takes file paths as input and streams their contents line-by-line or chunk-by-chunk.
  • DropNullopt: Filters a stream of std::optional<T>, unwrapping values and discarding std::nullopt.
  • SplitExpected: A specialized adapter for std::expected that branches the pipeline to handle valid results and errors separately.

Aggregators & Joins (Eager)

  • AggregateByKey: Groups data by a key and performs a reduction operation (e.g., counting, summing). Note: This operation requires internal storage.
  • Join: Performs an operation similar to a SQL LEFT JOIN between two data streams.
  • AsVector: Materializes the entire stream into a std::vector.

Sinks

  • Out: Writes the stream to an std::ostream (e.g., std::cout or a file stream).
  • Write: Iterates through the stream and writes its elements to an output stream, separated by a specified delimiter.

Technical Details

Memory Management

The library is designed around the concept of non-owning views. Adapters generally do not own the data they process, allowing for lightweight composition.

Requirements

  • C++20
  • Google Test (for running the test suite).

Building and Testing

The project uses CMake for build configuration.

mkdir build && cd build
cmake ..
make
./tests
