
Simple library for record-level processing using flows of transformations defined as YAML files


VladimirSiv/pytransflow


pytransflow

A simple library for record-level processing using flows of transformations defined as YAML files

Features

The following are some of the features that pytransflow provides:

  • Define processing flows using YAML files
  • Fine-tune each flow through flow-level configuration options
  • Leverage Pydantic's data validation features
  • Apply a transformation only if its defined condition is met
  • Build your own library of transformations
  • Use multiple input and output datasets
  • Ignore specific errors during processing
  • Set conditions for output datasets
  • Track failed records
  • Define flow fail scenarios
  • Process records in parallel
  • Use flow-level variables, and more

For more information on these features and how to use them, please refer to the Wiki Page.
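The following plain-Python sketch illustrates a few of these ideas: a registry of reusable transformations, conditional steps, ignored errors, failed-record tracking and a percentage-based fail scenario. It is not pytransflow's implementation or API. `REGISTRY`, `transformation`, `Step`, `run` and `OutputAlreadyExists` are hypothetical names, and failing the flow once the failed share reaches the threshold is an assumption.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional

Record = dict[str, Any]

# Hypothetical registry (not pytransflow's API): transformation name -> function
REGISTRY: dict[str, Callable[..., Record]] = {}


def transformation(name: str) -> Callable[[Callable[..., Record]], Callable[..., Record]]:
    """Decorator that registers a transformation function under `name`."""
    def decorator(func: Callable[..., Record]) -> Callable[..., Record]:
        REGISTRY[name] = func
        return func
    return decorator


class OutputAlreadyExists(Exception):
    """Hypothetical error: the transformation would overwrite an existing field."""


@transformation("add_field")
def add_field(record: Record, name: str, value: Any) -> Record:
    """Return a copy of `record` with a new top-level field."""
    if name in record:
        raise OutputAlreadyExists(name)
    return {**record, name: value}


@dataclass
class Step:
    name: str                                          # key in REGISTRY
    params: dict[str, Any]                             # keyword arguments for the function
    condition: Optional[Callable[[Record], bool]] = None
    ignore_errors: tuple[type[Exception], ...] = ()    # errors that do not fail the record


def run(records: list[Record], steps: list[Step], max_failed_pct: float = 100.0):
    """Apply `steps` to each record, collect failures, enforce a fail threshold."""
    processed: list[Record] = []
    failed: list[dict[str, Any]] = []
    for record in records:
        try:
            for step in steps:
                if step.condition is not None and not step.condition(record):
                    continue  # condition not met: skip this step for this record
                try:
                    record = REGISTRY[step.name](record, **step.params)
                except step.ignore_errors:
                    pass  # ignored error: the record keeps its previous state
            processed.append(record)
        except Exception as exc:
            # `record` holds the last successfully transformed state
            failed.append({"record": record, "error": repr(exc)})
    failed_pct = 100.0 * len(failed) / len(records) if records else 0.0
    # Assumption: the flow fails once the failed share reaches the threshold
    if failed_pct >= max_failed_pct:
        raise RuntimeError(f"{failed_pct:.1f}% of records failed")
    return processed, failed


steps = [
    # Runs only for records whose "a" equals "x"
    Step("add_field", {"name": "b", "value": 1}, condition=lambda r: r.get("a") == "x"),
    # Field "a" always exists here, but that error is ignored
    Step("add_field", {"name": "a", "value": "y"}, ignore_errors=(OutputAlreadyExists,)),
]
processed, failed = run([{"a": "x"}, {"a": "z"}], steps)
print(processed)  # [{'a': 'x', 'b': 1}, {'a': 'z'}]
print(failed)     # []
```

Keeping each transformation a plain function looked up by name is what makes the "library of transformations" easy to extend in this sketch.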

Installation

pytransflow is available on PyPI under the package name pytransflow and can be installed with pip:

pip install pytransflow

Overview

pytransflow lets you process records by defining a flow of transformations. Each flow has its own configuration, defined in a YAML file, which can be as simple as:

description: A simple test flow
instant_fail: True
fail_scenarios:
  percentage_of_failed_records: 90
variables:
  a: B
transformations:
  - prefix:
      field: a
      value: test
      condition: "@a/c/d/e == !:a"
      ignore_errors:
        - output_already_exists
      output_datasets:
        - k
  - add_field:
      name: test/a/b
      value: { "a": "b" }
      input_datasets:
        - k
      output_datasets:
        - x
        - z
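One plausible reading of the second transformation above is that it takes records from dataset `k`, writes `{"a": "b"}` at the nested path `test/a/b`, and sends the result to datasets `x` and `z`. The sketch below models that in plain Python under two assumptions: a `/` in a field name denotes nesting, and records routed to several output datasets are independent copies. `set_path` and `route` are hypothetical helpers, not part of pytransflow.

```python
import copy
from typing import Any, Callable

Record = dict[str, Any]


def set_path(record: Record, path: str, value: Any) -> Record:
    """Return a copy of `record` with `value` stored at a slash-separated path.

    Assumption: "test/a/b" means record["test"]["a"]["b"].
    """
    keys = path.split("/")
    result = dict(record)
    node = result
    for key in keys[:-1]:
        child = node.get(key, {})
        if not isinstance(child, dict):
            raise TypeError(f"{key!r} is not a mapping")
        node[key] = dict(child)  # copy each level so the input record is not mutated
        node = node[key]
    node[keys[-1]] = value
    return result


def route(
    datasets: dict[str, list[Record]],
    step: Callable[[Record], Record],
    inputs: list[str],
    outputs: list[str],
) -> dict[str, list[Record]]:
    """Apply `step` to every record of the input datasets and append the
    results to each output dataset (hypothetical routing semantics)."""
    results = [step(r) for name in inputs for r in datasets.get(name, [])]
    for name in outputs:
        # deepcopy so that each output dataset owns independent records
        datasets.setdefault(name, []).extend(copy.deepcopy(results))
    return datasets


datasets = {"k": [{"id": 1}]}
route(
    datasets,
    lambda r: set_path(r, "test/a/b", {"a": "b"}),
    inputs=["k"],
    outputs=["x", "z"],
)
print(datasets["x"])  # [{'id': 1, 'test': {'a': {'b': {'a': 'b'}}}}]
print(datasets["k"])  # [{'id': 1}]
```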

Processing is initiated using the Flow class:

from pprint import pprint

from pytransflow.core import Flow

records = [...]  # input records to process

flow = Flow(name="<flow-name>")
flow.process(records)
pprint(flow.datasets)  # End result
pprint(flow.failed_records)  # Failed records

Refer to the Getting Started wiki page for more examples and guided first steps, or read the blog post that introduces pytransflow.

License

MIT
