README changes
Signed-off-by: SHAHROKH DAIJAVAD <[email protected]>
shahrokhDaijavad committed Dec 10, 2024
1 parent 7836832 commit 55dcce1
Showing 1 changed file with 4 additions and 4 deletions.
8 changes: 4 additions & 4 deletions transforms/language/html2parquet/README.md
@@ -192,8 +192,8 @@ Chicago |
Run the transform with the following command:

```
-python ../html2parquet/python/src/html2parquet_transform_python.py \
---data_local_config "{'input_folder': '../html2parquet/python/test-data/input', 'output_folder': '../html2parquet/python/test-data/expected'}" \
+python ./dpk_html2parquet/transform_python.py \
+--data_local_config "{'input_folder': './test-data/input', 'output_folder': './test-data/expected'}" \
--data_files_to_use '[".html", ".zip"]'
```

@@ -202,7 +202,7 @@ python ../html2parquet/python/src/html2parquet_transform_python.py \

### Sample Notebook

-See the [sample notebook](../notebooks/html2parquet.ipynb)
+See the [sample notebook](./notebooks/html2parquet.ipynb)
) for an example.


@@ -211,7 +211,7 @@ See the [sample notebook](../notebooks/html2parquet.ipynb)
- [Trafilatura](https://trafilatura.readthedocs.io/en/latest/usage-python.html).
# html2parquet Ray Transform

-This module implements the ray version of the [html2parquet transform](https://github.com/IBM/data-prep-kit/blob/dev/transforms/language/html2parquet/python/README.md).
+This module implements the ray version of the [html2parquet transform](./dpk_html2parquet/ray/).

The HTML conversion is using the [Trafilatura](https://trafilatura.readthedocs.io/en/latest/usage-python.html).
