[Feature] add an example of html2pq in the documentation #788
Comments
Sample code is available here: https://github.com/IBM/data-prep-kit/blob/dev/transforms/language/html2parquet/python/src/html2parquet_local.py However, there seems to be an issue with it, @sungeunan-ibm. When I try to run the test files, I get the error below. Can you please check?
|
Please also see the issue in this notebook https://github.com/sujee/data-prep-kit/blob/html-processing-1/examples/notebooks/html-processing/2_process_html_python.ipynb |
@Bytes-Explorer It does not look like you set up the environment properly:
|
Yes, right. I will try again after building the environment. This is another reason why we should simplify things so that everything works straight from pip install. I am glad we are on that journey. |
@sujee you should be able to use this transform very much like you use pdf2parquet. The only caveat is that they cannot be installed together: only one of pdf2parquet or html2parquet can be installed in your environment.
|
We seem to be mixing use cases here.
For 1, agreed |
You can use pip install if you want. I was simply responding to your comment, since you seemed to be trying to run the test example from the test folder. |
@sujee @Bytes-Explorer changing this from Bug to Documentation. |
Agreed. @sungeunan-ibm, could you please update your README.md with an example showing how a notebook user would use your transform? Please reach out if you need help with this. Thanks |
@sujee Can you attach some sample HTML to this issue? Just one or two HTML files. |
@touma-I this is not tied to any particular HTML input. We just need sample Python code to transform HTML --> MD. |
@sujee I understand. I just need any html |
Here is a sample HTML file (I had to add a .txt extension to the HTML file so I could attach it here) |
@sungeunan-ibm When you are back, please see how I did it and let me know if we need to change anything. https://github.com/touma-I/data-prep-kit-pkg/blob/html2parquet-example/transforms/language/html2parquet/notebooks/html2parquet.ipynb cc: @shahrokhDaijavad |
Just add a link to this notebook in html2pq README so it's linked. Great work @touma-I 👏 |
Great! Thank you, @touma-I |
Search before asking
Component
Tools/ingest2parquet
Feature
https://github.com/IBM/data-prep-kit/blob/dev/transforms/language/html2parquet/python/README.md
shows input HTML and output MD, but doesn't have any sample code 😄
We should provide sample code
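To make the request concrete, here is a rough, standalone sketch of the HTML --> MD step using only the Python standard library. It is not the html2parquet transform and does not use its API: `MarkdownSketch` and `html_to_markdown` are hypothetical names made up for illustration, only a handful of tags are handled, and no parquet output is written. For how the transform itself is invoked, the notebook linked in the comments is the place to look.

```python
import re
from html.parser import HTMLParser


class MarkdownSketch(HTMLParser):
    """Hypothetical, minimal HTML -> Markdown converter (illustration only)."""

    HEADINGS = {"h1": "# ", "h2": "## ", "h3": "### "}
    SKIP = {"script", "style", "title"}  # content of these tags is dropped

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []   # output fragments, joined at the end
        self.href = None  # target of the <a> currently open, if any
        self.skip = 0     # nesting depth inside SKIP tags

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.skip += 1
        elif tag in self.HEADINGS:
            self.parts.append("\n\n" + self.HEADINGS[tag])
        elif tag == "p":
            self.parts.append("\n\n")
        elif tag in ("ul", "ol"):
            self.parts.append("\n")
        elif tag == "li":
            self.parts.append("\n- ")
        elif tag in ("b", "strong"):
            self.parts.append("**")
        elif tag in ("i", "em"):
            self.parts.append("*")
        elif tag == "a":
            self.href = dict(attrs).get("href")
            self.parts.append("[")

    def handle_endtag(self, tag):
        if tag in self.SKIP:
            self.skip = max(0, self.skip - 1)
        elif tag in ("b", "strong"):
            self.parts.append("**")
        elif tag in ("i", "em"):
            self.parts.append("*")
        elif tag == "a":
            self.parts.append(f"]({self.href or ''})")
            self.href = None

    def handle_data(self, data):
        if not self.skip:
            # Collapse whitespace runs but keep a single boundary space.
            self.parts.append(re.sub(r"\s+", " ", data))


def html_to_markdown(html: str) -> str:
    """Convert an HTML string to Markdown text using the sketch above."""
    parser = MarkdownSketch()
    parser.feed(html)
    parser.close()  # flush any buffered text
    text = "".join(parser.parts)
    text = re.sub(r" *\n *", "\n", text)    # trim spaces around newlines
    text = re.sub(r"\n{3,}", "\n\n", text)  # at most one blank line
    return text.strip()


if __name__ == "__main__":
    sample = (
        "<h1>Title</h1>\n"
        '<p>Hello <b>world</b>, see <a href="https://example.com">docs</a>.</p>\n'
        "<ul><li>one</li><li>two</li></ul>"
    )
    print(html_to_markdown(sample))
```

A sketch like this only shows the shape of the conversion. Real pages need proper content extraction (navigation, ads, tables, nested lists), which is the job of the transform rather than of a hand-rolled parser.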
Are you willing to submit a PR?