Feature: Add the HTML extractor #137
Comments
Hi @Haroenv, According to the recommendation from #134 (comment), it should be a function that is called directly in the […]. But I think we can do it in a more reusable way: as part of the configuration.
3. It means that the […]
@Haroenv, could you please share your thoughts on what I describe above, especially on the proper place in the code for the HTML extraction? Thanks in advance.
Have you tried running the code manually in the transformer? I’m not sure what’s missing with that approach. If it’s urgent, you can keep the plug-in code locally at first (like in the example). Hope that makes sense, and I’m still interested in seeing which approach you take.
@Haroenv The main idea was that, if it is part of the main library, everyone can use it simply through the configuration. But we're OK with the transformer. Please pick the resolution you see fit, and we will begin work.
As a starting point, the most useful thing would be to share the code you need for this on your side. Do you have a working example with split and stripped HTML?
Need to add the HTML extractor feature for creating separate records for each `p`, `li`, `td` and `code` tag. Could be customized through the `nodes_to_index` option.