Feature: Add the HTML extractor #137

Open
naydav opened this issue May 25, 2021 · 4 comments
Comments


naydav commented May 25, 2021

We need to add an HTML extractor feature that creates a separate record for each p, li, td and code tag.
The set of tags could be customized through the nodes_to_index option.
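
To make the request concrete, here is a minimal sketch of what "one record per tag" could look like. It is not the plugin's code and not the algolia-html-extractor API: `extractRecords` and the `tag`/`content` record fields are hypothetical names, and the regex is a naive stand-in for a real HTML parser that only copes with simple, non-nested markup.

```javascript
// Hypothetical helper, for illustration only.
// Splits an HTML string into one record per matching tag.
// Assumption: tags are simple and not nested inside each other;
// a real implementation should use a proper HTML parser.
function extractRecords(html, nodesToIndex = 'p,li,td,code') {
  const tags = nodesToIndex
    .split(',')
    .map((tag) => tag.trim())
    .filter(Boolean);

  // Matches <tag ...>inner</tag> for any of the configured tags.
  const pattern = new RegExp(
    `<(${tags.join('|')})\\b[^>]*>([\\s\\S]*?)</\\1>`,
    'gi'
  );

  const records = [];
  for (const match of html.matchAll(pattern)) {
    // Strip inner markup and collapse whitespace to get plain text.
    const text = match[2]
      .replace(/<[^>]+>/g, '')
      .replace(/\s+/g, ' ')
      .trim();
    if (text) {
      records.push({ tag: match[1].toLowerCase(), content: text });
    }
  }
  return records;
}

const sample =
  '<h1>Title</h1><p>Hello <b>world</b></p><ul><li>One</li><li>Two</li></ul>';
const sampleRecords = extractRecords(sample);
const onlyListItems = extractRecords(sample, 'li');
```

With the default tag list, the heading is skipped and the paragraph and both list items each become their own record; passing `'li'` narrows the output to the list items.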

Author

naydav commented May 25, 2021

Hi @Haroenv,

According to the recommendation from #134 (comment), it should be a function that is called directly in the transformer.

But I think we can do it in a more reusable way. We can do it as part of the configuration.

  1. Add one more parameter, nodes_to_index: 'p,li,td,code',
    to https://github.com/algolia/gatsby-plugin-algolia/blob/master/gatsby-node.js#L42 (as was done for the dry-run option)

  2. Add the data transformation logic (using the algolia-html-extractor library)
    to https://github.com/algolia/gatsby-plugin-algolia/blob/master/gatsby-node.js#L421
    and run the HTML extraction before calling custom transformers

  3. This means that the gatsby-plugin-algolia library will gain a dependency on html-extractor (https://github.com/mansona/html-extractor)
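
From a user's point of view, the proposal above might look roughly like the following gatsby-config.js sketch. The `nodes_to_index` option is the proposed addition and does not exist in the plugin; the environment variable names, index name, and GraphQL query are placeholders chosen for illustration.

```javascript
// gatsby-config.js (sketch of the PROPOSED configuration, not a released API)
module.exports = {
  plugins: [
    {
      resolve: 'gatsby-plugin-algolia',
      options: {
        // Placeholder credentials read from the environment.
        appId: process.env.ALGOLIA_APP_ID,
        apiKey: process.env.ALGOLIA_API_KEY,
        indexName: 'pages', // placeholder index name
        // Proposed option from this issue: tags that each become a record.
        nodes_to_index: 'p,li,td,code',
        queries: [
          {
            // Illustrative query; field names depend on the site's schema.
            query: `{
              pages: allMarkdownRemark {
                nodes { id html }
              }
            }`,
            // Under the proposal, HTML extraction would run before this.
            transformer: ({ data }) => data.pages.nodes,
          },
        ],
      },
    },
  ],
};
```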

@Haroenv, could you please share your thoughts on what I described above, especially on the proper place in the code for the HTML extraction?

Thanks in advance

Contributor

Haroenv commented May 25, 2021

Have you tried running the code manually in the transformer? I’m not sure what’s missing with that approach. If it’s urgent, you can keep the plugin code locally as a first step (like in the example). Hope that makes sense, and I’m still interested in seeing which approach you take.

Author

naydav commented May 26, 2021

@Haroenv
Yes, the transformer approach works great, but with it you have to duplicate the code in each new transformer declaration.

The main idea was that if it is part of the main library, everyone can use it simply through configuration.

But we're OK with the transformer approach. Please decide as you see fit, and we will begin work.
Thanks
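
One way to keep the transformer approach while limiting the duplication described above is a small shared wrapper. The sketch below is hypothetical: `withHtmlExtraction`, the `extract` callback, and the `html`/`objectID` field names are assumptions for illustration, and it assumes synchronous transformers.

```javascript
// Hypothetical higher-order helper: wraps any transformer so that each
// node's `html` field is split into several records.
// `extract` is any function (html) => array of partial records.
function withHtmlExtraction(transformer, extract) {
  return (args) =>
    transformer(args).flatMap((node) => {
      const { html, ...rest } = node;
      return extract(html).map((chunk, index) => ({
        ...rest,
        ...chunk,
        // Derive a distinct ID per chunk from the parent node's ID.
        objectID: `${node.objectID}-${index}`,
      }));
    });
}

// Tiny stand-in extractor used only to exercise the wrapper.
const splitParagraphs = (html) =>
  [...html.matchAll(/<p>([\s\S]*?)<\/p>/g)].map((m) => ({
    content: m[1].trim(),
  }));

const baseTransformer = ({ data }) => data.pages.nodes;
const transformer = withHtmlExtraction(baseTransformer, splitParagraphs);

const wrappedRecords = transformer({
  data: {
    pages: {
      nodes: [{ objectID: 'a', title: 'Doc', html: '<p>One</p><p>Two</p>' }],
    },
  },
});
```

Each query can then reuse the same wrapper around its own transformer, so the extraction logic lives in one place even without a plugin-level option.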

Contributor

Haroenv commented May 31, 2021

As a starting point, the most useful thing would be to share the code needed for this on your side. Do you have a working example with split and stripped HTML?
