Feature: Add the HTML extractor #137

naydav · 2021-05-25T16:18:41Z

We need an HTML extractor feature that creates a separate record for each p, li, td and code tag.
The set of tags could be customized through a nodes_to_index option.
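
For illustration, the proposed option might look like this in a plugin entry. This is only a sketch: `nodes_to_index` is the option proposed in this issue and is not an existing gatsby-plugin-algolia option, and the environment variable names are assumptions.

```javascript
// Sketch of a gatsby-config.js plugin entry.
// ASSUMPTION: `nodes_to_index` is the option proposed in this issue; it is
// not part of the published gatsby-plugin-algolia options.
const algoliaPluginEntry = {
  resolve: 'gatsby-plugin-algolia',
  options: {
    appId: process.env.ALGOLIA_APP_ID, // env var names are illustrative
    apiKey: process.env.ALGOLIA_API_KEY,
    indexName: process.env.ALGOLIA_INDEX_NAME,
    queries: [], // query definitions omitted for brevity
    // Proposed: comma-separated tag names, one record per matching element.
    nodes_to_index: 'p,li,td,code',
  },
};

// In gatsby-config.js this entry would be placed in the `plugins` array:
// module.exports = { plugins: [algoliaPluginEntry] };
```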

naydav · 2021-05-25T16:24:57Z

Hi @Haroenv,

According to the recommendation from #134 (comment), it should be a function that is called directly in the transformer.

But I think we can do it in a more reusable way. We can do it as part of the configuration.

1. Add one more parameter, nodes_to_index: 'p,li,td,code', to https://github.com/algolia/gatsby-plugin-algolia/blob/master/gatsby-node.js#L42 (as was done for the dry-run option).
2. Add the data-transformation logic (using the algolia-html-extractor library) to https://github.com/algolia/gatsby-plugin-algolia/blob/master/gatsby-node.js#L421.
3. Run the HTML extraction before calling custom transformers.

This means that the gatsby-plugin-algolia library will get a dependency on html-extractor (https://github.com/mansona/html-extractor).
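
The order of operations proposed above (extract first, then call the custom transformer) can be sketched roughly as follows. Everything here is an assumption for illustration: `extractRecords` is a naive regex stand-in rather than the html-extractor library's API, `buildRecords` is a hypothetical plugin-side step, and the thread does not specify how extracted records would be handed to the transformer, so the sketch simply passes them as a plain array.

```javascript
// Stand-in for an HTML extractor library: a naive regex scan that is good
// enough for flat, well-formed markup. Nested elements with the same tag
// name and malformed HTML are not handled; a real parser is needed for those.
function extractRecords(html, nodesToIndex) {
  const tags = nodesToIndex
    .split(',')
    .map((tag) => tag.trim().toLowerCase())
    .filter(Boolean);
  if (tags.length === 0) return [];

  // Matches <tag ...>inner</tag> for any of the configured tag names.
  const pattern = new RegExp(
    `<(${tags.join('|')})\\b[^>]*>([\\s\\S]*?)<\\/\\1\\s*>`,
    'gi'
  );

  const records = [];
  for (const match of html.matchAll(pattern)) {
    const content = match[2]
      .replace(/<[^>]*>/g, ' ') // strip inner tags
      .replace(/\s+/g, ' ') // collapse whitespace
      .trim();
    if (content) records.push({ tag: match[1].toLowerCase(), content });
  }
  return records;
}

// Hypothetical plugin-side step: split each node's `html` field into records
// first, then hand the result to the user's transformer (identity by default).
function buildRecords(nodes, { nodesToIndex, transformer = (records) => records }) {
  const extracted = nodes.flatMap((node) =>
    extractRecords(node.html, nodesToIndex).map((record, position) => ({
      objectID: `${node.id}-${position}`, // identifier field name is assumed
      ...record,
    }))
  );
  return transformer(extracted);
}
```

Calling `buildRecords(nodes, { nodesToIndex: 'p,li,td,code' })` on nodes with `id` and `html` fields would yield one record per matching element, each with a derived identifier.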

@Haroenv, could you please share your thoughts on what I describe above, especially on the proper place in the code for the HTML extraction?

Thanks in advance

Haroenv · 2021-05-25T21:15:31Z

Have you tried running the code manually in the transformer? I'm not sure what's missing with that approach. If it's urgent, you can put the code of the plug-in locally at first (like in the example). Hope that makes sense, and I'm still interested in seeing which approach you take.

naydav · 2021-05-26T14:46:50Z

@Haroenv
Yes, the transformer approach works great, but in this case you have to duplicate the code each time (in each new transformer declaration).

The main idea was that, if it is part of the main library, everyone can use it simply through configuration.

But we're OK with the transformer. Please decide as you see fit, and we will begin work.
Thanks
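
As a rough illustration of the transformer approach, the extraction step could be wrapped in one shared helper and reused in each query's `transformer`. This is a sketch under assumptions: `withHtmlExtraction` and `byParagraph` are hypothetical names, the `html`, `id` and `title` fields and the `pages`/`posts` query shapes are invented for the example, and the record identifier field name is assumed.

```javascript
// Hypothetical helper that could live in one shared module. `extract` is any
// function (html) => string[] supplied by the site; it is injected so the
// helper does not depend on a particular extractor library.
function withHtmlExtraction(extract, getNodes) {
  // Returns a function with the shape used for a query's `transformer`:
  // it receives `{ data }` and returns an array of records.
  return ({ data }) =>
    getNodes(data).flatMap((node) =>
      extract(node.html).map((content, position) => ({
        objectID: `${node.id}-${position}`, // identifier field name is assumed
        title: node.title,
        content,
      }))
    );
}

// Trivial illustrative extractor: split on closing </p> tags, strip markup.
const byParagraph = (html) =>
  html
    .split(/<\/p\s*>/i)
    .map((part) => part.replace(/<[^>]*>/g, ' ').replace(/\s+/g, ' ').trim())
    .filter(Boolean);

// The same helper reused for two queries (data shapes are illustrative).
const pagesTransformer = withHtmlExtraction(byParagraph, (data) => data.pages.nodes);
const postsTransformer = withHtmlExtraction(byParagraph, (data) => data.posts.nodes);
```

Each transformer declaration then only names where its nodes live, while the splitting and stripping logic stays in one place.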

Haroenv · 2021-05-31T08:29:55Z

As a starting point, the most interesting thing would be for you to share the code needed for this on your side. Do you have a working example with split and stripped HTML?

naydav mentioned this issue May 25, 2021

feat: add dryRun option #134

Merged
