Add docs on generating embeddings from web #592

stwiname · 2025-01-29T04:02:35Z

jamesbayly · 2025-02-10T04:06:44Z

docs/ai/build/rag.md


 ```shell
 subql-ai embed-mdx -i ./path/to/dir/with/markdown -o ./db --table your-table-name --model nomic-embed-text
 ```

+### From Web
+
+This will parse all the visible text from the specified web page(s). You can specify the scope for how many links are followed to pull in more data.


How do we scrape these pages? It would be good to provide some details on the library we use. I also imagine there are some limitations with dynamic websites, e.g. does this work with websites that need to execute JS?

Finally, how can I verify that this was able to scrape my website? Do we export the page content as text somewhere so I can check it?
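
To make the dynamic-site concern concrete, here is a minimal sketch of static visible-text extraction. It is an illustration only: it uses Python's standard-library `html.parser`, not whatever scraper `subql-ai` actually uses (this thread does not name it), and `VisibleTextParser`, `visible_text`, and `SAMPLE` are hypothetical names. Because nothing executes the page's JavaScript, text that a script would inject never shows up in the output.

```python
from html.parser import HTMLParser

# Tags whose contents are not rendered as text in the page body.
HIDDEN_TAGS = {"script", "style", "noscript", "template", "head"}


class VisibleTextParser(HTMLParser):
    """Collects text nodes that sit outside any hidden tag."""

    def __init__(self):
        super().__init__()
        self._hidden_depth = 0  # how many hidden tags we are currently inside
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in HIDDEN_TAGS:
            self._hidden_depth += 1

    def handle_endtag(self, tag):
        if tag in HIDDEN_TAGS and self._hidden_depth > 0:
            self._hidden_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self._hidden_depth == 0:
            self.chunks.append(text)


def visible_text(html: str) -> str:
    """Return the statically visible text of an HTML document, one chunk per line."""
    parser = VisibleTextParser()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)


# Hypothetical page: the #app element is only filled in when the script runs.
SAMPLE = """<html><head><title>Demo</title><style>p{color:red}</style></head>
<body><h1>Static heading</h1><div id="app"></div>
<script>document.getElementById("app").textContent = "Rendered by JS";</script>
</body></html>"""

if __name__ == "__main__":
    print(visible_text(SAMPLE))
```

Dumping the extracted text like this, before it is embedded, is one way a user could check what was actually scraped. Whether the real command can export that text is the open question above.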

Add docs on generating embeddings from web

6c0e086

stwiname requested a review from jamesbayly January 29, 2025 04:02

jamesbayly requested changes Feb 10, 2025
