Rename the "Intro" notebooks to call out the specific functionality they support (PDF to Embeddings) #782
Comments
@sujee Please suggest a couple of names other than "intro" for this; once we agree on one, you can submit a PR for the name change. I know you use this example in workshops as an "introductory" notebook, and @Bytes-Explorer's suggestion of PDF2Embeddings (its functionality) is a little too "rigid" as a name for an example, so something along the lines of "Run_your_first-pipeline_pdf2embeddings" (seems too long, doesn't it?) would be more appropriate.
@Bytes-Explorer @shahrokhDaijavad How about something along the lines of "pdf processing part 1"? Totally open to suggestions :) I plan to add other examples along the lines of
@sujee I am ok with "PDF processing Part 1", especially if you are planning to add subsequent examples with OCR, Tables, ... and using the PII transform. Of course, the example does a lot more by showing the effectiveness of exact and fuzzy dedup along the way, but we cannot spell out everything in the name.
It would be nice to understand the functionality from the name. How about "PDF processing for RAG"? I would also suggest a README in the top-level folder that tells a user what they can learn from each example.
Very good! How about something like..
Thanks, @Bytes-Explorer and @sujee. Let's go with pdf_processing_1_for_RAG
@sujee I just remembered that this example will also change its flow: instead of "chunking" documents and then "deduplicating" the chunks, it will "deduplicate" documents first and "chunk" next. The PR should be submitted after that change.
Yes, a few changes are going to go into this example. I need to verify a couple of issues I raised to get this functionality (#605).
No description provided.