Bge m3 testing #2
Walkthrough
The changes introduce a significant refactor and expansion of embedding generation functionality. The previous Gemini API-based embedding approach in
Changes
Sequence Diagram(s)
sequenceDiagram
participant User
participant embeddings_converter.py
participant SentenceTransformer
participant CLIPModel
participant pdfplumber
User->>embeddings_converter.py: main()
embeddings_converter.py->>pdfplumber: extract_text_from_pdf(pdf_path)
embeddings_converter.py->>pdfplumber: extract_tables_from_pdf(pdf_path)
embeddings_converter.py->>pdfplumber: extract_images_from_pdf(pdf_path)
embeddings_converter.py->>SentenceTransformer: get_hf_text_embedding(text/table)
embeddings_converter.py->>CLIPModel: get_clip_image_embedding(image)
embeddings_converter.py-->>User: Log embedding counts and sizes
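The flow in the diagram above can be sketched roughly as follows. This is a minimal illustration, not the code from the pull request: `embed_pdf` and `summarize` are hypothetical names, and the extractors and embedders are passed in as plain callables so that the sketch does not assume any particular pdfplumber, SentenceTransformer, or CLIP call signature. The only detail taken from the diagram is that text and tables go through the same text embedder while images go through a separate image embedder.

```python
from typing import Any, Callable, Dict, List, Sequence, Tuple

Vector = Sequence[float]


def embed_pdf(
    pdf_path: str,
    extract_text: Callable[[str], List[str]],
    extract_tables: Callable[[str], List[str]],
    extract_images: Callable[[str], List[Any]],
    embed_text: Callable[[str], Vector],
    embed_image: Callable[[Any], Vector],
) -> Dict[str, List[Vector]]:
    """Extract text, tables, and images from one PDF, then embed each item.

    All six callables are assumptions standing in for the functions named
    in the diagram (extract_*_from_pdf, get_hf_text_embedding,
    get_clip_image_embedding).
    """
    texts = extract_text(pdf_path)
    tables = extract_tables(pdf_path)
    images = extract_images(pdf_path)
    return {
        "text": [embed_text(t) for t in texts],
        # Tables are serialized to strings and share the text embedder.
        "table": [embed_text(t) for t in tables],
        "image": [embed_image(i) for i in images],
    }


def summarize(embeddings: Dict[str, List[Vector]]) -> Dict[str, Tuple[int, int]]:
    """Return (count, dimension) per kind, mirroring the final logging step."""
    return {
        kind: (len(vecs), len(vecs[0]) if vecs else 0)
        for kind, vecs in embeddings.items()
    }
```

Injecting the callables keeps the orchestration testable with stubs; real extractors and models would be wired in at the call site.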
sequenceDiagram
participant User
participant hf_embedder.py
participant HuggingFaceModel
User->>hf_embedder.py: process_file(file_path)
hf_embedder.py->>HuggingFaceModel: get_text_embedding(text) / get_image_embedding(image)
hf_embedder.py-->>User: Return embedding or error
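The "return embedding or error" step in the second diagram might look something like the sketch below. It is an assumption-heavy illustration: the suffix sets, the dictionary result shape, and the `load_image` parameter are all hypothetical, and the two embedding functions are injected rather than tied to a specific Hugging Face model.

```python
from pathlib import Path
from typing import Any, Callable, Dict, Sequence

# Assumed file types; the actual module may support a different set.
TEXT_SUFFIXES = {".txt", ".md"}
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg"}


def process_file(
    file_path: str,
    get_text_embedding: Callable[[str], Sequence[float]],
    get_image_embedding: Callable[[Any], Sequence[float]],
    load_image: Callable[[Path], Any],
) -> Dict[str, Any]:
    """Dispatch on file suffix and return {'embedding': ...} or {'error': ...}."""
    path = Path(file_path)
    suffix = path.suffix.lower()
    try:
        if suffix in TEXT_SUFFIXES:
            text = path.read_text(encoding="utf-8")
            return {"embedding": list(get_text_embedding(text))}
        if suffix in IMAGE_SUFFIXES:
            return {"embedding": list(get_image_embedding(load_image(path)))}
        return {"error": f"unsupported file type: {suffix or '(none)'}"}
    except Exception as exc:
        # Report failures (missing file, model error) instead of raising.
        return {"error": f"{type(exc).__name__}: {exc}"}
```

Returning an error value rather than raising keeps a batch run going when a single file fails, at the cost of callers having to check which key is present.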
Hugging Face Embedder
Summary by CodeRabbit