Skip to content

Conversation

@josephgoksu
Copy link
Contributor

@josephgoksu josephgoksu commented Oct 26, 2025

📋 Description

  • Introduced a new function splitOversizedParagraph to handle paragraphs exceeding the defined section size.
  • Updated splitIntoSections to utilize the new paragraph splitting logic.

🔄 Type of Change

  • 🐛 Bug fix (non-breaking change which fixes an issue)
  • ✨ New feature (non-breaking change which adds functionality)
  • 💥 Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • 📚 Documentation update
  • 🎨 Style/UI changes
  • ♻️ Code refactoring
  • ⚡ Performance improvements
  • 🧪 Test updates
  • 🔧 Build/CI changes

🧪 Testing

  • I have tested this change locally
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

🔍 Code Review Checklist

  • Code follows the project's coding standards
  • Self-review of the code has been performed
  • Code is properly commented, particularly in hard-to-understand areas
  • Changes generate no new warnings
  • Any dependent changes have been merged and published

…section splitting

- Introduced a new function `splitOversizedParagraph` to handle paragraphs exceeding the defined section size.
- Updated `splitIntoSections` to utilize the new paragraph splitting logic.
- Enhanced error handling and logging in the ingestion process for documents and URLs.
- Improved code readability by standardizing formatting and using consistent semicolon usage.
- Adjusted server code to ensure proper response formatting and error handling.
- Updated Docker Compose file for consistency.
@recabasic recabasic requested a review from Copilot October 27, 2025 07:11
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR refactors the document ingestion logic to properly handle oversized paragraphs that exceed section size limits. The main change introduces a new splitOversizedParagraph function that recursively splits large paragraphs at sentence boundaries (or character boundaries for extremely long sentences) to ensure all sections fit within the configured SECTION_SIZE. Additionally, the PR standardizes code formatting across the codebase by adding semicolons and adjusting line length limits.

Key Changes:

  • Introduced splitOversizedParagraph function to handle paragraphs exceeding section size by splitting at sentence boundaries
  • Updated splitIntoSections to utilize the new paragraph splitting logic, preventing oversized sections
  • Enhanced OpenAI embedding API error handling to include response body details

Reviewed Changes

Copilot reviewed 4 out of 7 changed files in this pull request and generated 3 comments.

File Description
backend/src/ingestion/index.ts Added splitOversizedParagraph function and integrated it into splitIntoSections to handle oversized paragraphs
backend/src/embedding/index.ts Enhanced error handling for OpenAI API calls and added dimension validation logic
backend/src/server/index.ts Added request timeout configuration for long-running ingestion requests
.prettierrc.js Increased printWidth from 80 to 160 characters

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@nullure
Copy link
Member

nullure commented Oct 27, 2025

Hey, thanks for the contribution. Your pull request has conflicts with 2 files. Please resolve it.

@nullure nullure self-assigned this Oct 27, 2025
@josephgoksu
Copy link
Contributor Author

@nullure done

@recabasic recabasic requested a review from Copilot October 28, 2025 06:25
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

Copilot reviewed 6 out of 9 changed files in this pull request and generated 2 comments.

Comments suppressed due to low confidence (1)

backend/src/embedding/index.ts:1

  • The openai_model configuration now defaults to an empty string instead of undefined, which may cause issues in resolveOpenAIModel and resolveTargetDimension functions. The function resolveOpenAIModel at line 43-45 uses a truthy check env.openai_model || which treats empty string as falsy, but the type has changed from string | undefined to string. Consider either keeping the original || undefined to maintain the API contract or updating the logic in functions that consume this value to explicitly check for empty strings.
import { env } from '../config';

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@nullure
Copy link
Member

nullure commented Oct 28, 2025

Hey, thanks for the contribution. Your pull request has conflicts with 4 files. Also, please don't vibe code 100% of the code, as it makes the code longer and at one point could really affect the performance.

@josephgoksu
Copy link
Contributor Author

@nullure hey, the PR changes are pretty small. I don't know why you think it's vibe coded at all.

at one point could really affect the performance.

The project fails at large files, URL fetching, simple queries. They all must be fixed. I tried to use in my side project but the results were not even close alternatives

@nullure
Copy link
Member

nullure commented Oct 29, 2025

You still do have conflicts with 4 files.

@josephgoksu josephgoksu force-pushed the refactor/ingestion-bug branch from 61d3803 to 8fa3a99 Compare October 29, 2025 21:21
@josephgoksu
Copy link
Contributor Author

ready to merge @nullure

@josephgoksu
Copy link
Contributor Author

it's mostly formatting changes btw

@nullure
Copy link
Member

nullure commented Oct 30, 2025

Please add tests to prove that it works. Also, can you explain what gemini_queue and other things you added mean? I request you to just modify the ingestion file, we are currently working, and using sudden changes might be a hurdle.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants