sachinML

Refs #30200

  • Use split_documents(docs) after header-based splitting (preserves per-section metadata; overlap applied per document).
  • Overlap appears only when a single section exceeds chunk_size.
  • Overlap does not cross section/document boundaries.
  • Consider strip_headers=True to avoid a tiny header-only chunk; keep "" as a fallback separator if text lacks newlines/spaces.
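
The bullets above can be illustrated with a small standard-library sketch. This is a simplified toy model, not LangChain's implementation: `split_by_headers` and `chunk_section` are hypothetical helpers that stand in for header-based splitting followed by `split_documents(docs)`, and chunking here is a plain character window rather than separator-aware splitting.

```python
import re


def split_by_headers(markdown: str) -> list[dict]:
    """Toy header splitter: one section per header, header text kept as metadata."""
    sections = []
    current = {"header": "", "lines": []}
    for line in markdown.splitlines():
        match = re.match(r"^#+\s+(.*)$", line)
        if match:
            sections.append(current)
            current = {"header": match.group(1), "lines": []}
        else:
            current["lines"].append(line)
    sections.append(current)

    docs = []
    for section in sections:
        text = "\n".join(section["lines"]).strip()
        if text:  # headers are stripped, so no header-only chunk is produced
            docs.append({"text": text, "metadata": {"header": section["header"]}})
    return docs


def chunk_section(section: dict, chunk_size: int, chunk_overlap: int) -> list[dict]:
    """Toy character window applied to ONE section; assumes chunk_overlap < chunk_size."""
    text = section["text"]
    step = chunk_size - chunk_overlap
    chunks = []
    start = 0
    while True:
        chunks.append(
            {"text": text[start:start + chunk_size], "metadata": dict(section["metadata"])}
        )
        if start + chunk_size >= len(text):
            break
        start += step
    return chunks


markdown = "# Intro\nshort text\n# Details\n" + "abcdefghij" * 3

# Chunk each section independently, so overlap never crosses a section boundary.
chunks = [
    chunk
    for section in split_by_headers(markdown)
    for chunk in chunk_section(section, chunk_size=20, chunk_overlap=5)
]

for chunk in chunks:
    print(chunk["metadata"]["header"], repr(chunk["text"]))
```

In this sketch the short "Intro" section fits in one chunk and so shows no overlap, while the longer "Details" section is split into two chunks that share five characters. The first "Details" chunk does not begin with the tail of "Intro", because each section is chunked on its own.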

@github-actions github-actions bot added the documentation (Improvements or additions to documentation) and text-splitters (Related to the package `text-splitters`) labels and removed the documentation label on Oct 13, 2025
@sachinML
Author

CI is green; CodeQL is pending. This is a docs-only change. Could a maintainer please trigger the CodeQL scan so this can be merged?

@sachinML sachinML force-pushed the docs/chunk-overlap-troubleshooting branch 6 times, most recently from 88e993b to 5e569a9, on October 18, 2025 09:44
@sachinML sachinML force-pushed the docs/chunk-overlap-troubleshooting branch from 5e569a9 to ae92263 on October 19, 2025 11:52
Collaborator

@ccurme ccurme left a comment

Thanks for this. I think this is not the right level of documentation for this information; can we add it to the dedicated docs site here?

You can submit a PR updating the source here.

@ccurme ccurme closed this Oct 21, 2025
@sachinML
Author

sachinML commented Oct 21, 2025

Thanks @ccurme, I’ve opened a PR to the docs site, updating the MarkdownHeaderTextSplitter page with the troubleshooting note:
langchain-ai/docs#1061.

If preferred, I can close or revert the README change here and keep the guidance centralized in the docs site.

Labels

documentation Improvements or additions to documentation text-splitters Related to the package `text-splitters`

2 participants