
[Bug]: During local embedding, RAGFlow sends too much text at once, exceeding the model's maximum token limit, so the model cannot fully read the input. #4683

Open
XTFG opened this issue Jan 30, 2025 · 6 comments
Labels
bug Something isn't working

Comments

@XTFG

XTFG commented Jan 30, 2025

Is there an existing issue for the same bug?

  • I have checked the existing issues.

RAGFlow workspace code commit ID

main

RAGFlow image version

v0.15.1, nightly

Other environment information

Actual behavior

Some embedding models have a relatively low maximum input capacity; for example, bge-large and conan-embedding-v1 are limited to 512 input tokens. When using these models for embedding, RAGFlow sends more than 512 tokens at once, and ollama reports an error.
I've found the cause of the error here: https://github.com/ollama/ollama/issues/7288#issuecomment-2591709109
Although I can adjust the model's maximum input limit in ollama, that causes RAGFlow's text to be truncated, resulting in incomplete embeddings. Additionally, I'm unable to locate a setting within RAGFlow that controls the maximum input for the embedding model.
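For illustration, here is a rough sketch of how to observe the overflow against a local ollama instance (not RAGFlow code; the endpoint URL, model tag, and tokenizer checkpoint below are assumptions on my side):

```python
# Rough sketch: count the tokens of one chunk with the model's tokenizer,
# then send it to ollama's embeddings endpoint.
# The endpoint URL, model tag, and tokenizer checkpoint are assumptions.
import requests
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("BAAI/bge-large-en-v1.5")
chunk = "Some chunk text produced by the 'book' method. " * 200  # well over 512 tokens

n_tokens = len(tokenizer.encode(chunk, add_special_tokens=False))
print(f"chunk length: {n_tokens} tokens (model input limit: 512)")

resp = requests.post(
    "http://localhost:11434/api/embeddings",
    json={"model": "bge-large:latest", "prompt": chunk},
)
# Depending on the ollama version and num_ctx, this request either fails or
# the prompt is silently truncated, so the embedding does not cover the chunk.
print(resp.status_code, resp.text[:200])
```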

When adding a model, the max token setting controls the maximum output rather than the input, so it doesn't apply to embedding models.
Image

The same issue of an ineffective max token option also exists when adding reranker models.

Image

Expected behavior

Please add a setting to RAGFlow to control the maximum number of tokens sent to the embedding model per request, and also fix the bug where the max token limit is ineffective when adding reranker models.

Steps to reproduce

Using the bge-large:latest model in ollama, if the document is parsed with a chunking method other than 'General' (I am using 'Book') and a chunk's token count goes over 512, an error occurs and the embedding is terminated.

Additional information

No response

@KevinHuSh
Collaborator

What about reducing the chunk token size in the chunking method settings?

Image

But it will not slice the text apart in the middle and ruin the semantics, since that would be meaningless for embedding.
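Just to illustrate the idea, here is a minimal sketch of splitting a chunk on sentence boundaries so each piece stays under the embedding model's input limit instead of cutting mid-sentence. The tokenizer checkpoint, the 512-token budget, and the helper itself are assumptions, not RAGFlow's implementation:

```python
# Hypothetical helper: group whole sentences into pieces of at most `budget`
# tokens so nothing is cut mid-sentence. Tokenizer and budget are assumptions.
import re
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("BAAI/bge-large-en-v1.5")

def split_for_embedding(chunk: str, budget: int = 512) -> list[str]:
    # Naive English sentence split on ., !, ? followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", chunk)
    pieces, current, used = [], [], 0
    for sent in sentences:
        n = len(tokenizer.encode(sent, add_special_tokens=False))
        if current and used + n > budget:
            pieces.append(" ".join(current))
            current, used = [], 0
        current.append(sent)
        used += n
    if current:
        pieces.append(" ".join(current))
    # Note: a single sentence longer than the budget still yields an oversized
    # piece and would need truncation on top of this.
    return pieces
```

Each piece could then be embedded separately (and, for instance, averaged) so the whole chunk is represented without any request exceeding the 512-token limit.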

@XTFG
Author

XTFG commented Feb 1, 2025

What about reducing the chunk token size in the chunking method settings?

Image

But it will not slice the text apart in the middle and ruin the semantics, since that would be meaningless for embedding.

You know, some parsing methods, such as QA, Resume, Manual, Table, Paper, Laws, Book, Presentation, and One, do not allow manually setting the chunk token number.

Image

Image

@KevinHuSh
Collaborator

We do not rely on embedding very much, given its limitations in representing the semantics of long text.

@XTFG
Author

XTFG commented Feb 3, 2025

Could you please tell me what the significance of the 'max output tokens' setting is for embedding and rerank models?

Image

Image

@KevinHuSh
Collaborator

Sometimes, if the input is too long, the embedding service reports an error directly instead of automatically truncating.

@tusik

tusik commented Feb 5, 2025

Sometimes, if the input is too long, the embedding service reports an error directly instead of automatically truncating.

Is it possible to add OpenAI-API-compatible configuration options, such as input length? I often encounter an error about exceeding 8196 tokens when using the Paper method for embedding.
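Until such an option exists, a workaround sketch might look like the following: truncate the input to a configured limit before calling an OpenAI-compatible embeddings endpoint. The base URL, model name, and 8192-token cap are illustrative assumptions, not the actual serving setup:

```python
# Hypothetical client-side cap for an OpenAI-compatible embedding endpoint.
# base_url, model name, and the 8192-token limit are assumptions for
# illustration; adjust them to the actual serving setup.
import tiktoken
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")
enc = tiktoken.get_encoding("cl100k_base")  # approximate token counting
MAX_INPUT_TOKENS = 8192

def embed_capped(text: str, model: str = "bge-m3") -> list[float]:
    """Truncate the text to MAX_INPUT_TOKENS before requesting the embedding."""
    ids = enc.encode(text)
    if len(ids) > MAX_INPUT_TOKENS:
        text = enc.decode(ids[:MAX_INPUT_TOKENS])
    resp = client.embeddings.create(model=model, input=text)
    return resp.data[0].embedding
```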
