[Bug]: PDF Parser misses text after OCR

### Is there an existing issue for the same bug?

- [x] I have checked the existing issues.

### RAGFlow workspace code commit ID

3c2c8942d5a14de83429a66cea933aa8050383f2

### RAGFlow image version

3c2c8942d5a14de83429a66cea933aa8050383f2

### Other environment information

```Markdown
Windows 11 Pro
Python 3.10.16
pytorch 12.4
```

### Actual behavior

I used the following file to test `pdf_parser.py`:

[layout1.pdf](https://github.com/user-attachments/files/18544578/layout1.pdf)

I found that the word "**_rr_**" is missing after OCR.

As you can see, the PDF file contains "rr" here:

![Image](https://github.com/user-attachments/assets/d0cfbadd-8212-4271-9e62-04ce4071ed19)

After running the `self.__image__` function, the boxes look like this:

![Image](https://github.com/user-attachments/assets/657a8144-0311-4b47-9cd4-6592148497cd)

The word "**_rr_**" does not appear in any of the boxes produced by OCR.

### Expected behavior

_No response_

### Steps to reproduce

```Markdown
Debug pdf_parser.py in VS Code with layout1.pdf (attached under Actual behavior).
```
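To make the check repeatable while debugging, a small helper like the one below can report which expected words are absent from the OCR boxes. This is a sketch, not part of RAGFlow: it assumes each box is a dict with a `"text"` key, and `missing_words` and the sample boxes are hypothetical.

```python
from typing import Iterable


def missing_words(boxes: list[dict], expected: Iterable[str]) -> list[str]:
    """Return the expected words that do not appear in any box's "text" field.

    Assumption: each box is a dict with a "text" key, like the OCR boxes
    inspected after the parser's image/OCR step. Boxes without "text" are
    treated as empty.
    """
    # Collect lowercase, whitespace-separated tokens from every box.
    tokens = set()
    for box in boxes:
        tokens.update(box.get("text", "").lower().split())
    # Keep only the expected words that never appeared as a token.
    return [w for w in expected if w.lower() not in tokens]


if __name__ == "__main__":
    # Hypothetical boxes illustrating the reported symptom: "rr" is absent.
    sample_boxes = [
        {"text": "Some heading", "x0": 10, "x1": 120, "top": 5, "bottom": 20},
        {"text": "body text", "x0": 10, "x1": 200, "top": 30, "bottom": 45},
    ]
    print(missing_words(sample_boxes, ["rr", "heading"]))  # ['rr']
```

Matching is done on whole tokens, so a box reading "rr" counts as a hit but a box reading "error" does not.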

### Additional information

_No response_
