PDF parser #2085

### Question

My requirement is to parse technical documentation, which may contain tables and meaningful images, and provide the contents in JSON format (since that seems to be the suggested format for LLMs?).

Currently I have:
1. Converted the PDF to Markdown. Since the converted Markdown does not preserve the section hierarchy, I have written a helper function that takes the Markdown file, sorts the content based on the number in each section name, puts everything into JSON format, and dumps it to a file.
   (I hope this is the right approach; if there are better ways, please suggest them.)
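For step 1, here is a minimal sketch of such a helper. It assumes the converted headings look like `## 2.1 Installation` (a dotted number followed by the title). The names `build_section_tree` and `SECTION_RE` are hypothetical and do not come from any library. The sketch nests each section under the section whose number is its prefix, and it sorts numerically so that `1.10` comes after `1.9`.

```python
import json
import re

# Assumption: numbered headings look like "## 2.1 Installation" or "# 1. Intro".
# The count of '#' is ignored because PDF-to-Markdown converters often flatten levels;
# the hierarchy is taken from the dotted number instead.
SECTION_RE = re.compile(r"^#+\s+(\d+(?:\.\d+)*)\.?\s+(.*)$")


def _key(number: str) -> tuple:
    """Turn '2.10' into (2, 10) so sections sort numerically, not as strings."""
    return tuple(int(part) for part in number.split("."))


def build_section_tree(markdown: str) -> list:
    """Hypothetical helper: nest sections by dotted number and return JSON-ready dicts.

    Text before the first numbered heading is ignored in this sketch, unnumbered
    headings are kept as plain content, and each number is assumed to appear once.
    """
    sections = {}  # key tuple -> section dict
    current = None
    for line in markdown.splitlines():
        match = SECTION_RE.match(line)
        if match:
            number, title = match.groups()
            current = {"number": number, "title": title.strip(), "content": [], "children": []}
            sections[_key(number)] = current
        elif current is not None:
            current["content"].append(line)

    roots = []
    # Visiting keys in sorted order keeps every children list sorted as well.
    for key in sorted(sections):
        node = sections[key]
        node["content"] = "\n".join(node["content"]).strip()
        parent = sections.get(key[:-1])  # a missing parent puts the node at the top level
        (parent["children"] if parent else roots).append(node)
    return roots


if __name__ == "__main__":
    doc = "# 1 Introduction\nIntro text.\n## 1.2 Scope\nScope text.\n## 1.1 Purpose\nPurpose text.\n"
    print(json.dumps(build_section_tree(doc), indent=2))
```

In this sketch, the heading depth comes from the numbering rather than from the `#` count, so it still works when the converter emits every heading at the same level.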

Next, I need to:

2. Read the tables and convert them to JSON.
3. Run the pictures through a computer vision model to extract information about them.

Questions:
1. What options are available for this?
2. If there is a better approach, please suggest it.
