-
Notifications
You must be signed in to change notification settings - Fork 2.5k
Open
Labels
questionFurther information is requestedFurther information is requested
Description
Question
My Requirement is to parse technical documentations, which may contain tables , meaningful images and provide the contents in JSON format(Since it the suggested format for any LLM models ??)
Currently i have
- converted PDF to markdown , Since markdown currently does'nt preserve hierarchy, I have written an helper function which takes markdown file , here based on number in the section name , it sorts things and put everything into JSON format and dump to file
(I hope this is the right approach, if there any better ways please suggest)
Now
2. Here also need to read the tables and convert them to JSON.
3. Take pictures and run it on computer vision model to get extract information about the pictures.
Questions
- what are the available options for this?
- if there are any other better approach please suggest.
Metadata
Metadata
Assignees
Labels
questionFurther information is requestedFurther information is requested