Recently, an open-source project called gptpdf has gained 1.1k stars on GitHub. It uses a VLLM model similar to GPT-4o to parse PDF files and convert them into Markdown format.
gptpdf Product Entry: https://top.aibase.com/tool/gptpdf
It is learned that the code for this project consists of only 293 lines, yet it can nearly perfectly parse and include content such as formatting, mathematical formulas, tables, images, charts, and more.
The implementation steps of gptpdf are:
1) Use the PyMuPDF library to parse all non-text areas and make appropriate markings (for token saving)
2) Use a multimodal model (such as GPT-4o) for parsing to obtain a Markdown file
It is worth mentioning that the cost of gptpdf is an average of $0.013 per page.
Key Points:
- This open-source project uses a multimodal model similar to GPT-4o to parse PDF files, converting them to Markdown format.
- The project code is concise and efficient, consisting of only 293 lines.
- The parsing results almost perfectly include content such as formatting, mathematical formulas, tables, images, and charts.