Recently, gptpdf, an open-source project on GitHub, has gained 1.1k stars. It uses a visual large language model (VLLM) such as GPT-4o to parse PDF files and convert them into Markdown.
gptpdf product entry: https://top.aibase.com/tool/gptpdf
The project reportedly consists of only 293 lines of code, yet it can parse layout, mathematical formulas, tables, images, charts, and other content almost perfectly and carry them over into the Markdown output. gptpdf works in the following steps:
1) Use the PyMuPDF libr