Perhaps you could convert your PDF files to SVG using Inkscape, and then parse the resulting SVG using one of the standard XML processing libraries.
Inkscape has a command line mode that can do almost anything that you can do with the GUI.
inkscape -f Input_file.pdf -l Output_file.svg
In reply to Re: How to extract image captions from a PDF file using perl
by chrestomanci
in thread How to extract image captions from a PDF file using perl
by Anonymous Monk
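For what it's worth, here is a rough sketch of the convert-then-parse idea from the reply above. It is written in Python using only the standard library; the same steps should carry over to whichever Perl XML module you prefer. The function names `pdf_to_svg` and `extract_text_runs` are made up for illustration, and the sketch assumes the converted SVG keeps its text as `<text>`/`<tspan>` elements, which may not hold for every PDF. The Inkscape flags are the ones quoted in the post; newer Inkscape releases changed their command-line options, so they may need adjusting.

```python
import subprocess
import xml.etree.ElementTree as ET

# Elements in an SVG file live in the SVG namespace.
SVG_NS = "{http://www.w3.org/2000/svg}"


def pdf_to_svg(pdf_path, svg_path):
    """Run the Inkscape command from the post (hypothetical wrapper).

    The -f / -l flags are copied from the post as-is; check
    `inkscape --help` for the version you have installed.
    """
    subprocess.run(["inkscape", "-f", pdf_path, "-l", svg_path], check=True)


def extract_text_runs(svg_source):
    """Return the text of every <text> element in an SVG string."""
    root = ET.fromstring(svg_source)
    runs = []
    for text_el in root.iter(SVG_NS + "text"):
        # itertext() also picks up text held in nested <tspan> children.
        content = "".join(text_el.itertext()).strip()
        if content:
            runs.append(content)
    return runs
```

After calling `pdf_to_svg("Input_file.pdf", "Output_file.svg")`, you would read `Output_file.svg` and pass its contents to `extract_text_runs`. Deciding which of the returned strings are actually image captions is a separate problem, probably needing some heuristic such as a "Figure" prefix or the position of the text relative to the images.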