The PDF modules on CPAN would probably be a good start. CAM::PDF, IIRC, can do that (well, the image part; the caption is iffy). Also see HTML::HTMLDoc. (What was I yammering about here?)
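For the iffy caption part, here is a rough, untested sketch of one way to go at it: pull the text of each page with CAM::PDF's getPageText and keep the lines that look like captions. The helper names (find_captions, captions_in_pdf) and the "Figure N" / "Fig. N" / "Image N" pattern are my own assumptions, not anything CAM::PDF provides, and a PDF has no real notion of a caption, so this will only work if your captions follow a predictable wording and the extracted text keeps them on their own lines.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical heuristic: a "caption" is any line that starts with
# Figure/Fig./Image followed by a number. Adjust to your documents.
sub find_captions {
    my ($text) = @_;
    return grep { /^\s*(?:Figure|Fig\.?|Image)\s*\d+/i } split /\n/, $text;
}

# Walk every page of a PDF and collect [page_number, caption_line] pairs.
sub captions_in_pdf {
    my ($file) = @_;

    # Loaded lazily so find_captions can be tried without the module.
    require CAM::PDF;

    my $pdf = CAM::PDF->new($file)
        or die "Cannot open $file\n";

    my @found;
    for my $page (1 .. $pdf->numPages()) {
        my $text = $pdf->getPageText($page);
        next unless defined $text;    # some pages yield no text
        push @found, map { [ $page, $_ ] } find_captions($text);
    }
    return @found;
}

# Usage: perl captions.pl file.pdf
if (@ARGV) {
    printf "page %d: %s\n", @$_ for captions_in_pdf($ARGV[0]);
}
```

This only guesses at captions from the page text; it does not tie a caption to a particular image. For that you would probably have to compare positions in the page content, which is a lot more work.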
--MidLifeXis
In reply to Re: How to extract image captions from a PDF file using perl
by MidLifeXis
in thread How to extract image captions from a PDF file using perl
by Anonymous Monk