in reply to How to extract image captions from a PDF file using perl
The PDF modules on CPAN would probably be a good start. CAM::PDF, IIRC, can do that (well, the image part; the caption is iffy, since a caption is usually just ordinary page text sitting near the image). Also see HTML::HTMLDoc. (what was I yammering here?)
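
For the iffy caption part, here is a rough, untested-against-your-file sketch. It assumes captions show up in the extracted page text as lines starting with "Figure N" or "Fig. N", which is purely my guess at a heuristic and not anything CAM::PDF knows about. `find_captions` is a made-up helper name; the only CAM::PDF calls used are `new`, `numPages`, and `getPageText`.

```perl
use strict;
use warnings;

# Hypothetical heuristic (an assumption, not part of CAM::PDF):
# treat any line that starts with "Figure N" or "Fig. N" as a caption.
sub find_captions {
    my ($text) = @_;
    return grep { /^\s*(?:Figure|Fig\.?)\s*\d+/i } split /\r?\n/, $text;
}

# Only load CAM::PDF when run as a script with a filename argument.
if (!caller && @ARGV) {
    require CAM::PDF;
    my $pdf = CAM::PDF->new($ARGV[0])
        or die "Cannot open $ARGV[0]\n";
    for my $page (1 .. $pdf->numPages()) {
        my $text = $pdf->getPageText($page);
        next unless defined $text;    # some pages may yield no text
        print "p$page: $_\n" for find_captions($text);
    }
}
```

Run it as `perl captions.pl file.pdf` (the script name is arbitrary). Text extraction from PDF is lossy, so expect captions to be split across lines or missed entirely in some documents.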
--MidLifeXis