in reply to How to extract image captions from a PDF file using perl

The PDF modules on CPAN would probably be a good start. CAM::PDF, IIRC, can handle the image part; the caption part is iffy. Also see HTML::HTMLDoc. (What was I yammering about here?)
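
Something along these lines might get you started on the caption half. It is an untested sketch, not a known-good recipe: it pulls the text of each page with CAM::PDF and keeps the lines that look like captions. The helper name find_captions and the "Figure N" / "Fig. N" pattern are my own assumptions about your documents, and whether the extracted text keeps sensible line breaks depends on how the PDF was produced.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return the lines of $text that look like figure
# captions. The "Figure N" / "Fig. N" convention is an assumption;
# adjust the pattern to match your documents.
sub find_captions {
    my ($text) = @_;
    return grep { /^\s*(?:Figure|Fig\.)\s*\d+/i } split /\r?\n/, $text;
}

# Only touch CAM::PDF when a file name is given on the command line.
if (@ARGV) {
    require CAM::PDF;
    my $file = shift @ARGV;
    my $pdf  = CAM::PDF->new($file)
        or die "Cannot open $file\n";

    for my $page (1 .. $pdf->numPages()) {
        # getPageText returns the page's text as a single string
        my $text = $pdf->getPageText($page);
        next unless defined $text;
        print "page $page: $_\n" for find_captions($text);
    }
}
```

Run it as `perl captions.pl file.pdf` (the script name is made up). Nothing here ties a caption to a particular image; it only finds text that looks like a caption, which is about as far as a heuristic will take you.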

--MidLifeXis
