Extracting the caption of an image from a PDF file

ajju has asked for the wisdom of the Perl Monks concerning the following question:

Re: Extracting the caption of an image from a PDF file by LanX (Saint) on Nov 22, 2010 at 23:49 UTC
See "Re: How to extract image captions from a PDF file using perl". Cheers, Rolf
Re: Extracting the caption of an image from a PDF file by aquarium (Curate) on Nov 23, 2010 at 02:10 UTC
The usual way is either to process PDFs directly in Perl with the help of modules (search CPAN), which is fairly involved, or to use a utility such as xpdf to convert the PDFs to text or HTML and then find a way to extract the required information from that. You seem to be doing the opposite of the usual tactic: instead of parsing HTML (much easier), as in your first post, you've moved to PDF, and again you're asking for code to fall from the sky without showing any programming effort of your own. If you just want the job done and have no clue about Perl, or don't care about programming, then post it on another forum that's geared towards providing free tools, code and advice. This is a Perl forum for people interested in Perl programming, or at the very least for people interested in, and actively learning, technology. I don't want to dissuade you from your efforts, but I do question whether this is the right forum/audience for you.
the hardest line to type correctly is: stty erase ^H
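A minimal sketch of the "convert first, then extract" route, written in Python rather than Perl. It rests on an assumption the thread never confirms: that in the `pdftotext` output (the text converter shipped with xpdf and poppler) each caption begins its own line with "Figure N" or "Fig. N". The helper names `pdf_to_text` and `extract_captions` are hypothetical.

```python
import re
import subprocess

# Heuristic (an assumption, not a standard): a caption starts a line with
# "Figure" or "Fig." followed by a number, e.g. "Figure 3: Sample output".
# Matching is case-sensitive to avoid catching sentences like "figure 3 shows".
CAPTION_RE = re.compile(r"^\s*(?:Figure|Fig\.)\s+(\d+)[.:]?\s*(.*)$")


def pdf_to_text(pdf_path: str) -> str:
    """Run pdftotext (xpdf/poppler); an output name of "-" writes to stdout."""
    result = subprocess.run(
        ["pdftotext", "-layout", pdf_path, "-"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def extract_captions(text: str) -> list[tuple[int, str]]:
    """Return (figure number, first caption line) pairs found in plain text."""
    captions = []
    for line in text.splitlines():
        match = CAPTION_RE.match(line)
        if match:
            captions.append((int(match.group(1)), match.group(2).strip()))
    return captions
```

For example, `extract_captions(pdf_to_text("paper.pdf"))` (hypothetical file name) lists the first line of each caption. Captions that wrap onto following lines, or that are embedded in the image itself rather than in the text layer, would need more work.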
Re: Extracting the caption of an image from a PDF file by aquarium (Curate) on Nov 24, 2010 at 01:23 UTC
You can try this. It's Java, not Perl, but it will probably do the job you want, working from the HTML in your original question rather than from the PDF: http://jsoup.org/
the hardest line to type correctly is: stty erase ^H
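For comparison, a minimal sketch of the HTML route using Python's standard-library `html.parser` instead of jsoup (in jsoup the lookup would start from something like `Jsoup.parse(html).select("figcaption")`). It assumes, hypothetically, that the HTML marks captions up as `<figure><img ...><figcaption>...</figcaption></figure>`; the original markup is not shown in the thread, and the class and function names are made up for illustration.

```python
from html.parser import HTMLParser


class FigureCaptionParser(HTMLParser):
    """Collect (image src, caption text) pairs from <figure> elements.

    Assumes <figure><img><figcaption> markup; nested figures and other
    caption layouts (tables, <p class="caption">, ...) are not handled.
    """

    def __init__(self):
        super().__init__()
        self.results = []
        self._in_figure = False
        self._in_caption = False
        self._src = None
        self._caption_parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "figure":
            # Start a fresh record for this figure.
            self._in_figure = True
            self._in_caption = False
            self._src = None
            self._caption_parts = []
        elif tag == "img" and self._in_figure:
            self._src = dict(attrs).get("src")
        elif tag == "figcaption" and self._in_figure:
            self._in_caption = True

    def handle_endtag(self, tag):
        if tag == "figcaption":
            self._in_caption = False
        elif tag == "figure" and self._in_figure:
            # Join text fragments (e.g. around <b>) and collapse whitespace;
            # a figure without a <figcaption> yields an empty caption.
            caption = " ".join("".join(self._caption_parts).split())
            self.results.append((self._src, caption))
            self._in_figure = False

    def handle_data(self, data):
        if self._in_caption:
            self._caption_parts.append(data)


def extract_figure_captions(html: str) -> list:
    parser = FigureCaptionParser()
    parser.feed(html)
    parser.close()
    return parser.results
```

Because `handle_startendtag` falls back to `handle_starttag`, both `<img src="a.png">` and `<img src="a.png"/>` are picked up.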