in reply to Extracting caption of a image from PDF file

You can try jsoup. It's Java, not Perl, but it will probably do the job you want if you work from the HTML in the original question rather than from the PDF: http://jsoup.org/
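Here is a rough sketch of what that could look like. I'm assuming the captions are marked up as <figcaption> inside a <figure> next to the <img>, which may not match your actual HTML, so adjust the selectors to fit. The class name CaptionExtractor and the sample markup are made up for illustration; the jsoup calls (Jsoup.parse, select, selectFirst, attr, text) are the library's standard ones, and jsoup needs to be on the classpath.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

import java.util.LinkedHashMap;
import java.util.Map;

public class CaptionExtractor {

    // Maps each image's src attribute to its caption text.
    // Assumption: captions live in <figure><img ...><figcaption>...</figcaption></figure>.
    static Map<String, String> extractCaptions(String html) {
        Document doc = Jsoup.parse(html);
        Map<String, String> captions = new LinkedHashMap<>();
        for (Element figure : doc.select("figure")) {
            Element img = figure.selectFirst("img");
            Element caption = figure.selectFirst("figcaption");
            // Skip figures that lack either an image or a caption.
            if (img != null && caption != null) {
                captions.put(img.attr("src"), caption.text());
            }
        }
        return captions;
    }

    public static void main(String[] args) {
        // Hypothetical sample markup, not taken from the original question.
        String html = "<figure><img src=\"fig1.png\">"
                + "<figcaption>Figure 1: Sample plot</figcaption></figure>";
        extractCaptions(html).forEach((src, text) -> System.out.println(src + " -> " + text));
    }
}
```

If your captions sit in a different element (say a <p class="caption"> after the image), only the two selectFirst selectors should need changing.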
the hardest line to type correctly is: stty erase ^H