I am trying to extract content such as the document title or the body text from PDF files, ultimately so that I can search or categorise my collection of PDFs. So far, I have tried parsing the raw PDF source with regular expressions. While I have noticed that PDF section titles often come with the tag
this does not always seem to be the case, and hence it is not a reliable approach for parsing PDFs.
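To illustrate the kind of regex scan described above, here is a minimal sketch. The pattern is an assumption made for illustration: it looks for `/Title (...)` entries written as literal strings, which is not necessarily the tag referred to above. The helper name `titles_from_raw_pdf` is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: scan raw PDF bytes for "/Title (...)" literal strings.
# Assumption: the title is an uncompressed literal string without nested,
# unescaped parentheses. Anything else is silently missed.
sub titles_from_raw_pdf {
    my ($bytes) = @_;
    my @titles;
    # A literal string is delimited by parentheses; backslash escapes
    # such as \( and \) may appear inside it.
    while ($bytes =~ m{/Title\s*\(((?:\\.|[^\\()])*)\)}gs) {
        my $title = $1;
        $title =~ s/\\([()\\])/$1/g;    # undo only the \( \) \\ escapes
        push @titles, $title;
    }
    return @titles;
}

# Usage: perl scan_titles.pl some.pdf
if (@ARGV) {
    my $file = shift @ARGV;
    open my $fh, '<:raw', $file or die "Cannot open $file: $!";
    local $/;                           # slurp the whole file
    my $bytes = <$fh>;
    close $fh;
    print "$_\n" for titles_from_raw_pdf($bytes);
}
```

A scan like this finds nothing when the string is written in hexadecimal form (`<FEFF...>`) or when the object sits inside a compressed stream, which would explain why the results are inconsistent from one file to the next.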
Do you know of any reliable Perl approaches (e.g. suitable modules) for handling PDFs?