snellm has asked for the wisdom of the Perl Monks concerning the following question:

Has anybody got search-term highlighting in PDF files working, preferably in a UNIX/Perl environment?

The procedure for passing an XML file to the Acrobat Reader seems straightforward enough, but generating the XML file is tricky - I can't find any tools that are able to calculate page numbers and character offsets given a PDF file and a set of keywords.

-- Michael Snell
-- michael@snell.com

Re: PDF search term highlighting
by traveler (Parson) on Nov 19, 2002 at 16:40 UTC
    You should look at xpdf. It contains pdftotext, which converts PDF to text. This is used by the Python tool pdfSearch, which seems to come close to what you want.

    HTH, --traveler

      I'm not sure this is useful - I already use pdftotext in another context.

      The problem is that I need to know the page number and offset (i.e. the nth character) of the words to highlight. pdftotext doesn't retain this information - it simply returns all the text in the PDF.

      -- Michael Snell
      -- michael@snell.com

        I know that pdftotext only outputs the text. Absent another solution, though, that code may form the basis for a Perl module you could write that would preserve the necessary information.
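
        For what it's worth, here is a rough sketch of that idea, in Python rather than Perl since pdfSearch is Python anyway. It assumes xpdf's pdftotext is on the PATH, accepts -enc UTF-8, and ends each page with a form feed; pdf_pages and find_keywords are names made up for illustration. The offsets count characters in the extracted text, which may not match the way Acrobat counts them, so treat this as a starting point only.

```python
import re
import subprocess


def pdf_pages(pdf_path):
    """Return a list with the extracted text of each page.

    Assumption: pdftotext writes to stdout when the output name is "-"
    and terminates every page with a form feed character.
    """
    raw = subprocess.run(
        ["pdftotext", "-enc", "UTF-8", pdf_path, "-"],
        check=True,
        capture_output=True,
    ).stdout
    pages = raw.decode("utf-8", errors="replace").split("\f")
    # The form feed after the last page leaves one empty trailing entry.
    if pages and pages[-1] == "":
        pages.pop()
    return pages


def find_keywords(pages, keywords):
    """Yield (page_index, char_offset, length, keyword) for every hit.

    Page index and offset are zero-based and refer to the extracted
    text, not necessarily to Acrobat's own character numbering.
    """
    for page_index, text in enumerate(pages):
        for keyword in keywords:
            pattern = re.compile(re.escape(keyword), re.IGNORECASE)
            for match in pattern.finditer(text):
                length = match.end() - match.start()
                yield page_index, match.start(), length, keyword
```

        Something like list(find_keywords(pdf_pages("doc.pdf"), ["perl"])), with doc.pdf standing in for your own file, would then give you the page/offset/length triples to feed into the XML file.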

        --traveler

Re: PDF search term highlighting
by TheHobbit (Pilgrim) on Nov 19, 2002 at 17:59 UTC

    Hi,
    There are really a lot of modules to handle XML input/output... I think you might have a look at XML::Parser.

    Hoping this helps...

    Cheers


    Leo TheHobbit
Re: PDF search term highlighting
by snellm (Monk) on Nov 22, 2002 at 12:18 UTC
    Perhaps I didn't phrase the question correctly: I have no problem with XML per se - the problem is that I don't know how to find the page number and offset of a given keyword in a PDF file.

    -- Michael Snell
    -- michael@snell.com