Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hi, I am using the SWISH::Filter module to convert PDF to HTML.

When I run it on a file I get the following error. Please help me solve this problem.

the error is:

3075 Warning - ela: Unknown filter config setting 'user_data'
Error: May not be a PDF file (continuing anyway)
Error: PDF file is damaged - attempting to reconstruct xref table...
Error: Couldn't find trailer dictionary
Error: Couldn't read xref table
Error: May not be a PDF file (continuing anyway)
Error: PDF file is damaged - attempting to reconstruct xref table...
Error: Couldn't find trailer dictionary
Error: Couldn't read xref table
<html>
<head>
</head>
<body>
<pre>
</pre>
</body>
</html>

the code is:

use SWISH::Filter;

# load available filters into memory
my $filter = SWISH::Filter->new;
my $real_path="/vos/spain/test.pdf";

# convert a document
my $doc = $filter->convert(
    document     => \$real_path,       # path or ref to a doc
    content_type => 'application/pdf', # content type if doc reference
    name         => 'ela',             # optional name for this file (useful for debugging)
    user_data    => $whatever,         # optional data to make available to filters
);

return unless $doc;  # empty doc, zero size, or no filters installed

# Was the document converted by a filter?
my $was_filtered = $doc->was_filtered;

# Skip if the file is not text
return if $doc->is_binary;

# Print out the doc
my $doc_ref = $doc->fetch_doc;
print $$doc_ref;

# Fetch the final content type of the document
my $content_type = $doc->content_type;

# Fetch Swish-e parser type (TXT*, XML*, HTML*, or undefined)
my $doc_type = $doc->swish_parser_type;

Replies are listed 'Best First'.
Re: How to convert a pdf file in to HTML
by thomas895 (Deacon) on May 15, 2013 at 07:24 UTC

    The problem is this:

    document => \$real_path, # path or ref to a doc

    What is meant by "ref to a doc" is the case where you already have the contents of a PDF file loaded in a scalar; then you would pass a reference to that scalar. But you have a path, in which case you need to pass only the scalar itself, like so:

    document => $real_path, # path or ref to a doc
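
To make the path-versus-reference distinction concrete, here is a minimal sketch in plain Perl. The helper describe_document is hypothetical and is not part of SWISH::Filter; it only mimics the idea that a scalar reference is read as document content while a plain scalar is read as a file name. Under that assumption, passing \$real_path presumably hands the PDF converter the 19-character path string as if it were the PDF data, which would explain the "May not be a PDF file" errors.

```perl
use strict;
use warnings;

# Hypothetical helper (NOT part of SWISH::Filter): reports how an argument
# would be interpreted under the "path or ref to a doc" convention.
sub describe_document {
    my ($document) = @_;

    # A scalar reference is taken to hold the document content itself.
    return 'content (' . length($$document) . ' bytes)'
        if ref($document) eq 'SCALAR';

    # Anything else is taken to be a file name.
    return "path ($document)";
}

my $real_path = '/vos/spain/test.pdf';

print describe_document($real_path),  "\n";   # plain scalar: a path
print describe_document(\$real_path), "\n";   # reference: the path string itself is the "content"
```

The second call reports 19 bytes of content, which is just the length of the path string and not the size of any PDF file.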
    ~Thomas~ 
    "Excuse me for butting in, but I'm interrupt-driven..."
      Thank you thomas it is great!
Re: How to convert a pdf file in to HTML
by jakeease (Friar) on May 15, 2013 at 07:40 UTC

    Also,

    3075 Warning - ela: Unknown filter config setting 'user_data'

    That first Warning is because you are using

    user_data    => $whatever,  # optional data to make available to filters

    without setting $whatever. It's optional, so drop it.
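
Putting both replies together, the call might look like the sketch below. It uses only the SWISH::Filter methods that appear in the question (new, convert, is_binary, fetch_doc). The wrapper convert_pdf, the lazy require, and the file-existence check are illustrative assumptions added here so that the script fails gracefully when the module or the file is missing; they are not something SWISH::Filter requires.

```perl
use strict;
use warnings;

# Path taken from the question; adjust it to a real PDF on your system.
my $real_path = '/vos/spain/test.pdf';

# Hypothetical wrapper: returns the converted text, or undef on any failure.
sub convert_pdf {
    my ($path) = @_;

    # Load the module at run time so a missing install gives a clear message.
    unless ( eval { require SWISH::Filter; 1 } ) {
        warn "SWISH::Filter is not installed\n";
        return;
    }
    unless ( -f $path ) {
        warn "No such file: $path\n";
        return;
    }

    my $filter = SWISH::Filter->new;
    my $doc    = eval {
        $filter->convert(
            document     => $path,              # plain path, not \$path
            content_type => 'application/pdf',
            name         => 'ela',              # user_data dropped
        );
    };
    return unless $doc;          # empty doc, zero size, or no filters installed
    return if $doc->is_binary;   # skip if the result is not text

    return ${ $doc->fetch_doc };
}

my $html = convert_pdf($real_path);
print defined $html ? $html : "nothing converted\n";
```

With use strict in place, an undeclared variable such as $whatever would also be reported at compile time instead of slipping through silently.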