Acar has asked for the wisdom of the Perl Monks concerning the following question:

Hi there Monks!

I am trying to convert all HTML files in a directory to PDF using PDF::FromHTML, and I am getting an error whose cause I can't see. Here is the code; if there is a better way of doing this, please let me know:
Error:
not well-formed (invalid token) at line 2, column 0,

Code:
#!/usr/bin/perl
use strict;
use warnings;
use PDF::FromHTML;

my $temp_location = "tmp";
my $location      = "converted";

my $all_pdf = get_pdf();
convert_to_pdf($all_pdf);

sub get_pdf {
    opendir( my $dh, $temp_location )
        or die "Cannot open '$temp_location': $!";
    # Keep only .html/.htm files (this also skips '.' and '..')
    my @files = grep { /\.html?$/i } readdir($dh);
    closedir($dh);    # must come before return, or it never runs
    print "\n No files at this time.\n\n" if @files == 0;
    return \@files;
}    # End Sub

sub convert_to_pdf {
    my $get_files = shift;
    foreach my $file ( @{$get_files} ) {
        # A fresh object per file, so nothing is carried over between files
        my $pdf = PDF::FromHTML->new( encoding => 'utf-8' );
        # Loading from a file:
        $pdf->load_file("$temp_location/$file");
        # Perform the actual conversion:
        $pdf->convert();
        # Write to a file, with a .pdf extension instead of .html:
        ( my $pdf_file = $file ) =~ s/\.html?$/.pdf/i;
        $pdf->write_file("$location/$pdf_file");
    }
}    # End Sub

Thanks for looking!

Re: Converting all files in a directory to PDF
by NetWallah (Canon) on Oct 16, 2014 at 00:53 UTC
    That error is typically an XML parsing error.

    It relates to the content of one of your files.

    Check line 2 of the data file.
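    A minimal sketch for doing that across the whole input directory, assuming the files live in `tmp` as in the original script and that the line number in the error refers to the HTML file itself (it may not, if the markup is preprocessed before parsing). `show_line` is a hypothetical helper; it prints line 2 of each file with every byte outside printable ASCII shown as `\x{..}`, since an "invalid token" at column 0 can mean a stray control or non-ASCII byte at the start of that line.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return line $n (1-based) of $path with every byte
# outside printable ASCII shown as \x{..}, or undef if there is no line $n.
sub show_line {
    my ( $path, $n ) = @_;
    open( my $fh, '<:raw', $path ) or die "Cannot open '$path': $!";
    while ( defined( my $line = <$fh> ) ) {
        next unless $. == $n;
        close $fh;
        $line =~ s/\n\z//;    # drop the newline, but keep any \r so it shows up
        $line =~ s/([^\x20-\x7E])/sprintf('\x{%02X}', ord $1)/ge;
        return $line;
    }
    close $fh;
    return;
}

# Print line 2 of every HTML file in the input directory of the original script.
for my $path ( glob('tmp/*.html') ) {
    my $line = show_line( $path, 2 );
    print "$path line 2: ", ( defined $line ? $line : '(no line 2)' ), "\n";
}
```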

            "You're only given one little spark of madness. You mustn't lose it."         - Robin Williams

      Do the .html files being converted to .pdf have to follow a certain format? How would I know that?
        HTML files have to follow the rules of HTML syntax. Undefined tags, missing tags, typos: any of these can mess up an HTML file so that the converter gets confused. Most browsers work around such errors (or rather, silently drop them and hope for the best), but this converter seems to be stricter. You'd be surprised how few HTML pages on the web are fully compliant with the HTML syntax rules.

        CountZero

        "A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little or too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James

        My blog: Imperial Deltronics
        You need to narrow down the problem to one of your HTML files.

        Either learn to use the Perl debugger (perl -d), set a breakpoint just before the failure and check the name of the file being processed, or create a source directory containing only one HTML file at a time.
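        If the debugger is inconvenient, a similar result can be had without moving files around: convert each file inside its own `eval` and collect the ones that die. This is a sketch; `find_failing_files` is a hypothetical helper that takes a conversion callback, so it can be tried with a stand-in converter and no PDF::FromHTML installed.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: call $convert->($file) for each file inside eval and
# return [ $file, $error ] for every file whose conversion died.
sub find_failing_files {
    my ( $convert, @files ) = @_;
    my @failed;
    for my $file (@files) {
        my $ok = eval { $convert->($file); 1 };
        push @failed, [ $file, $@ || 'unknown error' ] unless $ok;
    }
    return @failed;
}

# Stand-in converter for demonstration: dies on one file, as a parser would.
my @failed = find_failing_files(
    sub { die "not well-formed (invalid token)\n" if $_[0] eq 'b.html' },
    qw(a.html b.html c.html)
);
print "$_->[0]: $_->[1]" for @failed;
```

        With PDF::FromHTML, the callback would create a fresh `PDF::FromHTML->new( encoding => 'utf-8' )` and call `load_file`, `convert` and `write_file` for the one file it is given, as in the original loop; each returned entry then pairs a file name with the error it produced.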

        It looks like the HTML may be malformed on line 2 of that file. Also check the encoding: is the file really UTF-8?
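        One way to check the encoding, assuming the converter expects UTF-8 because it is created with `encoding => 'utf-8'`: read each file as raw bytes and decode it line by line with the core Encode module, stopping at the first line that is not valid UTF-8. `first_non_utf8_line` is a hypothetical helper name; decoding per line is safe because a newline byte can never occur inside a multi-byte UTF-8 sequence.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper: return the number of the first line of $path that is
# not valid UTF-8, or 0 if the whole file decodes cleanly.
sub first_non_utf8_line {
    my ($path) = @_;
    open( my $fh, '<:raw', $path ) or die "Cannot open '$path': $!";
    while ( defined( my $line = <$fh> ) ) {
        my $copy = $line;    # decode() with a CHECK flag may consume its input
        my $ok = eval { Encode::decode( 'UTF-8', $copy, Encode::FB_CROAK() ); 1 };
        next if $ok;
        my $bad = $.;        # save before close() resets $.
        close $fh;
        return $bad;
    }
    close $fh;
    return 0;
}

# Check every HTML file in the input directory used by the original script.
for my $path ( glob('tmp/*.html') ) {
    my $bad = first_non_utf8_line($path);
    print "$path: invalid UTF-8 on line $bad\n" if $bad;
}
```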
