PDF generation based on the HTML

pavan_pothuru has asked for the wisdom of the Perl Monks concerning the following question:

Hi, I am generating a PDF file, but the image and content are not aligned to the top left corner. The HTML file consists of images and content. Please find the code below:

open(OUTHTML,"test.html");
@pdf_lines = <OUTHTML>;
close(OUTHTML);

  my $htmldoc = new HTML::HTMLDoc();
#    $htmldoc->set_input_file("$filename");
    $htmldoc->set_page_size('letter');
    $htmldoc->set_top_margin('top',-500,-500);
    $htmldoc->set_bottom_margin('bottom',-500,-500);
    $htmldoc->set_left_margin('left',-400,-400);
    $htmldoc->set_right_margin('right',-400,-400);
    $htmldoc->set_pagemode('pagemode','fullscreen');
    $htmldoc->set_html_content(qq~@pdf_lines~);
    $htmldoc->get_html_content();
    $htmldoc->set_input_file($fl_name_new);
    $htmldoc->get_input_file();
    $htmldoc->set_header('.', '.', '.');
    $htmldoc->set_footer('.', '.', '.');
    $htmldoc->embed_fonts('embedfonts','');
    $htmldoc->no_embed_fonts();
  my $pdf = $htmldoc->generate_pdf();
  #print $pdf->to_string();
  $pdf->to_file("test.pdf");

Thanks. Regards, Pavan


Re: PDF generation based on the HTML by Khen1950fx (Canon) on Jan 30, 2010 at 10:25 UTC
I had to clean the script up a little, but it works. It produces a test.pdf that is aligned. For the input file, I used the source for your post:

#!/usr/bin/perl
use strict;
use warnings;
use HTML::HTMLDoc;
open( 'OUTHTML', '>>', 'test.html' );
my @pdf_lines = <OUTHTML>;
close(OUTHTML);
my $htmldoc = new HTML::HTMLDoc();
$htmldoc->set_html_content(qq~<html><body>A PDF File</body></html>~);
$htmldoc->set_input_file('/user/Desktop/PDF');
$htmldoc->set_page_size('letter');
$htmldoc->set_top_margin( 'top', -500, -500 );
$htmldoc->set_bottom_margin( 'bottom', -500, -500 );
$htmldoc->set_left_margin( 'left', -400, -400 );
$htmldoc->set_right_margin( 'right', -400, -400 );
$htmldoc->set_pagemode( 'pagemode', 'fullscreen' );
$htmldoc->set_html_content(qq~@pdf_lines~);
$htmldoc->get_html_content();
$htmldoc->get_input_file();
$htmldoc->set_header( '.', '.', '.' );
$htmldoc->set_footer( '.', '.', '.' );
$htmldoc->embed_fonts( 'embedfonts', '' );
$htmldoc->no_embed_fonts();
my $pdf = $htmldoc->generate_pdf();
$pdf->to_file('test.pdf');
Re^2: PDF generation based on the HTML by johngg (Canon) on Jan 30, 2010 at 17:15 UTC
open( 'OUTHTML', '>>', 'test.html' );
my @pdf_lines = <OUTHTML>;

That looks a bit weird to me, but maybe I've missed something. Why are you opening the file for write-appending and then trying to read from it? Also, why are you single-quoting the filehandle in the open statement? I don't think that's necessary.

Cheers,
JohnGG
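To make the point about the open mode concrete, here is a minimal sketch of reading the HTML with a lexical filehandle opened in read mode. It is an illustration only: the temporary file stands in for test.html, and the names $html_file, $in and $html are my own, not from the posts above.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Create a throwaway HTML file so the sketch runs on its own
# (it stands in for the thread's test.html; the content is made up).
my ( $tmp_fh, $html_file ) = tempfile( SUFFIX => '.html', UNLINK => 1 );
print {$tmp_fh} "<html>\n<body>A PDF File</body>\n</html>\n";
close($tmp_fh) or die "Can't close $html_file: $!";

# '<' opens for reading. With '>>' the handle is write-only,
# so reading from it yields no lines.
open( my $in, '<', $html_file ) or die "Can't open $html_file: $!";
my @pdf_lines = <$in>;
close($in);

# Join with an empty separator. Interpolating the array in a string
# (qq~@pdf_lines~) would put a space between the lines instead.
my $html = join '', @pdf_lines;
print $html;
```

The resulting $html string is what would then be handed to set_html_content().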
Re: PDF generation based on the HTML by ww (Archbishop) on Jan 30, 2010 at 21:02 UTC
On the off chance you're running Windows, check the bug reports, including https://rt.cpan.org/Public/Bug/Display.html?id=20797 and https://rt.cpan.org/Public/Bug/Display.html?id=49700. Even Khen1950fx's code above, with the open at line 5 changed to

open( INHTML, "<", "test.htm" ) or die "Can't open test.htm: $!";

produces a 0 byte pdf from the CLI on w32.

Re johngg's response, the doc says:

"Normaly this module uses IPC::Open3 for communacation (sic) with the HTMLDOC process. However, in mod_perl-environments (emphasis supplied) there appear problems with this module because the standard-output can not be captured. For this problem this module provides a fix doing the communication in file-mode."

"For this you can specify the parameter mode in the constructor: my $htmldoc = new HTMLDoc('mode'=>'file', 'tmpdir'=>'/tmp');"

Even if your example is meant to run under mod_perl, you may wish to review your line 5. If not, johngg's nudge warrants your careful attention.

Your line 13

$htmldoc->set_input_file('/user/Desktop/PDF');

is also suspect, as the doc says:

"set_input_file($input_filename) this is the function to set the input file name. It will also switch the operational mode to 'file'."

which opens "a whole 'nuther kettle of fish" (Ugly ones!)

If your target pages don't use CSS, you may have better luck with http://search.cpan.org/dist/PDF-FromHTML/. Audrey Tang is also the author of another ~~possible option~~, html2pdf.pl (Update: merely a wrapper).

Update: the above was edited to correct and clarify the remark about `html2pdf.pl`.
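For what it's worth, a rough sketch of the file-mode constructor from the doc quote might look like the following. Treat it as an assumption-laden illustration rather than a tested fix: render_pdf is a hypothetical helper name of mine, File::Spec->tmpdir() replaces the doc's '/tmp' so the path also exists on Windows, and the size check at the end is there only because of the 0 byte PDFs mentioned above.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Spec;

# Sketch only: render an HTML string to a PDF file with HTML::HTMLDoc
# in file mode. render_pdf is a hypothetical helper, not part of the module.
# Returns 1 if a non-empty PDF was written, 0 otherwise.
sub render_pdf {
    my ( $html, $out_file ) = @_;

    # Load at run time so a missing module is reported instead of aborting.
    unless ( eval { require HTML::HTMLDoc; 1 } ) {
        warn "HTML::HTMLDoc is not installed\n";
        return 0;
    }

    # 'mode' and 'tmpdir' are the constructor parameters from the doc quote.
    my $htmldoc = HTML::HTMLDoc->new(
        mode   => 'file',
        tmpdir => File::Spec->tmpdir(),
    );
    $htmldoc->set_page_size('letter');
    $htmldoc->set_html_content($html);

    # generate_pdf() needs the external htmldoc program; guard against failure.
    my $pdf = eval { $htmldoc->generate_pdf() };
    unless ($pdf) {
        warn "PDF generation failed\n";
        return 0;
    }
    eval { $pdf->to_file($out_file) };

    # A 0 byte file counts as a failure.
    return ( -s $out_file ) ? 1 : 0;
}
```

Calling render_pdf( $html, 'test.pdf' ) and checking the return value at least makes an empty result visible instead of silently leaving a 0 byte file behind.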