menolly has asked for the wisdom of the Perl Monks concerning the following question:

OK, I've read the existing nodes, and tried some of the suggested solutions. Unfortunately, I can't seem to get quite what I need from them. The two areas I'm having difficulty with are line height and pagination.

HTMLDOC is simple to use, allows us to leverage the current HTML-generating code, and allows pretty good control over pagination, but appears to only allow a constant line height. We're representing fractions using the vertically aligned format. In HTML, 1<br><img src="line.gif"><br>2 generates the desired effect. The generated PDF, unfortunately, appears to maintain a constant line height, causing excess space between the numbers and the line, like so. I've tried several variations on the HTML; I can move the line around within the space, but I can't seem to get rid of the space itself.

As for pagination, we need to be able to say "This chunk of information should stay together unless it is longer than a page". PDF::API2 appears to allow a fair amount of low-level control over text formatting, and might address the previous issue, but I don't see a method which would help implement this sort of logic. (For instance, a method to determine how much empty page is left, or how much space a chunk of data will take up.) Nor does it allow us to reuse as much existing code.
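The keep-together rule itself can be sketched independently of any PDF library. The following is a minimal sketch, assuming the caller can already estimate each chunk's height (for example, line count times leading); `paginate` and its hash keys are hypothetical names, not part of PDF::API2 or HTMLDOC.

```perl
use strict;
use warnings;

# Decide where each chunk goes so that a chunk is split only when it is
# taller than a whole page. All heights share one unit (e.g. points).
# Returns one hash per placed piece: chunk index, page, top offset, height.
sub paginate {
    my ($page_height, @chunk_heights) = @_;
    my ($page, $used) = (1, 0);
    my @placements;

    for my $i (0 .. $#chunk_heights) {
        my $left = $chunk_heights[$i];

        # The chunk would fit on an empty page but not in the space that
        # remains on this one: start a new page so it stays together.
        if ($used > 0 && $left <= $page_height && $left > $page_height - $used) {
            $page++;
            $used = 0;
        }

        # Place the chunk; only a chunk taller than a page gets split.
        while ($left > 0) {
            if ($used >= $page_height) { $page++; $used = 0; }
            my $room  = $page_height - $used;    # empty space left on the page
            my $piece = $left < $room ? $left : $room;
            push @placements,
                { chunk => $i, page => $page, top => $used, height => $piece };
            $used += $piece;
            $left -= $piece;
        }
    }
    return @placements;
}
```

Here `$page_height - $used` answers "how much empty page is left"; the "how much space will this chunk take" half still has to come from whatever measures the text.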

Does anyone know HTMLDOC tricks I'm missing? PDF::API2 tips? Other modules to investigate?

Replies are listed 'Best First'.
Re: Dynamically generated HTML to PDF
by ronzomckelvey (Acolyte) on Sep 26, 2003 at 20:47 UTC
    I have a web-based application, built in Perl, that lets users download quotes as a PDF. Originally I had a link the user would click to get a detailed quote page as HTML, but users always want more.

    So after trying to manually convert the HTML to PDF, I went the easy route: I used htmldoc, which fetches a web page and converts it to PostScript, and then Ghostscript (ps2pdf) to make the PDF.

    This worked well for me, because I then needed to include more than one quote. By looping around the htmldoc call and appending to the same output file each time, I got a PDF with all the pages I needed.

    Code snippet I used (yeah, it's not the best, but it's early code for me):

    # Temporary PostScript file, plus the PDF names derived from it.
    $tmp="$TMP_DIR/"."Quotes$$"."$USTATUS{USERNAME}.ps";
    $htmp="$HTMP_DIR/"."Quotes$$"."$USTATUS{USERNAME}.pdf";
    $tmp2=$tmp;
    $tmp2=~s/\.ps$/\.pdf/gi;

    # Create (or truncate) the PostScript file before appending to it.
    `>$tmp`;

    my $ctr=0;
    my $tlog="$Q::all_quotes";
    $tlog=~s/:/ /g;
    &logme('Quotes-PDF', "Getting PDF for quotes $tlog");
    print "Processing quotes ";

    # Render each quote page to PostScript and append it to the same file.
    foreach $id (reverse split /:/, $Q::all_quotes) {
        $ctr++;
        print " $id ";
        my $url="http://localhost/cgi-bin/po/index.cgi?session=$SESSION&user=$USTATUS{USERNAME}&qdetail=$id&noip=noip";
        `htmldoc -t ps --header '' --no-numbered --size letter --bodycolor white --left .5in --right .5in --webpage "$url" >> $tmp`;
    }

    # Convert the combined PostScript file to PDF.
    `ps2pdf $tmp $tmp2`;
    Now $tmp2 is my PDF file.
    Another program I use a lot for making PDFs is a2ps (Anything-To-PostScript), which is a lot easier for converting straight text or reports. Using these methods, I built a magic lp or Samba printer that takes print jobs, converts them to PDF, and emails the result back to the user.

    ronzo

      Unfortunately, the data in question is not, so far, rendering correctly with HTMLDOC -- see the screenshots in my original post. Would a2ps give the control over both line height and pagination that we need? And would it allow us to present the content to the user without generating the file on disk? (This is not a requirement, but is strongly preferred.)
        Hi. According to the a2ps man page, it lets you set the font size, lines per page, and all other kinds of formatting, including multiple pages on one physical page.

        I only create the new $tmp2 file because of memory sizing issues; this way there's no limit on how big a file it can convert. I originally had it all in memory.

        You can decide whether it's good enough for what you need just from the command line: run htmldoc with your options until you get the output the way you like and make your PostScript file, then run ps2pdf (Ghostscript) to make a PDF.
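For the "without generating the file on disk" preference, one possibility is to have htmldoc write the PDF straight to the client. This is only a sketch under assumptions: it relies on htmldoc being on the PATH, on `-t pdf` and `--quiet` behaving as documented, and on htmldoc writing to standard output when no output file is named. `htmldoc_args` and `stream_pdf` are hypothetical helper names.

```perl
use strict;
use warnings;

# Build the htmldoc command as a list. The list form of open() bypasses the
# shell, so characters such as & in the URLs need no quoting.
sub htmldoc_args {
    my (@urls) = @_;
    return ('htmldoc', '-t', 'pdf', '--quiet', '--webpage',
            '--size', 'letter', '--left', '.5in', '--right', '.5in', @urls);
}

# Copy the generated PDF to a filehandle (STDOUT in a CGI script) in
# fixed-size blocks, so the whole document is never held in memory.
sub stream_pdf {
    my ($out, @urls) = @_;
    open(my $pipe, '-|', htmldoc_args(@urls))
        or die "cannot run htmldoc: $!";
    binmode $pipe;
    binmode $out;
    print {$out} "Content-Type: application/pdf\r\n\r\n";
    local $/ = \65536;    # read 64 KB records instead of lines
    while (defined(my $buf = <$pipe>)) {
        print {$out} $buf;
    }
    close $pipe or die "htmldoc exited with status $?";
}

# Usage inside a CGI script (not run here):
#   stream_pdf(\*STDOUT, $quote_url_1, $quote_url_2);
```

Passing several URLs in one call would replace the append loop, assuming htmldoc concatenates its inputs in `--webpage` mode. This does nothing for the line-height problem, which has to be solved in the HTML or in a different renderer.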

        Perl is the greatest for tying all this stuff together!

        ronzo

Re: Dynamically generated HTML to PDF
by CountZero (Bishop) on Sep 26, 2003 at 22:05 UTC

    Ever thought about using TeX/LaTeX? It can output in different formats including PDF and allows very fine control.

    CountZero

    "If you have four groups working on a compiler, you'll get a 4-pass compiler." - Conway's Law

      I looked into it a while back, but found the sheer amount of documentation/information a bit overwhelming -- lots of different macro sets and such. Any pointers for a good place to start?

        Well, TeX is indeed rather difficult, but you do not have to get into every small detail, as there is a nice macro package called LaTeX.

        The "LaTeX User Guide and Reference Manual" by Leslie Lamport is a very good start.
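As a small illustration of the control on offer, the sketch below typesets stacked fractions and keeps one chunk from being split across pages. The sample text is invented. Note that a minipage taller than the page overflows instead of breaking, so the "unless it is longer than a page" case would need separate handling.

```latex
\documentclass{article}
\begin{document}

% A stacked fraction: TeX sets the rule and spacing itself,
% independent of the surrounding line height.
Mix $\frac{1}{2}$ cup of flour with $\frac{3}{4}$ cup of water.

% A minipage is an unbreakable box, so this chunk stays on one page.
\noindent\begin{minipage}{\textwidth}
  Item: widget \\
  Quantity: $2\frac{1}{2}$ \\
  Notes: these lines stay together.
\end{minipage}

\end{document}
```

Running pdflatex on a file like this produces the PDF directly, so a Perl script would only need to generate the .tex source.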

        CountZero

        "If you have four groups working on a compiler, you'll get a 4-pass compiler." - Conway's Law