JayBee has asked for the wisdom of the Perl Monks concerning the following question:

I'm looking for help printing our company's estimates (50+ a day) using the headers and footers from our website system. The header, footer and body each contain a table with several cells.

Here's a short visual:

--------------------------------------
|   header   |           |           |
--------------------------------------
|   body     |           |           |
|   body     |           |           |
|   body     |           |           |
--------------------------------------
|   footer         |     |           |
--------------------------------------

The problem is that the body sometimes overflows onto a second page. I need to control the output so that a new header and footer are created on the next page while the body continues, and the same happens again if more pages are needed.

I'm allowed to drop the footer from the earlier pages, but the header must appear on every page, and the footer must appear at least on the last page, if not on all of them.

I've considered rendering the body contents to an image and then reading the image's height to help with the math, but I've had no luck finding an HTML-to-image converter. The closest I've got is a conversion to PostScript, and I'd still be lost, since I don't even know what that is.

I do have the option of using GD to render text to an image, but my items contain newline characters and all have to fit into a fixed-width cell, so a character count won't help me either.

I've found CSS3 options that sound promising, but I just learned that they aren't available yet.

Thanks in advance for any help and guidance.

Re: Document Printing Format
by hangon (Deacon) on Aug 27, 2008 at 21:10 UTC

    If the requirement is to print from a web browser, pagination is going to be tricky at best. It takes some work, but here's the technique I use:

    • Output a bare bones html page: no javascript, no layers, no fancy layout tricks to trip up the printer.
    • Use CSS for all size specs and settings, in either pixels or points (not both).
    • Use a table as a container for each page to be printed, and set the CSS properties:
      table-layout: fixed and page-break-after: always
    • Specify exact font sizes and line spacing.
    • Fix the amount of vertical spacing added to each table row (cellpadding, cellspacing, border, etc.).
    • Using the table cell widths and font size, calculate where to insert line breaks so that the browser never wraps text automatically.
    • Keep a running total of the line heights and row spacing. When it approaches the length of a printed page, issue the appropriate html to close out that table, then start the next page by continuing on a new table.

    With a little tweaking, this technique can even be used to precisely align the text onto mailing label sheets when printing through a web browser.
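hangon's steps can be sketched in a few dozen lines. This is a minimal Python illustration, not hangon's own code, and it assumes a fixed-pitch font so that a cell's width converts directly into characters per line. The constants LINE_HEIGHT_PT, ROW_PADDING_PT, BODY_HEIGHT_PT and CHARS_PER_LINE, and the functions wrap_cell, layout_row, paginate and render, are all hypothetical. It pre-wraps every cell, keeps a running total of row heights, closes the table before a row would overflow, and (following the original question) repeats the header on every page but prints the footer only on the last.

```python
import html
import textwrap

# Hypothetical layout constants: measure these for your own browser and printer.
LINE_HEIGHT_PT = 12            # must match the CSS line-height of the body font
ROW_PADDING_PT = 4             # cellpadding + cellspacing + border, per row
BODY_HEIGHT_PT = 600           # page height minus header AND footer heights
CHARS_PER_LINE = [30, 12, 10]  # column widths in characters (fixed-pitch font assumed)


def wrap_cell(text, width):
    """Hard-wrap one cell to `width` characters, keeping embedded newlines."""
    lines = []
    for paragraph in text.split("\n"):
        # textwrap.wrap("") returns [], so keep blank lines explicitly.
        lines.extend(textwrap.wrap(paragraph, width) or [""])
    return lines


def layout_row(cells):
    """Wrap every cell of a row; the row is as tall as its tallest cell."""
    wrapped = [wrap_cell(text, width) for text, width in zip(cells, CHARS_PER_LINE)]
    tallest = max(len(lines) for lines in wrapped)
    return wrapped, tallest * LINE_HEIGHT_PT + ROW_PADDING_PT


def paginate(rows):
    """Keep a running total of row heights and start a new page before overflow."""
    pages, current, used = [], [], 0
    for cells in rows:
        wrapped, height = layout_row(cells)
        if current and used + height > BODY_HEIGHT_PT:
            pages.append(current)
            current, used = [], 0
        current.append(wrapped)
        used += height
    pages.append(current)
    return pages


def render(header, pages, footer):
    """One fixed-layout table per page: header on every page, footer on the last."""
    span = len(CHARS_PER_LINE)
    out = []
    for number, page in enumerate(pages, start=1):
        last = number == len(pages)
        style = "table-layout: fixed" if last else "table-layout: fixed; page-break-after: always"
        out.append(f'<table style="{style}">')
        out.append(f'<tr><td colspan="{span}">{html.escape(header)}</td></tr>')
        for wrapped in page:
            # Explicit <br> tags replace the browser's automatic wrapping.
            cells = ("<br>".join(html.escape(line) for line in lines) for lines in wrapped)
            out.append("<tr>" + "".join(f"<td>{cell}</td>" for cell in cells) + "</tr>")
        if last:
            out.append(f'<tr><td colspan="{span}">{html.escape(footer)}</td></tr>')
        out.append("</table>")
    return "\n".join(out)


if __name__ == "__main__":
    rows = [("Widget\nblue", "2", "10.00")] * 80
    pages = paginate(rows)
    print([len(page) for page in pages])  # [21, 21, 21, 17]
```

Each sample row is 2 lines x 12 pt + 4 pt = 28 pt, so 21 rows fit in 600 pt and the 80 rows spill onto four pages. Because BODY_HEIGHT_PT already leaves room for the footer, the footer always fits on the last page; a single row taller than a whole page will still overflow.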

      Yes, that makes sense, and it sounds simple enough to try first. The automatic wrapping was what bugged me. As you said, if I control that, the rest should be easy.
Re: Document Printing Format
by jethro (Monsignor) on Aug 27, 2008 at 19:22 UTC

    I had to do something similar when a script of mine output text, LaTeX and RTF files from the same data. To find the page numbers in the RTF, I took the calculation done for the text output and multiplied it by a factor. It was astonishingly exact.

    You could take a measurement of the average size of your text and of any HTML formatting that gets used, and calculate a conservative length from that. As long as your estimate is larger than the real length, you are on the safe side; the worst that can happen is some empty space at the bottom of the body cell.

    This is especially easy if you generate the HTML from the data yourself: just keep a running counter of rows and columns as you generate it.
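jethro's conservative estimate with a running counter might look like the following minimal Python sketch. The calibration constants AVG_CHAR_WIDTH_PX, SAFETY_FACTOR, LINE_HEIGHT_PX and PAGE_BODY_PX, and the helpers estimated_lines and assign_pages, are hypothetical; the idea is to measure one printed sample and then deliberately overestimate, so that rounding errors leave blank space rather than overflow.

```python
import math

# Hypothetical calibration: print a sample estimate once, measure, then pad.
AVG_CHAR_WIDTH_PX = 7.0   # average glyph width measured for the body font
SAFETY_FACTOR = 1.15      # err on the large side, as jethro suggests
LINE_HEIGHT_PX = 16
PAGE_BODY_PX = 800        # room for body rows on one page


def estimated_lines(text, cell_width_px):
    """Pessimistic line count: pretend glyphs are SAFETY_FACTOR times wider.

    Counting characters ignores word boundaries; the padding is meant to
    absorb that, so this is an estimate, not a guarantee.
    """
    chars_per_line = max(1, int(cell_width_px / (AVG_CHAR_WIDTH_PX * SAFETY_FACTOR)))
    return sum(max(1, math.ceil(len(part) / chars_per_line)) for part in text.split("\n"))


def assign_pages(cells, cell_width_px):
    """Running counter over the generated rows; returns each row's page number."""
    page, used, pages = 1, 0, []
    for text in cells:
        height = estimated_lines(text, cell_width_px) * LINE_HEIGHT_PX
        if used and used + height > PAGE_BODY_PX:
            page, used = page + 1, 0
        used += height
        pages.append(page)
    return pages
```

With these numbers a 300 px cell is assumed to hold 37 characters per line, so a 100-character item counts as 3 lines (48 px) and 16 such rows fit on one page. Raising SAFETY_FACTOR trades paper for safety.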

Re: Document Printing Format
by psini (Deacon) on Aug 27, 2008 at 20:04 UTC

    The task of automatically formatting documents is always a PITA. Using LaTeX, as already suggested, is probably the cleanest way.

    The only real problem with the LaTeX approach is that the document templates have to be written in TeX, and that skill is not easy to find, particularly when you expect several dozen templates to be done. Cheap. Fast. Hopefully aesthetically pleasing.

    We are working on another approach that I hope will eventually become a CPAN module: a Perl object that takes an ODT file (an OpenOffice document) containing a template for the document, with markers to be replaced or expanded with data passed to the object in a data structure (HoAoH...).

    At present I have implemented conditional blocks, iterative blocks, some aggregate functions on input data, and formatted data substitution. It seems to work rather well, but the code is really ugly and certainly not extensively tested.

    If you are interested, I can pass it to you or post it as a CUFP node, but I'm really ashamed of the code as it is now...
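psini's module isn't shown in the thread, so this is not his code. As a rough illustration of the marker idea only, here is a Python sketch that works on a plain-text template instead of an ODT file (an ODT is a ZIP archive whose body lives in content.xml, which a real implementation would have to edit). The marker syntax `{{name}}`, `{{#each key}} ... {{/each}}` and `{{#if key}} ... {{/if}}` and the function fill are hypothetical; the data is a dict of lists of dicts, the Python counterpart of a HoAoH.

```python
import re

# Hypothetical marker syntax (not psini's):
#   {{name}}                        replaced with data["name"]
#   {{#each key}} ... {{/each}}     body repeated once per dict in data["key"]
#   {{#if key}} ... {{/if}}         body kept only when data["key"] is truthy
# Blocks of different kinds may nest; a block may not contain another of its own kind.
BLOCK = re.compile(r"\{\{#(each|if) (\w+)\}\}(.*?)\{\{/\1\}\}", re.S)
FIELD = re.compile(r"\{\{(\w+)\}\}")


def fill(template, data):
    """Expand blocks first, then substitute the remaining {{name}} fields."""

    def expand(match):
        kind, key, body = match.groups()
        if kind == "if":
            return fill(body, data) if data.get(key) else ""
        # Each item's fields shadow the outer data inside the repeated body.
        return "".join(fill(body, {**data, **item}) for item in data.get(key, []))

    expanded = BLOCK.sub(expand, template)
    # Missing fields become empty strings rather than raising an error.
    return FIELD.sub(lambda m: str(data.get(m.group(1), "")), expanded)
```

For example, `fill("{{#each lines}}{{item}};{{/each}}", {"lines": [{"item": "Bolt"}, {"item": "Nut"}]})` returns `"Bolt;Nut;"`. A regex-based expander like this is easy to write but cannot handle same-kind nesting; a real template engine would parse the markers into a tree.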

    Rule One: "Do not act incautiously when confronting a little bald wrinkly smiling man."

Re: Document Printing Format
by apl (Monsignor) on Aug 27, 2008 at 19:11 UTC
      This was one of the first things I looked at, and it did sound like something I could try. I've actually used Perl formats before, just not for this kind of thing. I was just hoping for something cleaner.
Re: Document Printing Format
by JayBee (Scribe) on Aug 30, 2008 at 12:33 UTC

    For all you future Perl users who end up here:

    I finally found the solution I'm going to use. It came from further investigation with Super Search, so check out this post, which uses GD::Text::Wrap: http://perlmonks.org/?node_id=342952

    I'm turning a much smaller version of that code into a subroutine: I send text to it and get values back, so I can increment the counters and adjust accordingly.

    It's beautiful.
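The node linked above isn't reproduced here. The core of such a subroutine is a word wrap that measures rendered widths instead of counting characters, which is also what deals with the embedded newlines and fixed-width cells from the original question. Below is a minimal Python sketch of that idea. It assumes a made-up table of per-character pixel widths; NARROW, WIDE, DEFAULT, WIDTHS, text_width and measure are hypothetical stand-ins for the real font metrics that GD::Text::Wrap works from.

```python
# Hypothetical per-character advance widths in pixels for a proportional font.
# A real version would take these from the actual font's metrics.
NARROW, WIDE, DEFAULT = 4, 10, 7
WIDTHS = {**{c: NARROW for c in "il.,:;'!|"}, **{c: WIDE for c in "MWmw@"}}


def text_width(s):
    """Rendered width of a string, using the assumed width table."""
    return sum(WIDTHS.get(c, DEFAULT) for c in s)


def measure(text, cell_width, line_height=14):
    """Greedy word wrap by pixel width; returns (wrapped_lines, height_px).

    Embedded newlines always start a new line. A single word wider than the
    cell is left on its own line, where it would overflow as in a browser.
    """
    lines = []
    for paragraph in text.split("\n"):
        current = ""
        for word in paragraph.split():
            candidate = f"{current} {word}" if current else word
            if current and text_width(candidate) > cell_width:
                lines.append(current)
                current = word
            else:
                current = candidate
        lines.append(current)
    return lines, len(lines) * line_height
```

For instance, `measure("aaa bbb ccc", 50)` returns `(["aaa bbb", "ccc"], 28)`: "aaa bbb" is exactly 49 px wide, and adding "ccc" would exceed the 50 px cell. The returned height is the value you would add to the running page counter described in the earlier replies.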