inblosam has asked for the wisdom of the Perl Monks concerning the following question:

I am using RTF::Writer (fantastic module, Sean Burke!) to create RTFs in an online document management system. I would like to save the number of pages for each file created, based on a standard margin spec (like 1",1",1",1" for top, bottom, left, right). It doesn't appear that I can do this with RTF::Writer, from the RTF spec, or with other modules out there (as far as my searching took me). I thought I could calculate it based on the file size, but that isn't really a "nice" or accurate way to do it. Any ideas, even workarounds, are welcome!


Michael Jensen

Re: Page number count from RTFs
by tilly (Archbishop) on May 30, 2005 at 22:43 UTC
    I have seen different RTF readers format the same RTF document differently and wind up with different numbers of pages.

    Therefore I don't think that what you want is truly possible.

    Were I in your boat, I'd investigate whether it was possible to pass the RTF through some program that put them in a format where I could see how many pages it had. The page count that you get will be dependent on the program, but should generally be close to reality.

    After a quick google, it looks like you can find rtf2ps here, and grepping its output for %%Page: might work to count pages. (If that converter doesn't work well, then google for another one; there has to be one that works...) As a bonus, you can then choose to convert the ps to pdf on the fly to render in web pages. That way, if anyone complains that you got the number of pages wrong, you can always point out that different viewers get different numbers of pages, but your count matches what is in your pdf.
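
    A minimal sketch of the counting step in Perl, assuming the converter emits DSC-conforming PostScript with one %%Page: comment per page. The sub name count_ps_pages is made up for illustration, and running the converter itself is left out, since its command-line options are not covered here.

```perl
use strict;
use warnings;

# Count DSC "%%Page:" comments in a string of PostScript.
# Assumption: the converter writes one such comment per page.
sub count_ps_pages {
    my ($ps) = @_;

    # /m anchors ^ at every line start; the list assignment
    # in scalar context yields the number of matches.
    my $count = () = $ps =~ /^%%Page:/mg;
    return $count;
}
```

    Because the pattern requires a colon right after "Page", header comments such as %%Pages: and %%PageOrder: are not counted.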

    UPDATE: I made it clearer that my untested solution might or might not work.

      Thank you both for the ideas. I may try the RTF to PS to PDF idea. That would at least standardize the number of pages. Thanks for your help! :)


      Michael Jensen
Re: Page number count from RTFs
by jpeg (Chaplain) on May 30, 2005 at 14:25 UTC
    You *could* calculate it from the text itself, but going that route pretty much means you're building your own RTF parser. There are a lot of keywords to pay attention to that can force or delay page breaks, such as tables, footers (destination text), images, individual page margins, varying fonts, kerning, yadda yadda yadda.

    I would guesstimate: you know from the spec you linked that a page defaults to 792 x 612 pts with 72 pt margins on the top and bottom and 90 pt margins on the sides. And you know the font size in points, so you know how many lines can fit on a page. Figure out the length() of the text and guess. It won't be precise, but it may be good enough.
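
    That guesstimate could be sketched roughly as below. The page and margin defaults are the point values quoted above; the line spacing (1.2 times the font size) and the average character width (half the font size) are my own assumed typical values, not figures from the spec. The sub name estimate_pages and its option names are hypothetical, and the input is assumed to be plain text already stripped of RTF control words.

```perl
use strict;
use warnings;
use POSIX qw(ceil floor);

# Rough page estimate for plain text on a single page layout.
# All sizes are in points. line_spacing and char_width_ratio are
# assumed typical values, not taken from the RTF spec.
sub estimate_pages {
    my ($text, %opt) = @_;
    my $page_h   = $opt{page_height}       // 792;
    my $page_w   = $opt{page_width}        // 612;
    my $margin_v = $opt{margin_vertical}   // 72;   # top and bottom
    my $margin_h = $opt{margin_horizontal} // 90;   # left and right
    my $font     = $opt{font_size}         // 12;
    my $spacing  = $opt{line_spacing}      // 1.2;
    my $ratio    = $opt{char_width_ratio}  // 0.5;

    my $lines_per_page = floor(($page_h - 2 * $margin_v) / ($font * $spacing));
    my $chars_per_line = floor(($page_w - 2 * $margin_h) / ($font * $ratio));
    die "page too small for this font size\n"
        if $lines_per_page < 1 || $chars_per_line < 1;

    # Each paragraph wraps onto ceil(length / width) lines;
    # an empty paragraph still occupies one line.
    my $lines = 0;
    for my $para (split /\n/, $text, -1) {
        my $n = ceil(length($para) / $chars_per_line);
        $lines += $n < 1 ? 1 : $n;
    }

    my $pages = ceil($lines / $lines_per_page);
    return $pages < 1 ? 1 : $pages;   # an empty document is still one page
}
```

    For the 1" margins in the original question, you would pass margin_horizontal => 72. Tables, images, and font changes are ignored entirely, so treat the result as a ballpark figure only.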

    --
    jpg