in reply to HTML input to PDF output

A limited solution (no table or frame support) is the Cookbook's Recipe 20.5:
use strict;
use HTML::FormatText;
use HTML::Parse;   # provides parse_html()

# slurp everything after __DATA__ into one string
my $data = do {local $/;<DATA>};

# build a parse tree from the HTML
my $html = parse_html($data);

# wrap the plain-text output between columns 0 and 50
my $formatter = HTML::FormatText->new(
   leftmargin  => 0,
   rightmargin => 50,
);

my $ascii = $formatter->format($html);
print "$ascii\n";

__DATA__
<p class="fol">Here's some text that goes in the body of the article. It has some list items like this:</p>
<ul>
   <li>List item one</li>
   <li>List item two</li>
</ul>
This generates the following output:
Here's some text that goes in the body of the
article. It has some list items like this:

  * List item one

  * List item two

I have found that converting HTML to text is hard, and the best free tool I have found so far is lynx -dump. Of course, the best solution is to never mix presentation with data! :)
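For anyone who wants to try the lynx route from inside a script, here is a minimal sketch. It assumes a lynx binary is on your PATH; html_to_text_via_lynx is just a name I made up for illustration, and -nolist (which drops the numbered link list lynx appends to a dump) is optional.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: write the HTML to a temp file and let lynx render it.
# Assumes lynx is installed and on the PATH.
sub html_to_text_via_lynx {
    my ($html) = @_;

    # the .html suffix makes lynx treat the file as HTML
    my ($fh, $filename) = tempfile(SUFFIX => '.html', UNLINK => 1);
    print {$fh} $html;
    close $fh or die "cannot close $filename: $!";

    # list-form pipe open, so no shell is involved
    open my $lynx, '-|', 'lynx', '-dump', '-nolist', $filename
        or die "cannot run lynx: $!";
    my $text = do { local $/; <$lynx> };
    close $lynx or die "lynx failed (exit status $?)";

    return $text;
}

print html_to_text_via_lynx(
    '<ul><li>List item one</li><li>List item two</li></ul>'
);
```

The temp file avoids any quoting trouble with passing HTML on a command line, at the cost of a little disk I/O.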

Update: in case you are wondering where that extra bullet came from, I first blamed the closing li tags and figured HTML::FormatText could use an upgrade to support XHTML. It was actually a typo in my HTML -- good catch Hero Zzyzzx! ;) I fixed the typo since hacker requested I fix the original. For historical purposes, the first list item looked like so: <li>List item one<li>.
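A quick way to see the effect of that stray tag is to count the li elements the parser actually builds. This is only a sketch: it assumes HTML::TreeBuilder is installed, and count_list_items is a made-up helper name, not anything from the Cookbook.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Hypothetical helper: parse a snippet and count the <li> elements in the tree.
sub count_list_items {
    my ($html) = @_;
    my $tree  = HTML::TreeBuilder->new_from_content($html);
    my @items = $tree->look_down(_tag => 'li');   # list context: every match
    $tree->delete;                                # free the parse tree
    return scalar @items;
}

# the original typo: an opening <li> where </li> was meant
my $typo  = '<ul> <li>List item one<li> <li>List item two</li> </ul>';
my $fixed = '<ul> <li>List item one</li> <li>List item two</li> </ul>';

print "typo:  ", count_list_items($typo),  " li elements\n";
print "fixed: ", count_list_items($fixed), " li elements\n";
```

Every opening <li> becomes its own element, so the typo version holds three list items (one of them with no text) where the fixed version holds two. The formatter then has an extra item to put a bullet on.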

jeffa

Remember kids, just say no to mixing data and presentation!

Re: (jeffa) Re: HTML input to PDF output
by Hero Zzyzzx (Curate) on Jul 24, 2002 at 16:26 UTC

    Not to niggle, but one of the closing li tags isn't really a closing tag:
    <li>List item one<li>
    Note the second li.

    -Any sufficiently advanced technology is
    indistinguishable from doubletalk.