in reply to HTML::Parser question
does anyone know how to make this not combine the words?
Are you sure *it* is combining the words? I think your code is doing that. If your text handler gets called multiple times, that is because there were tags in between the pieces of text. You do nothing with those tags, but it is very likely that they were meant to render as some sort of white space.
For formatting HTML as plain text, have a look at HTML::FormatText, or consider using w3m -dump, links -dump or lynx -dump.
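A minimal sketch of the HTML::FormatText route, assuming HTML::TreeBuilder and HTML::FormatText are installed from CPAN. The sub name format_html, the sample markup and the margin values are made up for illustration:

```perl
use strict;
use warnings;
use HTML::TreeBuilder;
use HTML::FormatText;

# Hypothetical helper: render an HTML string as plain text.
sub format_html {
    my ($html) = @_;

    # Build a parse tree from the markup.
    my $tree = HTML::TreeBuilder->new_from_content($html);

    # The margins are arbitrary choices for this example.
    my $formatter = HTML::FormatText->new(leftmargin => 0, rightmargin => 72);
    my $text = $formatter->format($tree);

    # Free the tree explicitly; older HTML::Element versions need this.
    $tree->delete;
    return $text;
}

print format_html('<p>Hello</p><p>big<br>world</p>');
```

The formatter decides how block elements and line breaks become white space, so the words stay separated without any handler code of your own.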
A quick and ugly fix for your problem would probably be to add start and end handlers that each append a single space to the string, plus a substitution at eof that collapses duplicate whitespace.
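Roughly like this. It is only a sketch, since I have not seen the rest of your code: the sub name html_to_text and the sample markup are invented, and it assumes the version 3 handler API of HTML::Parser.

```perl
use strict;
use warnings;
use HTML::Parser;

# Hypothetical helper: collect the text of an HTML string, treating
# every tag as a word boundary.
sub html_to_text {
    my ($html) = @_;
    my $text = '';

    my $parser = HTML::Parser->new(
        api_version => 3,
        # Text between tags is appended with entities decoded.
        text_h  => [ sub { $text .= shift }, 'dtext' ],
        # Each start or end tag contributes a single space, so words
        # separated only by markup do not run together.
        start_h => [ sub { $text .= ' ' }, 'tagname' ],
        end_h   => [ sub { $text .= ' ' }, 'tagname' ],
    );

    $parser->parse($html);
    $parser->eof;

    # Once parsing is finished, collapse runs of whitespace and trim.
    $text =~ s/\s+/ /g;
    $text =~ s/^ //;
    $text =~ s/ $//;
    return $text;
}

print html_to_text('<p>Hello</p><p>big<br>world</p>'), "\n";   # prints "Hello big world"
```

This puts a space at inline tags such as b or i as well, which is the ugly part: a word with markup in the middle of it gets split. Note also that the contents of script and style elements are reported as text too.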
Juerd # { site => 'juerd.nl', plp_site => 'plp.juerd.nl', do_not_use => 'spamtrap' }