in reply to Re: module for HTML XML conversion
in thread module for HTML XML conversion

my $doc = $parser->parse_fh(\*DATA); # If you have well balanced snippet like below

Is there any reason not to use parse_html_fh to parse HTML? All of the following are well balanced (or impossible to balance) but are not well-formed XML:

<option selected ...>...</option>
<font color=red>...</font>
<br>
<img ...>
<meta ...>
&nbsp; (without XML bits to define it)

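Encoding is also handled differently: the HTML parser can pick up the charset from a <meta> tag, while the XML parser relies on the XML declaration (or a BOM) and otherwise assumes UTF-8.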
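
To make the difference concrete, here is a minimal sketch that feeds the same snippet to both parsers. The snippet itself is made up for illustration (it is not the OP's input), and it uses parse_string / parse_html_string rather than the _fh variants only so that it is self-contained. I'd expect the XML parser to die on the unquoted attribute, while the HTML parser in recover mode builds a tree.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical snippet: balanced as HTML, but not well-formed XML
# (unquoted attribute, minimized attribute, unclosed <br>, &nbsp;).
my $snippet = <<'HTML';
<div>
<font color=red>red text</font><br>
<select><option selected>one</option></select>
&nbsp;
</div>
HTML

my $parser = XML::LibXML->new;

# Strict XML parse: wrap in eval because a parse failure throws.
my $xml_error;
eval { $parser->parse_string($snippet); 1 } or $xml_error = $@;
print "XML parser rejected the snippet: ", ( $xml_error ? "yes" : "no" ), "\n";

# HTML parse: recover mode tolerates tag soup; suppress_errors keeps
# libxml2's complaints off STDERR.
my $doc = $parser->parse_html_string(
    $snippet,
    { recover => 1, suppress_errors => 1 },
);

# The result is an ordinary DOM, so XPath works as usual.
print "option elements found: ", $doc->findnodes('//option')->size, "\n";
print $doc->toStringHTML;
```

The same options hash can be passed to parse_html_fh when reading from a filehandle such as \*DATA.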

Re^3: module for HTML XML conversion
by Your Mother (Archbishop) on Mar 28, 2010 at 05:30 UTC

    No, I think it's a good idea. The sample input was well balanced, so I used parse_fh and included the other for the (as you point out, likely) case where it wouldn't fly.

    The sample also fails to specially account for the first <div/> which contains the "column" names. We could probably do something automatic with that instead of hard coding the column parsing. The OP might have thought of that already. I didn't until looking it over again now. If I have a few minutes I'll revisit it with that approach.
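
A rough sketch of what picking up the column names automatically might look like. The markup here is an assumption, since the OP's sample is not reproduced in this reply: it supposes that every row is a <div> of <span> cells and that the first <div> holds the column names. The variable names are made up too.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical input: first <div> carries the column names,
# the remaining <div>s carry the data rows.
my $html = <<'HTML';
<div><span>name</span><span>qty</span></div>
<div><span>apple</span><span>3</span></div>
<div><span>pear</span><span>5</span></div>
HTML

my $doc = XML::LibXML->new->parse_html_string(
    $html,
    { recover => 1, suppress_errors => 1 },
);

# The HTML parser wraps the fragment in <html><body>, hence //body/div.
my ( $header, @rows ) = $doc->findnodes('//body/div');

# Column names come from the first <div> instead of being hard coded.
my @columns = map { $_->textContent } $header->findnodes('./span');

my @records;
for my $row (@rows) {
    my @values = map { $_->textContent } $row->findnodes('./span');
    my %record;
    @record{@columns} = @values;    # hash slice pairs names with values
    push @records, \%record;
}

for my $record (@records) {
    print join( ', ', map { "$_=$record->{$_}" } @columns ), "\n";
}
```

If the real input nests its cells differently, only the two XPath expressions should need to change.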