in reply to Converting Word97 (or later) exported HTML to valid HTML

Honestly, when I read the title of your node, HTML tidy immediately sprang to mind, since it even has command line switches specifically for cleaning up Office HTML. That website also has code showing how to call HTML tidy from Perl, including some proposed error checking that seems mostly geared towards Unix. On second thought, it is not really clear to me why they use the code they use, so I'll post it here, together with my replacement:
## This is what I think is needed beforehand :
open( TIDY, "html-tidy $commandline|")
    or die "Couldn't spawn html-tidy : $!\n";
my @output;
@output = <TIDY>;

## Here begins their code :
if (close(TIDY) == 0) {
    my $exitcode = $? >> 8;
    if ($exitcode == 1) {
        printf STDERR "tidy issued warning messages\n";
    } elsif ($exitcode == 2) {
        printf STDERR "tidy issued error messages\n";
    } else {
        die "tidy exited with code: $exitcode\n";
    }
} else {
    printf STDERR "tidy detected no errors\n";
}
I think this could simply be done with the following code, but I haven't checked all possible outcomes...
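my @output = qx(html-tidy $commandline);
my $exitcode = $? >> 8;
if ($exitcode == 0) {
    ## Without this branch, a clean run would fall through to the die below
    printf STDERR "tidy detected no errors\n";
} elsif ($exitcode == 1) {
    printf STDERR "tidy issued warning messages\n";
} elsif ($exitcode == 2) {
    printf STDERR "tidy issued error messages\n";
} else {
    die "tidy exited with code: $exitcode\n";
}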
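
Since I haven't checked all possible outcomes, here is a rough sketch of how the raw value of $? could be decoded a bit more completely. The helper name describe_status is my own invention and not part of the code from the tidy website; the meanings of exit codes 0, 1 and 2 are simply taken from the code above, and the -1 and signal cases follow what perlvar says about $?.

```perl
use strict;
use warnings;

# Hypothetical helper: turn a raw wait status (the value of $? after
# qx// or close) into a short description.
sub describe_status {
    my ($status) = @_;

    # -1 means the command could not be started at all.
    return 'could not run tidy' if $status == -1;

    # The low seven bits hold the signal number if the child was killed.
    my $signal = $status & 127;
    return "tidy died from signal $signal" if $signal;

    # The exit code lives in the high byte.
    my $exitcode = $status >> 8;
    return 'tidy detected no errors'      if $exitcode == 0;
    return 'tidy issued warning messages' if $exitcode == 1;
    return 'tidy issued error messages'   if $exitcode == 2;
    return "tidy exited with code: $exitcode";
}
```

It would be called right after the qx line, for example as print STDERR describe_status($?), "\n"; and it takes the status as an argument rather than reading $? itself, so it can be tried out without actually running tidy.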

To wrap up: unless you give us a really convincing reason why html-tidy is not possible (and by "not possible" I mean that even embedding html-tidy in a Perl script, writing it out to /tmp, running it there and deleting the file afterwards is ruled out), I'll stick with this solution :-)
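
The "write it out to /tmp, start it there, delete it again" idea could look roughly like the sketch below. It is only an illustration under a few assumptions: run_embedded is a hypothetical name, $program is assumed to already hold the bytes of the executable (how you embed them in the script is up to you), and the system is assumed to be Unix-like with a temp directory that allows execution.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: $program holds the bytes of an executable that
# the script carries around (for example read from its __DATA__ section).
sub run_embedded {
    my ($program, @args) = @_;

    # UNLINK => 1 also removes the file at exit if we die early.
    my ($fh, $path) = tempfile('tidyXXXXXX', TMPDIR => 1, UNLINK => 1);
    binmode $fh;
    print {$fh} $program or die "Couldn't write $path: $!\n";
    close $fh            or die "Couldn't close $path: $!\n";
    chmod 0700, $path    or die "Couldn't chmod $path: $!\n";

    # List-form pipe open avoids the shell, so @args needs no quoting.
    open(my $pipe, '-|', $path, @args)
        or die "Couldn't spawn $path: $!\n";
    my @output = <$pipe>;
    close $pipe;               # sets $? to the child's wait status
    my $status = $?;

    unlink $path;              # delete the file again straight away
    return ($status, @output);
}
```

The returned status is the raw $? value, so it can be shifted and checked exactly like in the code further up.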

perl -MHTTP::Daemon -MHTTP::Response -MLWP::Simple -e ' ; # The
$d = new HTTP::Daemon and fork and getprint $d->url and exit;#spider
($c = $d->accept())->get_request(); $c->send_response( new #in the
HTTP::Response(200,$_,$_,qq(Just another Perl hacker\n))); ' # web