I tried with XML::Twig and got quite good results; see the XML::Twig tutorial.
use strict;
use warnings;
use XML::Twig;

my $t = XML::Twig->new(
    pretty_print  => 'indented',
    twig_handlers => {
        # $_[1] is the element
        'html/body/html' => sub { $_[1]->print; },
    },
);

my $data = <<EOXML;
<!DOCTYPE html>
<html>
<head>
<script>/*some ugly header stuff*/</script>
</head>
<body>
<html>
<head>
<script>/*some embedded document*/</script>
</head>
<body>
<h1>Hello</h1>
<p>this is a test</p>
<p>this is a second test</p>
</body>
</html>
<p>some kind of wrapped footer</p>
</body>
</html>
EOXML

$t->parse($data);

## output
<html>
  <head>
    <script>/*some embedded document*/</script>
  </head>
  <body>
    <h1>Hello</h1>
    <p>this is a test</p>
    <p>this is a second test</p>
  </body>
</html>
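If you would rather keep the embedded document in a variable than print it straight away, a variation along these lines might help. It is only a sketch: the helper name extract_embedded_html and the shortened sample string are made up for illustration, and the path is written with a leading slash so that it is anchored at the document root. Keep in mind that XML::Twig sits on top of an XML parser, so parse dies on input that is not well-formed XML; this only works when the "incorrect" HTML still happens to be well-formed.

```perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical helper (not from the original post): returns every <html>
# element found directly under the outer <body>, serialized as a string.
sub extract_embedded_html {
    my ($xml) = @_;
    my @found;
    my $twig = XML::Twig->new(
        twig_handlers => {
            # The leading slash anchors the path at the document root.
            '/html/body/html' => sub {
                my ( $t, $elt ) = @_;
                push @found, $elt->sprint;    # sprint returns the markup instead of printing it
            },
        },
    );
    $twig->parse($xml);    # dies if $xml is not well-formed XML
    return @found;
}

# Shortened sample input, assumed for illustration only.
my $doc = '<html><body><html><body><p>inner</p></body></html>'
        . '<p>footer</p></body></html>';

my @inner = extract_embedded_html($doc);
print "$_\n" for @inner;
```

Wrapping the call in eval { ... } lets you catch the parse error and fall back to a more forgiving HTML parser when the input is too broken.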
L*
In reply to Re: Parsing incorrect html
by Discipulus
in thread Parsing incorrect html
by seki