in reply to Parsing incorrect html

"How would you proceed to get the content of the inner html document?"

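This will get you the inner HTML document. It relies on remove returning the parent of the node it removes: the first child of the inner <html> element is a whitespace text node, so removing it hands back the inner <html> element itself, which say then stringifies: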

use feature 'say';
use Mojo::DOM;

my $html = '<!DOCTYPE html> <html> <head> <script>/*some ugly header stuff*/</script> </head> <body> <html> <head> <script>/*some embedded document*/</script> </head> <body> <h1>Hello</h1> <p>this is a test</p> <p>this is a second test</p> </body> </html> <p>some kind of wrapped footer</p> </body> </html>';

my $dom = Mojo::DOM->new( $html );
say $dom->at('html html')->child_nodes->first->remove;

prints:

<html><head> <script>/*some embedded document*/</script> </head> <body> <h1>Hello</h1> <p>this is a test</p> <p>this is a second test</p> </body> </html>

Mojo::DOM is also well suited to extracting or manipulating anything inside that inner document, should you need to go further.
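As a rough sketch of what that further extraction might look like, the snippet below pulls the heading and paragraph text out of the inner document only. The markup is a shortened stand-in for the example above, and the variable names ($inner, $title, @paragraphs) are made up for illustration; at, find, text, and Mojo::Collection's map and each are existing Mojo methods.

```perl
use strict;
use warnings;
use feature 'say';
use Mojo::DOM;

# Shortened, hypothetical version of the nested document from the post
my $html = '<html><body><html><body><h1>Hello</h1><p>this is a test</p>'
         . '<p>this is a second test</p></body></html>'
         . '<p>some kind of wrapped footer</p></body></html>';

my $dom = Mojo::DOM->new($html);

# The inner document is the <html> element nested inside the outer one
my $inner = $dom->at('html html');

# Text of the heading inside the inner document
my $title = $inner->at('h1')->text;

# Text of every <p> below the inner <html>; the footer <p> belongs to the
# outer document, so it is not matched here
my @paragraphs = $inner->find('p')->map('text')->each;

say $title;
say for @paragraphs;
```

Scoping the searches to $inner rather than $dom is what keeps the wrapped footer out of the results.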