in reply to Re: Testing generated HTML
in thread [Solved] Testing generated HTML

As the page develops, it will have links & the like that I will want to test. I am already testing for things like headings, shown in my OP, so I have been trying the XML approach. I regret to report that I've been getting no farther.

The XML documentation mentions possible problems with HTML, especially with ampersands. The HTML I have so far contains none, but parsing still failed (HTML parser error : Tag nav invalid <nav class="navbar navbar-inverse navbar-fixed-top">). The nav element is something I have cargo-culted in from the Bootstrap project. I saved my HTML to a file and passed it through validator.w3.org, which reported no errors. I therefore set the "recover" parameter to 2 as suggested by the docs. This led to:

use XML::LibXML;
my $parser  = XML::LibXML->new(recover => 2);
my $xmltree = $parser->parse_html_string($html);
my @nodes   = $xmltree->getElementsByTagName('h1');

Unfortunately, the @nodes array is empty, even though the tests I have working along the lines of the snippet in my OP are passing and the header is visible in the HTML. I then tried the "reader" module, thus:

use XML::LibXML::Reader;
my $reader = XML::LibXML::Reader->new(string => $html, recover => 2);
while ($reader->read) {
    processNode($reader);
}
sub processNode {
    my $reader = shift;
    printf "%d %d %s %s\n", ($reader->depth, $reader->nodeType, $reader->name, $reader->value);
}

This starts off well enough, but crashes (I'm showing only the last printed info):

7 8 #comment The above 3 meta tags *must* come first in the head; any other head content must come *after* these tags
Entity: line 21: parser error : Opening and ending tag mismatch: link line 20 and head
</head>
^

I promise you there is no mismatch on the head tag, although there are "meta" and "link" tags between the last reported line and the closing head tag. Again I am having problems with the documentation, as https://metacpan.org/pod/distribution/XML-LibXML/lib/XML/LibXML/Parser.pod gives no information that I can see on how to get data out of the object. I suspect that there are things in the HTML that are beyond the powers of the XML suite even though they are validated. But not being able to see how to check means that I am far from sure.

Any suggestions would be most welcome.

Regards,

John Davies

Re^3: Testing generated HTML
by choroba (Cardinal) on Feb 21, 2016 at 20:09 UTC
    Unfortunately, libxml2's HTML Parser doesn't support HTML5. If you want to use XML::LibXML, you need to switch to XHTML.

    XML::LibXML::Reader is a pull parser. It's used to process large XML documents that don't fit into memory. It interpreted the document as XML and didn't find a closing tag for the link element (as it's not needed in HTML). The documentation doesn't mention how to tell it to process HTML instead of XML, but I guess it doesn't support HTML5, either.
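The mismatch error can be reproduced in isolation. The following is a minimal sketch, assuming XML::LibXML is installed; the helper name is_well_formed and the two sample strings are made up for illustration and are not taken from the original page.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical helper: returns 1 if $markup parses as well-formed XML, 0 otherwise.
sub is_well_formed {
    my ($markup) = @_;
    # load_xml dies on a well-formedness error, so trap it with eval.
    my $doc = eval { XML::LibXML->load_xml(string => $markup) };
    return defined $doc ? 1 : 0;
}

# HTML5 style: <link> has no closing tag, which is fine in HTML but not in XML.
my $html5_style = '<html><head><link rel="stylesheet" href="a.css"></head><body/></html>';

# XML style: the same element, self-closed.
my $xml_style = '<html><head><link rel="stylesheet" href="a.css"/></head><body/></html>';

printf "unclosed link:    %s\n", is_well_formed($html5_style) ? 'parses' : 'fails';
printf "self-closed link: %s\n", is_well_formed($xml_style)   ? 'parses' : 'fails';
```

Only the self-closed variant should parse, which matches the "Opening and ending tag mismatch: link ... and head" message reported above.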

    See HTML::HTML5::Parser for an alternative (I haven't tried it myself).
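For anyone wanting to try that route, here is a rough sketch, assuming HTML::HTML5::Parser is installed from CPAN. The sample page is invented for illustration. To the best of my knowledge the module returns an XML::LibXML::Document with the elements placed in the XHTML namespace, which is why the lookup below goes by local name rather than by a plain XPath such as //h1.

```perl
use strict;
use warnings;
use HTML::HTML5::Parser;

# Invented sample page using an HTML5-only element (nav), as in the Bootstrap template.
my $html = <<'HTML';
<!DOCTYPE html>
<html>
<head><title>Demo</title></head>
<body>
<nav class="navbar navbar-inverse navbar-fixed-top"></nav>
<h1>Hello</h1>
</body>
</html>
HTML

my $parser = HTML::HTML5::Parser->new;
my $doc    = $parser->parse_string($html);   # an XML::LibXML::Document

# Match on local name so the XHTML namespace does not get in the way.
my @headings = map { $_->textContent } $doc->getElementsByLocalName('h1');
print "$_\n" for @headings;
```

The markup produced by Dancer2, Template::Toolkit and Bootstrap stays untouched; only the test code changes parser.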

    ($q=q:Sq=~/;[c](.)(.)/;chr(-||-|5+lengthSq)`"S|oS2"`map{chr |+ord }map{substrSq`S_+|`|}3E|-|`7**2-3:)=~y+S|`+$1,++print+eval$q,q,a,
      Unfortunately, libxml2's HTML Parser doesn't support HTML5. If you want to use XML::LibXML, you need to switch to XHTML.

      Another solution might be to switch to Polyglot Markup. This is valid HTML5 which is also well-formed XML, so you get the best of both worlds. It was all the rage a few years back, but you don't seem to see it mentioned much nowadays.
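As a sketch of what testing polyglot markup could look like, assuming XML::LibXML is installed: the page below is invented for illustration, and the prefix x is an arbitrary choice. Because polyglot documents declare the XHTML namespace on the html element, XPath queries need a registered prefix.

```perl
use strict;
use warnings;
use XML::LibXML;
use XML::LibXML::XPathContext;

# Invented polyglot page: every element is closed and the XHTML namespace is declared.
my $polyglot = <<'HTML';
<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en">
<head><meta charset="utf-8"/><title>Demo</title></head>
<body>
<nav class="navbar"><a href="/about">About</a></nav>
<h1>Hello</h1>
</body>
</html>
HTML

# Parse with the plain XML parser; no recover option is needed.
my $doc = XML::LibXML->load_xml(string => $polyglot);

# Bind the (arbitrary) prefix "x" to the XHTML namespace for XPath lookups.
my $xpc = XML::LibXML::XPathContext->new($doc);
$xpc->registerNs(x => 'http://www.w3.org/1999/xhtml');

my $heading = $xpc->findvalue('//x:h1');
my @hrefs   = map { $_->getValue } $xpc->findnodes('//x:nav//x:a/@href');

print "heading: $heading\n";
print "link: $_\n" for @hrefs;
```

The same queries cover headings and links, so the checks can grow with the page.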

        I'm not particularly keen on changing the markup. I needed choroba to tell me that the tools I am using and copying (Dancer2, Template::Toolkit and Bootstrap) are producing HTML5; it's not something I could work out for myself. The purpose of my OP was to get something that would allow me to write tests based on the output of all this. To have a tool in the test framework dictate the content of the page seems to me to be a bad case of the tail wagging the dog. But I'll gladly listen to arguments that I should change.

        Regards,

        John Davies