This problem is still not solved on either end. Okay, I am not wasting my time or yours. I am just trying to solve my problem. If you can help, thanks; otherwise, that's okay.

Re^3: How to parse HTML5?
by dasgar (Priest) on Mar 10, 2016 at 06:42 UTC

    Out of curiosity, why do you say that the problem has not been solved?

    Prior to your claim that "this problem is not solved in both end", I posted to both forums (here at PerlMonks and here at Stack Overflow) a suggestion to check out HTML::Valid.

    According to the documentation of HTML::Tidy, you need to have tidyp installed first. tidyp appears to be a fork of tidy, and that site indicates that it is the "HTML Tidy Legacy Website". The HTML::Valid module, on the other hand, is based on the HTML Tidy project and does support HTML5.

    And I'll take it a bit further. Here's a demonstration of HTML::Valid on the OP's posted HTML/XHTML data.

    I created a test.html file with the following content (from the OP):

    And here's the Perl code that uses HTML::Valid to check that file:

    And here's the output of that script:

    That shows that HTML::Valid has no issues dealing with <section> tags and that it also provides line numbers and column numbers, which the OP stated here was needed. Unfortunately, it looks like HTML::Valid does not have an ignore method like the one in the OP's code that used HTML::Tidy, so the OP may need to write a little more code to filter out the messages concerning tags that the OP wants to ignore.
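
    As a rough idea of what that extra code could look like, here is a minimal sketch in plain Perl. It is not taken from the OP's script or from the HTML::Valid documentation: filter_messages is a hypothetical helper, the 'custom' tag is only a placeholder, and the sample messages are made up in an assumed tidy-style "line N column M" format, so the pattern may need adjusting to whatever messages you actually get.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of HTML::Valid): return only those
# messages that do not mention any of the tags listed in $ignore_tags.
sub filter_messages {
    my ($messages, $ignore_tags) = @_;

    # Nothing to ignore: return a copy of the full list.
    return [ @$messages ] unless @$ignore_tags;

    # Build one pattern matching <tag ...> or </tag> for every ignored tag.
    my $alternation = join '|', map { quotemeta($_) } @$ignore_tags;
    my $tag_re      = qr{<\s*/?\s*(?:$alternation)\b[^>]*>}i;

    return [ grep { $_ !~ $tag_re } @$messages ];
}

# Made-up sample messages in an assumed tidy-style format.
my @messages = (
    'line 3 column 1 - Warning: <custom> is not recognized!',
    'line 5 column 3 - Warning: <img> lacks "alt" attribute',
    'line 9 column 1 - Warning: discarding unexpected </custom>',
);

# Keep everything except messages about the placeholder <custom> tag.
my $kept = filter_messages(\@messages, ['custom']);
print "$_\n" for @$kept;
```

    With the sample data above, only the <img> message is printed. If the validator hands back its messages as one multi-line string (I believe that is the case, but check the documentation), split it on newlines first and pass the resulting list to the helper.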

    Unless I totally misunderstood what the "problem" was, it looks like HTML::Valid "solves" the "problem".

      Hi

      Is HTML::Valid available with ActiveState Perl? It is not showing in PPM.

      Thanks

      Nikhil Ranjan

        I haven't used ActivePerl for at least a few years, so I'm not familiar with what modules are available via ActiveState's repositories.

        If you're using 32-bit ActivePerl and still have access to ActiveState's repository for your version of ActivePerl (i.e. using the latest build of the community version or have purchased support for an older build), you can use PPM to install MinGW and dmake from their repository. After installing both of those, you can then install modules directly from cpan. In this case, the command to run is 'cpan install HTML::Valid'.

        Alternatively, you can also try using Strawberry Perl, which comes with everything needed to install modules directly from CPAN. Also, Strawberry Perl offers a portable version that does not need to be "installed". Just download and extract the zip file and run the portableshell.bat batch script.

        I did it by following http://cpansearch.perl.org/src/BKB/HTML-Valid-0.04/README

        Thanks

        Nikhil Ranjan