graff has asked for the wisdom of the Perl Monks concerning the following question:

One of my colleagues just brought this to my attention: http://blog.whatwg.org/html5lib-010-released

I'm told (by someone I trust as knowledgeable in this area) that HTML 5 is getting a lot of broad support, with Google "contributing heavily". The announcement cited above provides Python and Ruby interfaces to html5lib, but there's no mention of a Perl module (and nothing on CPAN at the moment). This library was being suggested as a very good alternative to something called "BeautifulSoup", which the Python folks I know have been using up to now for parsing HTML data.

I wish I had time... (I've never built a module for this sort of thing before, so it would be an education for me if I did have the time.) Is anyone (planning on) working on this?

Re: Is someone working on html5lib for Perl?
by samtregar (Abbot) on Oct 10, 2007 at 22:03 UTC
    Seems unlikely to me. Perl already has some very good HTML parsers that do everything this library does and more (HTML::Parser and XML::LibXML spring to mind). The only thing missing is HTML v5 support, but I can't imagine why anyone would want that now. I'm sure it will be added to the existing Perl HTML parsers when HTML v5 starts to see some significant real-world use. That seems like a more useful endeavor than wrapping a brand new library.

    But don't let me stop you! If you've never built an XS wrapper around a library, I think it's worth doing just to learn how. My book has a chapter explaining how to do it, and you can download a free copy: Writing Perl Modules for CPAN. (Or, you should be able to - seems Apress's free e-book system is down for maintenance...)

    -sam

Re: Is someone working on html5lib for Perl?
by grinder (Bishop) on Oct 11, 2007 at 08:07 UTC

    I would recommend you ask around on the libwww@perl.org mailing list (send a message to libwww-subscribe@perl.org to get on board). That looks to me like the best place to get in touch with people like Gisle Aas, who would be likely to work on an HTML5 parsing library.

    • another intruder with the mooring in the heart of the Perl