in reply to XML::LibXSLT & --html flag?

Tinkster? Seriously? My name is Toby Inkster.

Anyway, the difference in times may be due to DTDs. By default libxml (and libxslt is built entirely on libxml) downloads DTDs and uses them to expand entities (i.e. convert &eacute; to é). This network activity significantly slows down parsing.

LibXML can thankfully be pointed at a local catalogue of DTDs. (See XML::LibXML::Parser and the load_catalog method.) This speeds it up significantly.
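
A minimal sketch of that setup follows. The catalogue path and the make_parser helper are assumptions for illustration only (the path varies by system). The load_ext_dtd, expand_entities and no_network parser options are documented in XML::LibXML::Parser; check that your installed version supports passing them to new().

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical helper: build a parser that resolves DTDs from a local catalogue.
sub make_parser {
    my ($catalog) = @_;

    my $parser = XML::LibXML->new(
        load_ext_dtd    => 1,   # still read DTDs so entities are defined
        expand_entities => 1,   # replace entity references with their text
        no_network      => 1,   # never fetch anything over the network
    );

    # Only load the catalogue if the file actually exists on this machine.
    $parser->load_catalog($catalog) if defined $catalog && -e $catalog;

    return $parser;
}

# Assumed location; adjust for your system.
my $parser = make_parser('/etc/xml/catalog');

if (my $file = shift @ARGV) {
    my $doc = $parser->load_xml(location => $file);
    print $doc->documentElement->nodeName, "\n";
}
```

With no_network set, a DTD that is missing from the catalogue should fail to load rather than be silently downloaded, which makes gaps in the catalogue easier to spot.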

Also check out my module HTML::HTML5::Parser which (IMHO) parses HTML much better than libxml's built-in HTML parser.
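
Since HTML::HTML5::Parser returns an ordinary XML::LibXML::Document, it can be fed straight into XML::LibXSLT. A rough sketch, where the transform_html helper and the tiny stylesheet are made up for illustration. Note the assumption that the parser puts elements in the XHTML namespace, so the stylesheet matches them with a prefix.

```perl
use strict;
use warnings;
use HTML::HTML5::Parser;
use XML::LibXML;
use XML::LibXSLT;

# Hypothetical helper: parse tag soup, then run an XSLT sheet over the result.
sub transform_html {
    my ($html, $xsl_string) = @_;

    # parse_string returns an XML::LibXML::Document
    my $doc = HTML::HTML5::Parser->new->parse_string($html);

    my $style_doc  = XML::LibXML->load_xml(string => $xsl_string);
    my $stylesheet = XML::LibXSLT->new->parse_stylesheet($style_doc);

    my $result = $stylesheet->transform($doc);
    return $stylesheet->output_as_chars($result);
}

# Illustrative stylesheet: emit the page title as plain text.
my $xsl = <<'XSL';
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:h="http://www.w3.org/1999/xhtml">
  <xsl:output method="text"/>
  <xsl:template match="/">
    <xsl:value-of select="//h:title"/>
  </xsl:template>
</xsl:stylesheet>
XSL

print transform_html('<title>Hi</title><p>unclosed paragraph', $xsl), "\n";
```

An existing stylesheet written against un-namespaced element names would need its match patterns adjusted before it sees anything in this DOM.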

perl -E'sub Monkey::do{say$_,for@_,do{($monkey=[caller(0)]->[3])=~s{::}{ }and$monkey}}"Monkey say"->Monkey::do'

Re^2: XML::LibXSLT & --html flag?
by Tinkster (Novice) on May 15, 2012 at 17:32 UTC
    Thanks Toby,

    Re my nick: that's a long story, doesn't belong here ;]

    Re the parser: I'm using an XSLT sheet to translate some ugly (non-standard) Apple wiki HTML(-like) documents to wiki markup, so I'm not sure how I'd integrate HTML::HTML5::Parser with that approach. Thanks for the recommendation, anyway.

    Will have a play with the XML::LibXML::Parser once sanity is restored here. Ta ;)

    Cheers, Tink