XML::LibXSLT & --html flag?

Tinkster has asked for the wisdom of the Perl Monks concerning the following question:

Hi monks,

I'm trying to use a Perl script to convert ugly Apple Wiki markup to something that can easily be reused with Foswiki/TWiki ...

I found an XSLT stylesheet that does what I want, and with xsltproc on the command line it works very well (and quickly!): the snippet I converted took about 4 s with the `--html` option. Without `--html` it takes 2 minutes.

Using Perl and XML::LibXSLT, the same conversion takes over 2 minutes, and there seems to be no equivalent of `--html`, or at least I can't spot it.

Has anyone used this module in this way?

Cheers,
Tink


Re: XML::LibXSLT & --html flag? by Anonymous Monk on May 15, 2012 at 05:38 UTC
See the source of HTML::TreeBuilder::LibXML for the magic incantation.
Re^2: XML::LibXSLT & --html flag? by Tinkster (Novice) on May 16, 2012 at 21:30 UTC
Om-mani- ... Thanks. Brute-force reading of the docs and source didn't do much for me, but browsing the CPAN page for XML::LibXML::Parser just gave me a breakthrough, and I feel stupid for having asked in the first place =o) All it took was changing `my $source = XML::LibXML->load_xml(location => 'blub.html');` to `my $source = XML::LibXML->load_html(location => 'blub.html');`. Sigh.
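For anyone landing here with the same question, here is a minimal, self-contained sketch of the `load_html` route feeding XML::LibXSLT, which plays the role of `xsltproc --html`. It assumes both modules are installed. The inline HTML and the tiny stylesheet (which only turns the page title into a TWiki/Foswiki `---+` heading) are hypothetical stand-ins for blub.html and the real Apple Wiki stylesheet. `recover => 2` is also an assumption: it asks libxml2 to recover silently from malformed markup, which real wiki exports may need.

```perl
use strict;
use warnings;
use XML::LibXML;
use XML::LibXSLT;

# Hypothetical input; in the thread the real input was blub.html.
my $html = '<html><head><title>Demo</title></head><body><p>Hello<br>world</p></body></html>';

# load_html uses libxml2's HTML parser instead of the XML parser.
# recover => 2 (assumed suitable here) recovers from bad markup without complaining.
my $source = XML::LibXML->load_html(string => $html, recover => 2);

# Hypothetical stylesheet: emit the page title as a TWiki/Foswiki level-1 heading.
my $style_doc = XML::LibXML->load_xml(string => <<'XSL');
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>
  <xsl:template match="/">
    <xsl:text>---+ </xsl:text>
    <xsl:value-of select="/html/head/title"/>
  </xsl:template>
</xsl:stylesheet>
XSL

my $stylesheet = XML::LibXSLT->new->parse_stylesheet($style_doc);
my $results    = $stylesheet->transform($source);
my $output     = $stylesheet->output_as_chars($results);

print "$output\n";
```

Swapping the `string => $html` option for `location => 'blub.html'` gives the one-line change described above.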
Re: XML::LibXSLT & --html flag? by tobyink (Canon) on May 15, 2012 at 12:30 UTC
Tinkster? Seriously? My name is Toby Inkster. Anyway, the difference in times may be due to DTDs. By default libxml (and libxslt is all libxml-based) downloads DTDs and uses them to expand entities (e.g. convert `&eacute;` → `é`). This network activity significantly slows down parsing. LibXML can thankfully be pointed at a local catalogue of DTDs (see XML::LibXML::Parser and the load_catalog method), which speeds it up significantly. Also check out my module HTML::HTML5::Parser, which (IMHO) parses HTML much better than libxml's built-in HTML parser.
`perl -E'sub Monkey::do{say$_,for@_,do{($monkey=[caller(0)]->[3])=~s{::}{ }and$monkey}}"Monkey say"->Monkey::do'`
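A sketch of the catalogue approach, assuming XML::LibXML is installed: the parser is told never to touch the network (`no_network`), DTD loading and entity expansion are spelled out explicitly rather than left to defaults, and a local catalogue is loaded if one exists. The path `/etc/xml/catalog` is an assumption (a common location on Linux systems); point it at whatever catalogue your system provides. The input declares its entity in an internal subset so the example runs without any DTD on disk.

```perl
use strict;
use warnings;
use XML::LibXML;

# Never fetch DTDs or external entities over the network, but still
# load DTDs (from local files or a catalogue) and expand entities.
my $parser = XML::LibXML->new(
    no_network      => 1,
    load_ext_dtd    => 1,
    expand_entities => 1,
);

# Assumed catalogue location; a catalogue maps public/system identifiers
# (e.g. the XHTML DTD URL) to local copies of the files.
my $catalog = '/etc/xml/catalog';
if (-r $catalog) {
    eval { $parser->load_catalog($catalog); 1 }
        or warn "Could not load catalogue $catalog: $@";
}

# Self-contained input: the entity is declared inline, so nothing is fetched.
my $doc = $parser->load_xml(string => <<'XML');
<!DOCTYPE p [ <!ENTITY eacute "&#233;"> ]>
<p>caf&eacute;</p>
XML

my $text = $doc->documentElement->textContent;
binmode STDOUT, ':encoding(UTF-8)';
print "$text\n";
```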
Re^2: XML::LibXSLT & --html flag? by Tinkster (Novice) on May 15, 2012 at 17:32 UTC
Thanks Toby. Re my nick: that's a long story and doesn't belong here ;] Re the parser: I'm using an XSLT sheet to translate some ugly (non-standard) Apple Wiki HTML(-like) documents to wiki markup, so I'm not sure how I'd integrate HTML::HTML5::Parser with that approach. Thanks for the recommendation, anyway. I'll have a play with XML::LibXML::Parser once sanity is restored here. Ta ;) Cheers, Tink
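On integrating HTML::HTML5::Parser with the XSLT approach: its `parse_string` method returns an XML::LibXML::Document, so it should be able to stand in for `load_html` before the transform. A minimal sketch, assuming HTML::HTML5::Parser and XML::LibXSLT are both installed; the input and stylesheet are hypothetical. The stylesheet matches on `local-name()` because the HTML5 parser may place elements in the XHTML namespace, which a plain `/html/head/title` path would not match.

```perl
use strict;
use warnings;
use HTML::HTML5::Parser;
use XML::LibXML;
use XML::LibXSLT;

# parse_string returns an XML::LibXML::Document, usable as XSLT input.
my $source = HTML::HTML5::Parser->new->parse_string(
    '<!doctype html><title>Demo</title><p>Hello <b>world</b>'
);

# Hypothetical stylesheet; local-name() matches the title element
# whether or not it is in the XHTML namespace.
my $style_doc = XML::LibXML->load_xml(string => <<'XSL');
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>
  <xsl:template match="/">
    <xsl:text>---+ </xsl:text>
    <xsl:value-of select="//*[local-name()='title']"/>
  </xsl:template>
</xsl:stylesheet>
XSL

my $stylesheet = XML::LibXSLT->new->parse_stylesheet($style_doc);
my $output     = $stylesheet->output_as_chars($stylesheet->transform($source));

print "$output\n";
```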