Tinkster has asked for the wisdom of the Perl Monks concerning the following question:

Hi monks,

I'm trying to use a perl script to convert ugly Apple Wiki mark-up to something that can easily be re-used with foswiki/twiki ...

I found an XSLT style-sheet that does what I want, and using xsltproc on the command-line it works very well (and quickly!) - the snippet I converted took ~ 4s when using the --html option with it. W/o --html it's 2 min.

Using Perl and XML::LibXSLT, the same thing takes over 2 minutes, but there seems to be no equivalent of "--html", or at least I can't spot it.

Has anyone used this module in this way?

Cheers,
Tink

Re: XML::LibXSLT & --html flag?
by Anonymous Monk on May 15, 2012 at 05:38 UTC
      Om-mani- ...

      Thanks. While brute-force reading of doco and source didn't do much for me, browsing the CPAN page for XML::LibXML::Parser just had a break-through result, and I feel stupid for having asked in the first place =o)

      All it took was to change from
      my $source = XML::LibXML->load_xml(location => 'blub.html');
      to
      my $source = XML::LibXML->load_html(location => 'blub.html');

      *sigh*
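
      For anyone landing here later, a minimal sketch of the whole pipeline with that change in place. The inline HTML and the tiny stylesheet are made up for illustration (the real input was a file and a much larger XSLT sheet), and recover => 1 is an assumption about how forgiving you want the parser to be:

```perl
use strict;
use warnings;
use XML::LibXML;
use XML::LibXSLT;

# Hypothetical input: not well-formed XML (note the unclosed <br>),
# so load_xml would reject it, while load_html accepts it.
my $html = '<html><body><h1>Title</h1><p>Some <b>bold</b> text<br>more</p></body></html>';

# Hypothetical stylesheet standing in for the real conversion sheet.
my $xsl = <<'XSL';
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>
  <xsl:template match="h1">---+ <xsl:value-of select="."/><xsl:text>&#10;</xsl:text></xsl:template>
  <xsl:template match="b">*<xsl:value-of select="."/>*</xsl:template>
</xsl:stylesheet>
XSL

# Parse the source with libxml2's HTML parser (the counterpart of xsltproc --html).
my $source = XML::LibXML->load_html(string => $html, recover => 1);

# The stylesheet itself is real XML, so it still goes through load_xml.
my $style_doc  = XML::LibXML->load_xml(string => $xsl);
my $stylesheet = XML::LibXSLT->new->parse_stylesheet($style_doc);

# Run the transformation and serialise it according to xsl:output.
my $result = $stylesheet->transform($source);
my $output = $stylesheet->output_as_bytes($result);
print $output;
```

      With files on disk, swapping string => ... for location => ... in both loader calls should be all that changes.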
Re: XML::LibXSLT & --html flag?
by tobyink (Canon) on May 15, 2012 at 12:30 UTC

    Tinkster? Seriously? My name is Toby Inkster.

    Anyway, the difference in times may be due to DTDs. By default libxml (and libxslt is all libxml-based) downloads DTDs and uses them to expand entities (i.e. convert &eacute; to é). This network activity significantly slows down parsing.

    LibXML can thankfully be pointed at a local catalogue of DTDs. (See XML::LibXML::Parser and the load_catalog method.) This speeds it up significantly.

    Also check out my module HTML::HTML5::Parser, which (IMHO) parses HTML much better than libxml's built-in HTML parser.

    perl -E'sub Monkey::do{say$_,for@_,do{($monkey=[caller(0)]->[3])=~s{::}{ }and$monkey}}"Monkey say"->Monkey::do'
      Thanks Toby,

      Re my nick: that's a long story, doesn't belong here ;]

      Re the parser: I'm using an XSLT sheet to translate some ugly (non-standard) Apple wiki HTML(-like) documents to wiki markup, so I'm not sure how I'd integrate HTML::HTML5::Parser with that approach. Thanks for the recommendation anyway.

      Will have a play with the XML::LibXML::Parser once sanity is restored here. Ta ;)

      Cheers, Tink