Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Dear Monks, my code runs, but it is extremely slow. Any suggestions to improve it?

    use strict;
    use XML::Parser;
    use XML::XPath;
    use Lingua::StopWords qw( getStopWords );

    my $stopwords = getStopWords('en');
    my $file = $ARGV[0];
    my $xp = XML::XPath->new(filename=>$file);

    for (my $n = 1; $n <= 600; $n++) {
        my $textnodeset = $xp->find('//pair[@id = '.$n.']/tAnnotation/tree/node/word/attribute[@name="token"]');
        my @texts;
        if (my @textnodelist = $textnodeset->get_nodelist) {
            @texts = map($_->string_value, @textnodelist);
        }
        my %seent;
        my @uniqt = grep !$seent{$_}++, @texts;

        my $hyponodeset = $xp->find('//pair[@id = '.$n.']/hAnnotation/tree/node/word/attribute[@name="token"]');
        my @hypos;
        if (my @hyponodelist = $hyponodeset->get_nodelist) {
            @hypos = map($_->string_value, @hyponodelist);
        }
        my %seenh;
        my @uniqh = grep !$seenh{$_}++, @hypos;

        my @termst = grep { ! $stopwords->{ $_ } } @uniqt;
        my @termsh = grep { ! $stopwords->{ $_ } } @uniqh;

        for my $i (0 .. $#termst) {
            for my $j ( 0 .. $#termsh) {
                print "$termst[$i] $termsh[$j]\n";
            }
        }
    }

My input is an XML file, and my output is one pair of extracted items per line. Any ideas for improvement? Thanks.

Replies are listed 'Best First'.
Re: my xml xpath is too slow.
by mirod (Canon) on Aug 26, 2009 at 09:22 UTC

    The first suggestion that comes to my mind is to ditch XML::XPath and replace it with XML::LibXML. That will speed up your code, and you will be using a module that is actually maintained.

    Then of course the //pair[@id = '.$n.'] is a big red flag. Especially as you seem to be doing this twice for each $n (it's hard to tell as your code is not indented). Either try to have the complete path above pair, or do the search on //pair only once, cache the results in a hash id => node and use this in the rest of your code. Does this make sense?
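
A minimal sketch of the caching idea above, using XML::LibXML. The sample XML is hypothetical: its layout is only guessed from the XPath expressions in the question, and the helper names `index_pairs` and `tokens` are made up for illustration. The document is parsed once, every `pair` element is indexed by its `id` in a single pass, and the token lookups then run relative to the cached node instead of rescanning the whole tree for each `$n`.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical sample input; the real file layout is an assumption
# inferred from the XPath expressions in the original question.
my $xml = q{<e-c>
  <pair id="1">
    <tAnnotation><tree>
      <node><word><attribute name="token">cat</attribute></word></node>
      <node><word><attribute name="token">sat</attribute></word></node>
      <node><word><attribute name="token">cat</attribute></word></node>
    </tree></tAnnotation>
    <hAnnotation><tree>
      <node><word><attribute name="token">dog</attribute></word></node>
    </tree></hAnnotation>
  </pair>
</e-c>};

# Walk the document once and build a hash of id => <pair> node.
sub index_pairs {
    my ($doc) = @_;
    my %pair_by_id;
    $pair_by_id{ $_->getAttribute('id') } = $_ for $doc->findnodes('//pair');
    return \%pair_by_id;
}

# Return the unique token strings under one branch of a cached <pair> node.
# The XPath is relative to $pair, so only that subtree is searched.
sub tokens {
    my ($pair, $branch) = @_;
    my %seen;
    return grep { !$seen{$_}++ }
           map  { $_->textContent }
           $pair->findnodes(qq{./$branch/tree/node/word/attribute[\@name="token"]});
}

my $doc   = XML::LibXML->load_xml(string => $xml);
my $pairs = index_pairs($doc);

# Assumes the ids are numeric, as in the question's 1..600 loop.
for my $id (sort { $a <=> $b } keys %$pairs) {
    my @t = tokens($pairs->{$id}, 'tAnnotation');
    my @h = tokens($pairs->{$id}, 'hAnnotation');
    for my $t (@t) {
        print "$t $_\n" for @h;
    }
}
```

For a real file, `load_xml(location => $file)` would replace the inline string, and the stopword test from the question could be added to the `grep` in `tokens`.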

      Thanks for the reply. I have never used LibXML; do you think it would work faster? I have to say that using the full path for pair (/e-c/pair instead of //pair) did not help with speed. Moreover, since each item is a pair, I have to extract it in two separate steps, as in my code.

        XML::LibXML would certainly run much faster; XML::XPath is quite slow, and the code would be very similar. I am surprised that removing the // did not speed up the code, though.

Re: my xml xpath is too slow.
by grizzley (Chaplain) on Aug 26, 2009 at 14:27 UTC

    Wouldn't it be easier to write it (and read it) this way?:

    use strict;
    use XML::Parser;
    use XML::XPath;
    use Lingua::StopWords qw( getStopWords );

    my $stopwords = getStopWords('en');
    my $file = $ARGV[0];
    my $xp = XML::XPath->new(filename=>$file);

    for my $n (1..600) {
        my %seent;
        my $textnodeset = $xp->find('//pair[@id = '.$n.']/tAnnotation/tree/node/word/attribute[@name="token"]');
        my @texts = map $_->string_value, $textnodeset->get_nodelist;
        my @termst = grep{!$seent{$_}++ && !$stopwords->{$_}} @texts;

        my %seenh;
        my $hyponodeset = $xp->find('//pair[@id = '.$n.']/hAnnotation/tree/node/word/attribute[@name="token"]');
        my @hypos = map $_->string_value, $hyponodeset->get_nodelist;
        my @termsh = grep{!$seenh{$_}++ && !$stopwords->{$_}} @hypos;

        for my $i (@termst) {
            for my $j (@termsh) {
                print "$i $j\n"
            }
        }
    }

    You could search Google for 'xpath unique' to have XPath return only unique values. I am not sure whether that would speed things up or slow them down, though.

Re: my xml xpath is too slow.
by Jenda (Abbot) on Sep 03, 2009 at 21:44 UTC

    Hard to say without seeing the XML, but I would use XML::Rules to trim and simplify the data before entering the loop.

    Jenda
    Enoch was right!
    Enjoy the last years of Rome.