How to speed up XML::LibXML toStringC14N when used on a large document?

nneul has asked for the wisdom of the Perl Monks concerning the following question:

I've got a chunk of code that can retrieve its data in one of two ways but processes it with the same loop. Here is a snippet of that loop (%$contacts has an XML::LibXML::Node as the value for each contact):

# other code to retrieve the documents into %$contacts
foreach my $id ( keys %$contacts ) {
    my $entry = $contacts->{$id};        # an XML::LibXML::Node for this contact
    my $xml   = $entry->toStringC14N();  # canonical XML string for the contact
    # ... use $xml here
}

1. Retrieve a large number of small XML documents (say, 100 nodes each). This is slow because it takes many separate web requests to retrieve the content.

2. Retrieve a small number of large XML documents (say, 10k-25k nodes each). This is fast: almost all of the content comes back in one or two requests.

The only difference between the two approaches is whether the nodes in %$contacts come from 1-2 XML::LibXML::Document objects or from 100+ documents. Both end up with the same total set of data.

The problem is that with the large-document approach, each toStringC14N() call is very slow (1-2 calls/sec).

With the small-document approach, retrieving the data takes much longer, but processing the nodes runs at 100-200 nodes/sec or higher.
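To check whether the per-node cost really depends on the size of the document a node lives in, a rough benchmark like the one below can help. It is only a sketch: the `<contacts>`/`<contact>` element names, the hypothetical `build_doc` and `c14n_rate` helpers, and the document sizes are made up for illustration, and it times a fixed sample of nodes so that the run stays short even when calls are slow.

```perl
use strict;
use warnings;
use XML::LibXML;
use Time::HiRes qw(time);

# Build a hypothetical <contacts> document with $n <contact> children,
# each carrying an id attribute and two small child elements.
sub build_doc {
    my ($n) = @_;
    my $doc  = XML::LibXML::Document->new('1.0', 'UTF-8');
    my $root = $doc->createElement('contacts');
    $doc->setDocumentElement($root);
    for my $i (1 .. $n) {
        my $c = $doc->createElement('contact');
        $c->setAttribute('id', $i);
        $c->appendTextChild('name',  "Contact $i");
        $c->appendTextChild('email', "c$i\@example.com");
        $root->appendChild($c);
    }
    return $doc;
}

# Canonicalize at most $sample <contact> nodes in place (inside the big
# document) and return the observed rate in nodes per second.
sub c14n_rate {
    my ($n, $sample) = @_;
    my $doc   = build_doc($n);
    my @nodes = $doc->findnodes('/contacts/contact');
    splice(@nodes, $sample) if @nodes > $sample;
    my $t0 = time;
    $_->toStringC14N() for @nodes;
    my $elapsed = time - $t0;
    return @nodes / ($elapsed || 1e-9);
}

# Same number of nodes timed, different document sizes.
printf "%6d contacts/doc: %.0f nodes/sec\n", $_, c14n_rate($_, 50) for 100, 2000;
```

If the rate drops sharply for the larger document, the cost of each call is growing with the whole document rather than with the node being serialized.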

Is there anything I can do to speed up toStringC14N() when using large-document retrieval, or do I just have to pick a balance between the two extremes?


Re: How to speed up XML::LibXML toStringC14N - when used on large document? by Jenda (Abbot) on Feb 15, 2010 at 15:45 UTC
1. Download all the data using a single request.
2. Split it into smaller files with xml_split or something similar.
3. Loop through the tiny files and canonicalize the contacts.

Jenda
Enoch was right! Enjoy the last years of Rome.
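Steps 2 and 3 can also be done in memory, without writing files: copy each contact into its own tiny document and canonicalize it there. This is a swapped-in variant, not the xml_split route itself, and the `c14n_detached` helper and sample data are hypothetical. It rests on an assumption not confirmed in this thread: that libxml2's C14N, when handed a node inside a big document, does work proportional to the whole document, so a one-contact document should bring the per-node cost back toward the small-document rate.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical helper: copy $node into a fresh one-element document and
# canonicalize the copy, so C14N only has to deal with that small tree.
sub c14n_detached {
    my ($node) = @_;
    my $small = XML::LibXML::Document->new('1.0', 'UTF-8');
    # importNode() makes a deep copy; the big document is left untouched.
    my $copy = $small->importNode($node);
    $small->setDocumentElement($copy);
    return $copy->toStringC14N();
}

# Made-up sample standing in for one large retrieved document.
my $big = XML::LibXML->load_xml(string => <<'XML');
<contacts>
  <contact id="1"><name>Ann</name></contact>
  <contact id="2"><name>Bob</name></contact>
</contacts>
XML

my %xml_by_id;
for my $entry ($big->findnodes('/contacts/contact')) {
    $xml_by_id{ $entry->getAttribute('id') } = c14n_detached($entry);
}
print "$_: $xml_by_id{$_}\n" for sort keys %xml_by_id;
```

One caveat: canonicalizing a node in place includes namespace declarations and xml:* attributes inherited from its ancestors, and the copy may not carry all of those over. Compare the output with the in-place toStringC14N() on a few real contacts before switching.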