in reply to Re^2: Module uses loads of CPU.. or is it me
in thread Module uses loads of CPU.. or is it me
Ah, I see. The bottleneck is in Net::Amazon::S3, which appears to be using an XPath approach to get at the needed information. Looks like list_all calls list_bucket_all, which calls list_bucket, which does the XPath dirty work.
If I were in your shoes, I might try to write an alternative sub to list_bucket which uses an approach other than XPath. If you look in:
http://search.cpan.org/src/PAJAS/XML-LibXML-1.65/lib/XML/LibXML/XPathContext.pm
sub find calls new for each node it needs to find:
    sub find {
        my ($self, $xpath, $node) = @_;
        my ($type, @params) =
            $self->_guarded_find_call('_find', $xpath, $node);
        if ($type) {
            return $type->new(@params);
        }
        return undef;
    }
That per-node object construction in the OO interface of XML::LibXML::XPathContext is your bottleneck. You could probably develop a faster interface using a streaming parser, though how much faster I don't know. You'll need some optimization along those lines to get a faster result. Sorry I can't be of much more help.
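To make the streaming-parser idea concrete, here's a minimal sketch using XML::LibXML::Reader (the pull-parser interface that ships with XML::LibXML), which walks the document node by node instead of building XPath result objects. The $xml string is a trimmed-down stand-in for a real S3 ListBucketResult response, not actual API output:

    use strict;
    use warnings;
    use XML::LibXML::Reader;

    # Stand-in for a real S3 ListBucketResult document.
    my $xml = <<'XML';
    <ListBucketResult>
      <Contents><Key>foo.txt</Key><Size>10</Size></Contents>
      <Contents><Key>bar.txt</Key><Size>20</Size></Contents>
    </ListBucketResult>
    XML

    my $reader = XML::LibXML::Reader->new(string => $xml);
    my @keys;
    while ($reader->read) {
        next unless $reader->nodeType == XML_READER_TYPE_ELEMENT;
        if ($reader->name eq 'Key') {
            $reader->read;              # advance to the text node
            push @keys, $reader->value;
        }
    }
    print join(",", @keys), "\n";       # foo.txt,bar.txt

The reader never materializes a node set, so memory and per-node overhead stay flat no matter how many keys the bucket listing contains.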
UPDATE - you could also use some of the module's lower-level functions and parallelize the operation by having each CPU count a share of the buckets. That's probably easier than speeding up the XML parser, and I think it's your best bet for a two-times-or-better speedup. You could fork off a process per chunk that writes its results to a temp file, then add up all the results at the end. I think that may be your shortest course to victory.
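A rough sketch of that fork-and-temp-file scheme, using only core Perl. Here @buckets and count_keys() are hypothetical stand-ins; a real script would get the bucket list from list_all and have count_keys() call into Net::Amazon::S3:

    use strict;
    use warnings;
    use File::Temp qw(tempfile);

    # Hypothetical stand-ins for the real bucket list and S3 call.
    my @buckets = ('a' .. 'h');
    sub count_keys { return 1 }        # pretend each bucket holds one key

    my $workers = 4;
    my @files;
    for my $w (0 .. $workers - 1) {
        my ($fh, $name) = tempfile(UNLINK => 0);
        push @files, $name;
        my $pid = fork;
        die "fork failed: $!" unless defined $pid;
        if ($pid == 0) {               # child: take every $workers-th bucket
            my $total = 0;
            for (my $i = $w; $i < @buckets; $i += $workers) {
                $total += count_keys($buckets[$i]);
            }
            print {$fh} "$total\n";
            close $fh;
            exit 0;
        }
        close $fh;                     # parent only needs the filename
    }
    1 while wait() != -1;              # reap all children

    my $grand = 0;
    for my $f (@files) {
        open my $in, '<', $f or die "open $f: $!";
        chomp(my $n = <$in>);
        $grand += $n;
        close $in;
        unlink $f;
    }
    print "total keys: $grand\n";      # total keys: 8

Since each child only writes a single number, you could just as well collect the results over a pipe, but temp files keep the children completely independent.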
Replies are listed 'Best First'.
Re^4: Module uses loads of CPU.. or is it me
  by hsinclai (Deacon) on Dec 11, 2007 at 13:34 UTC
  by redhotpenguin (Deacon) on Dec 11, 2007 at 17:22 UTC
  by hsinclai (Deacon) on Dec 11, 2007 at 18:58 UTC