PerlMonks |
Module uses loads of CPU.. or is it me?
by hsinclai (Deacon) on Dec 10, 2007 at 02:20 UTC ( id://656032 )
hsinclai has asked for the wisdom of the Perl Monks concerning the following question:
Hi Monks,
I'm using Net::Amazon::S3 to get a listing of a bunch of files (under 30K files right now, but soon to be around 150K). I only want to grab the number of files/dirs and add up the bytes these take up.. It works fine but takes several minutes to run, during which time it appears to take up about 46MB of RAM (I have 4GB on this box). But the CPU gets slammed at 100% the whole time (actually, one core gets pegged). here's the only loop I have, that adds up the bytes. The module builds an array as a value inside a hash (I believe) and it also uses LWP and XML modules among others behind the scenes (I believe) the answer looks like this: Do you think this process can be made less CPU-intensive somehow? It seems as if the module is going to build the answer list in an array before you get the chance to do anything else like either keep it in memory or write it out to a file. Oh yeah one thing - the files are in pretty deep directory structures - perhaps xml parsing is the culprit for CPU usage due to the many nested levels - how would I confirm this? Many thanks, Harold Update: Thanks kyle for the suggestion to profile, which I did, and looks like my suspicion that XML related acitvities take most of the cycles here might be confirmed: So, this ran for about 35 minutes, and unfortunately crapped out with a parser error :2: parser error : xmlParseCharRef: invalid xmlChar value 8 I'm going to assume this is because the script ran while a file upload was taking place, and some of the returned records might not have been complete. Profiling the script pointed at another much smaller Amazon bucket, however, yeilds the same proportion of results -- that is -- XML::LibXML::NodeList::new and XML::LibXML::Literal::new each take 48% or more of the runtime... So this brings me back to the original question:) - can any kind soul suggest any way to improve performance -- using threads would not enable me to put the idle CPU core to use would it? Or... 
Thanks once again, -Harold

Update 2: Someone suggested changing the foreach in my function to a while, but as the profiling run in the OP shows, most of the time (and also CPU, I would guess) is being spent inside the XML modules that the Amazon S3 module uses to build its data structure. The function with the foreach loop isn't even called until the Amazon S3 module has finished getting and building its data, and it doesn't even appear in the top 15 functions. -H