TheShrike has asked for the wisdom of the Perl Monks concerning the following question:
Originally, the script read 20,000+ files, parsed each one for certain bits of information, and stored the results in a hash of arrays of hashes of arrays of hashes, so that XML::Simple could write the relevant information out as a neat 7.5MB XML file (which is later loaded by another script).
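To make the shape of that nested structure concrete, here is a minimal single-threaded sketch. It is not the original code. The `key=value` line format and the `%fake_output` sample data are hypothetical stand-ins for the real parser output, which the post does not show. The top-level key `arraykey` comes from the code in the post. XML::Simple is left out so the sketch runs with core Perl only.

```perl
use strict;
use warnings;

# Hypothetical stand-in for what "outputprogram $filename" prints:
# one "key=value" pair per line (the real format is not shown in the post).
my %fake_output = (
    'a.txt' => "name=alpha\nsize=10\n",
    'b.txt' => "name=beta\n",
);

# hash -> array -> hash -> array -> hash
my %root = (arraykey => []);

foreach my $filename (sort keys %fake_output) {
    my @records;                               # one array of hashes per file
    foreach my $line (split /\n/, $fake_output{$filename}) {
        my ($key, $value) = split /=/, $line, 2;
        push @records, { $key => $value };     # innermost hash
    }
    # middle hash, keyed by file name, pointing at that file's records
    push @{ $root{arraykey} }, { $filename => \@records };
}

printf "%d files; %d records for a.txt\n",
    scalar @{ $root{arraykey} },
    scalar @{ $root{arraykey}[0]{'a.txt'} };
```

Passing `\%root` to XML::Simple's `XMLout` would then serialize this structure, much like the original script does with the real data.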
Now I am trying to create one thread per file to do the parsing, with a limit of 10 threads running at once. I am running Perl 5.8.2 with threads and threads::shared. Here is a basic example of what I'm doing:
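```perl
use strict;
use warnings;
use threads;
use threads::shared;

my @files;      # the 20000+ input files (how they are collected is not shown)
my @threads;

my $hashone  = &share({});
my $arrayone = &share([]);
$hashone->{"arraykey"} = $arrayone;

foreach my $filename (@files) {
    # some other stuff
    my $hashtwo = &share({});
    push(@$arrayone, $hashtwo);
    my $arraytwo = &share([]);
    $hashtwo->{$filename} = $arraytwo;
    # add some other values to $hashtwo

    # at most 10 threads: join the oldest one before starting another
    if ($#threads >= 9) {
        my $thread = shift(@threads);
        $thread->join();
        undef $thread;
    }
    push(@threads, threads->create(\&threadfunction, $filename, $arraytwo));
}

# clean up all the other threads, then finish as I would without threading
$_->join() for @threads;

sub threadfunction {
    my ($filename, $arraytwo) = @_;
    open(my $pipe, '-|', "outputprogram $filename")
        or die "Cannot run outputprogram on $filename: $!";
    while (<$pipe>) {
        # add various $hashthree = &share({}) to @$arraytwo
        # add various values to each $hashthree
    }
    close($pipe);
}
```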
I added the `undef $thread;` because of another post I found, where the poster said his threads weren't giving up their memory otherwise, which didn't make any sense to him (nor to me). It did let the program run longer before running out of memory.
I tried sharing the pointers to the hashes/arrays outside of the thread. That did not help.
My only ideas were:
1. The array/hash/array etc. structure is taking up too much memory. But it wasn't a problem before I added threads, so this makes no sense.
2. The pipes from those files are taking up too much memory now that there are 10 of them. But each one only carries about 10kB of data, so 10 pipes add up to roughly 100kB; that can't be it.
3. Somehow, maybe because the references are not shared, Perl is cloning the whole hash/array/hash structure for each new thread; since that structure keeps growing, each new thread would be bigger than the last. Except that, if it were being cloned, the original would not be getting much bigger at all. All the references are passed in by value anyway, and I have to assume that all the values in a shared hash/array are themselves shared (in fact, by the rules, I don't think I could add a non-shared anything to a shared hash/array). The only shared values I create in a thread are the arrays/hashes, and I point to those only from the main shared arrays/hashes in the parent thread.
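Idea 3 depends on how ithreads treat shared versus unshared data. Here is a minimal demonstration written for this discussion, not part of the original script. It assumes a perl built with ithreads support, and it shows three behaviors. First, `threads->create` gives the new thread its own copy of ordinary variables, so changes made in the child are invisible to the parent. Second, an array created with `&share([])` is a single copy that all threads see. Third, storing a reference to unshared data inside a shared container dies with "Invalid value for shared scalar".

```perl
use strict;
use warnings;
use threads;
use threads::shared;

# Ordinary (unshared) data: threads->create() clones it, so the new thread
# works on its own private copy.
my @private = (1 .. 3);

# Shared data: a single copy that every thread sees.
my $shared_list = &share([]);

my $thr = threads->create(sub {
    push @private, 'from child';        # modifies the child's copy only
    push @$shared_list, 'from child';   # modifies the one shared array
    return scalar @private;             # element count as seen by the child
});
my $child_count = $thr->join();

print "child saw $child_count private elements\n";
print "parent still sees ", scalar(@private), " private elements\n";
print "shared array holds ", scalar(@$shared_list), " element(s)\n";

# A shared container may hold plain scalars or references to shared data,
# but not references to unshared data.
my $shared_hash = &share({});
my $ok    = eval { $shared_hash->{inner} = [1, 2, 3]; 1 };
my $error = $ok ? '' : $@;
print "storing an unshared ref failed: $error" if $error;

$shared_hash->{inner} = &share([]);     # a shared ref is accepted
push @{ $shared_hash->{inner} }, 42;
```

The copy in the first case is not limited to the arguments passed to the thread. Whatever unshared data the parent holds at the moment it calls `threads->create` is cloned into the new thread, so with one thread per file that cost is paid 20,000+ times.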
I don't know whether this problem is solvable, or whether something is inherently wrong with Perl threading and what I'm trying to do with it, but it would be nice to at least know why it's happening. Any ideas? Should I just leave it as a 30-minute process, and code it in Java instead if I actually want a threaded version to work?
Re: Threads in Perl: Just leaky?
by BrowserUk (Patriarch) on Oct 04, 2007 at 21:07 UTC

Re: Threads in Perl: Just leaky?
by zentara (Cardinal) on Oct 04, 2007 at 19:33 UTC

Re: Threads in Perl: Just leaky?
by Joost (Canon) on Oct 04, 2007 at 20:15 UTC

Re: Threads in Perl: Just leaky?
by renodino (Curate) on Oct 04, 2007 at 21:19 UTC
by chrism01 (Friar) on Oct 05, 2007 at 06:33 UTC
by demerphq (Chancellor) on Oct 05, 2007 at 08:54 UTC

Re: Threads in Perl: Just leaky?
by talexb (Chancellor) on Oct 05, 2007 at 14:32 UTC

Re: Threads in Perl: Just leaky?
by NiJo (Friar) on Oct 05, 2007 at 18:35 UTC

Re: Threads in Perl: Just leaky?
by weismat (Friar) on Oct 05, 2007 at 21:05 UTC

Re: Threads in Perl: Just leaky?
by MarkusLaker (Beadle) on Oct 07, 2007 at 11:00 UTC

Re: Threads in Perl: Just leaky?
by Anonymous Monk on Oct 05, 2007 at 08:18 UTC