Your design means that you are attempting to store a huge amount of data, the contents of thousands of files, in shared memory. That design doesn't make any sense. Shared memory is comparatively slow because of locking considerations, and it has high memory usage so that the data can be accessed from multiple threads. And all of that is entirely unnecessary.
Rather than loading the contents of the files in your directory traversal thread and then sharing them with all the other threads (so that just one of them can process each file), the smarter and more efficient design is to share only the names of the files, and have each processing thread: read a path from the queue; slurp the file; process it; and then discard it. This avoids a huge amount of shared memory contention and improves your throughput, without exhausting memory.
Here's a quick example that gets the paths of all the .pl files in the sub-tree below the CWD and processes them using 4 threads, each counting the occurrences of 'my' it finds, before finally totalling the sub-counts and printing out the final tally:
#! perl -slw
use strict;
use threads;
use Thread::Queue;

sub worker {
    my $Q = shift;
    my $count = 0;
    while( my $path = $Q->dequeue ) {
        chomp $path;
        my $file = do{ local( @ARGV, $/ ) = $path; <> };
        $count += () = $file =~ m[my]g;
    }
    return $count;
}

our $THREADS //= 4;

my $Q = new Thread::Queue;

my @workers = map{
    threads->create( \&worker, $Q );
} 1 .. $THREADS;

open DIR, '-|', q[ dir /s /b *.pl ] or die $!;
$Q->enqueue( <DIR> );
close DIR;

$Q->enqueue( (undef) x $THREADS );

my $count = 0;
$count += $_->join for @workers;

print "Found $count 'my's";

__END__
C:\test>921522-2.pl
Found 20825 'my's
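The example above gets its paths from the Windows `dir /s /b` command. If you need the same design on another platform, here is a minimal sketch that assumes the core File::Find module for the traversal instead. The sub name `count_in_tree` and its two parameters are hypothetical, introduced only for illustration, and it assumes a perl built with threads support. The design is unchanged: only paths go through the queue, and each worker slurps, counts and discards.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use threads;
use Thread::Queue;
use File::Find;

# Worker: take one path at a time from the queue, slurp the file,
# count the matches, then let the contents go out of scope.
sub worker {
    my $Q = shift;
    my $count = 0;
    while( defined( my $path = $Q->dequeue ) ) {
        open my $fh, '<', $path or next;        # skip unreadable files
        my $file = do{ local $/; <$fh> } // '';
        close $fh;
        $count += () = $file =~ m[my]g;
    }
    return $count;
}

# Hypothetical wrapper: count 'my' in every .pl file below $dir
# using $threads worker threads.
sub count_in_tree {
    my( $dir, $threads ) = @_;
    my $Q = Thread::Queue->new;
    my @workers = map{ threads->create( \&worker, $Q ) } 1 .. $threads;

    # With no_chdir set, $_ holds the same path as $File::Find::name.
    find( {
        no_chdir => 1,
        wanted   => sub { $Q->enqueue( $File::Find::name ) if -f && /\.pl\z/ },
    }, $dir );

    # One undef per worker tells each of them to stop.
    $Q->enqueue( (undef) x $threads );

    my $count = 0;
    $count += $_->join for @workers;
    return $count;
}

print "Found ", count_in_tree( '.', 4 ), " 'my's\n" unless caller;
```

Because the workers are started before the traversal begins, they can start processing files while File::Find is still walking the tree.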
In reply to Re: Threads slurping a directory and processing before conclusion
by BrowserUk
in thread Threads slurping a directory and processing before conclusion
by TRoderic