in reply to Re: Process large text data in array
in thread Process large text data in array
Try this out:
my @dat;
my @filters;
# Filters take the line as an argument rather than relying on $_,
# which an anonymous sub does not receive implicitly.
push @filters, sub { $_[0] =~ /active/        ? 1 : undef };
push @filters, sub { $_[0] =~ /anotherfilter/ ? 1 : undef };

open my $DATF, '<', $file_name or die "Cannot open $file_name: $!";
while ( my $line = <$DATF> ) {
    chomp $line;
    foreach my $filter (@filters) {
        $filter->($line) or next;
        push @dat, $line;
        last;    # one match is enough; skip the remaining filters
    }
}
close $DATF;
An alternative is this:
use threads;
use Thread::Queue;
use constant MAXTHREADS => 2;

my $workQueue = Thread::Queue->new();
my $outQueue  = Thread::Queue->new();

my @threads = map { threads->new( \&worker ) } 1 .. MAXTHREADS;

open my $DATF, '<', $file_name or die "Cannot open $file_name: $!";
$workQueue->enqueue($_) while <$DATF>;
close $DATF;

$workQueue->end();        # lets workers drain the queue and exit
$_->join for @threads;
$outQueue->end();

my @dat;
while ( defined( my $line = $outQueue->dequeue() ) ) {
    push @dat, $line;
}

sub worker {
    my @filters;
    push @filters, sub { $_[0] =~ /active/        ? 1 : undef };
    push @filters, sub { $_[0] =~ /anotherfilter/ ? 1 : undef };

    # dequeue() returns undef once the queue is ended and empty,
    # so test definedness rather than chomp's return value.
    while ( defined( my $line = $workQueue->dequeue() ) ) {
        chomp $line;
        foreach my $filter (@filters) {
            $filter->($line) or next;
            $outQueue->enqueue($line);
            last;
        }
    }
}
The benefit of multithreading is that you can dial performance up and down depending on how many resources are available to you. As written, though, the driver pushes the entire file into the work queue up front; moving the read into its own thread resolves that, and handing the output-queue processing to its own thread further reduces the memory footprint (assuming you're doing something like writing the filtered data to an output file).
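A minimal sketch of that reader-thread idea, self-contained for illustration (it writes its own three-line input file; the /active/ and /anotherfilter/ patterns and the two-worker count are carried over from the example above):

```perl
use strict;
use warnings;
use threads;
use Thread::Queue;
use File::Temp qw(tempfile);

# Illustrative input file (stands in for $file_name).
my ($fh, $file_name) = tempfile();
print $fh "line one active\nline two idle\nline three anotherfilter\n";
close $fh;

my $workQueue = Thread::Queue->new();
my $outQueue  = Thread::Queue->new();

my @filters = (
    sub { $_[0] =~ /active/ },
    sub { $_[0] =~ /anotherfilter/ },
);

# Dedicated reader thread: the main thread never holds the whole file,
# and workers start draining the queue while the read is still going.
my $reader = threads->new( sub {
    open my $in, '<', $file_name or die "Cannot open $file_name: $!";
    $workQueue->enqueue($_) while <$in>;
    close $in;
    $workQueue->end();    # signal workers there is nothing more to come
} );

my @workers = map {
    threads->new( sub {
        while ( defined( my $line = $workQueue->dequeue() ) ) {
            chomp $line;
            for my $filter (@filters) {
                next unless $filter->($line);
                $outQueue->enqueue($line);
                last;
            }
        }
    } );
} 1 .. 2;

$reader->join;
$_->join for @workers;
$outQueue->end();

my @dat;
while ( defined( my $line = $outQueue->dequeue() ) ) {
    push @dat, $line;
}
print scalar(@dat), " lines matched\n";
```

With two workers the order of @dat is not guaranteed, so a real consumer thread should either tolerate reordering or tag each line with its input position.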
Re^3: Process large text data in array
by BrowserUk (Patriarch) on Mar 11, 2015 at 15:17 UTC
by SimonPratt (Friar) on Mar 11, 2015 at 16:01 UTC
by BrowserUk (Patriarch) on Mar 11, 2015 at 17:24 UTC
by SimonPratt (Friar) on Mar 11, 2015 at 18:12 UTC