Try this out:
    my @dat;

    # Each filter gets the line as its first argument; true means "keep it".
    my @filters = (
        sub { $_[0] =~ /active/ },
        sub { $_[0] =~ /anotherfilter/ },
    );

    open my $DATF, '<', $file_name or die "Cannot open $file_name: $!";
    while ( my $line = <$DATF> ) {
        chomp $line;
        foreach my $filter (@filters) {
            next unless $filter->($line);
            push @dat, $line;    # keep the line once, on the first match
            last;
        }
    }
    close $DATF;
An alternative is this:
    use threads;
    use Thread::Queue;
    use constant MAXTHREADS => 2;

    my $workQueue = Thread::Queue->new();
    my $outQueue  = Thread::Queue->new();
    my @threads   = map { threads->new( \&worker ) } 1 .. MAXTHREADS;

    open my $DATF, '<', $file_name or die "Cannot open $file_name: $!";
    while ( my $line = <$DATF> ) {
        $workQueue->enqueue($line);
    }
    close $DATF;

    $workQueue->end();    # no more input: workers exit once the queue drains
    $_->join for @threads;
    $outQueue->end();

    my @dat;
    while ( defined( my $line = $outQueue->dequeue() ) ) {
        push @dat, $line;
    }

    sub worker {
        # Each filter gets the line as its first argument; true means "keep it".
        my @filters = (
            sub { $_[0] =~ /active/ },
            sub { $_[0] =~ /anotherfilter/ },
        );
        while ( defined( my $line = $workQueue->dequeue() ) ) {
            chomp $line;
            foreach my $filter (@filters) {
                next unless $filter->($line);
                $outQueue->enqueue($line);
                last;
            }
        }
    }
The benefit of multithreading is that you can dial performance up or down depending on how many resources are available to you. As written, this reads the entire file into the work queue before any results are collected. Pushing the read into a separate thread resolves that, and pushing the output-queue processing into a separate thread also helps reduce the memory footprint (assuming you're doing something like writing the data to a filtered output file).
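Here is a rough sketch of that reader/worker/writer arrangement. It is only an illustration, not tested against the original data: `filter_file`, `QUEUE_LIMIT` and the two file name arguments are names I made up, the cap of 1000 lines is arbitrary, and it assumes a Thread::Queue recent enough to provide `end` and `limit`. With more than one worker, the output lines will not necessarily come out in input order.

```perl
use strict;
use warnings;
use threads;
use Thread::Queue;

use constant MAXTHREADS  => 2;
use constant QUEUE_LIMIT => 1000;    # assumed cap on lines held per queue

# Hypothetical helper: copy lines matching any filter from $in_file to $out_file.
sub filter_file {
    my ( $in_file, $out_file ) = @_;

    my $workQueue = Thread::Queue->new();
    my $outQueue  = Thread::Queue->new();

    # Bounded queues: enqueue blocks while a queue is full, so memory stays capped.
    $workQueue->limit = QUEUE_LIMIT;
    $outQueue->limit  = QUEUE_LIMIT;

    # Reader thread: always ends the work queue, even if the open fails,
    # so the workers are never left waiting.
    my $reader = threads->create( sub {
        if ( open my $in, '<', $in_file ) {
            while ( my $line = <$in> ) {
                $workQueue->enqueue($line);
            }
            close $in;
        }
        else {
            warn "Cannot open $in_file: $!";
        }
        $workQueue->end();
    } );

    my @workers =
        map { threads->create( \&worker, $workQueue, $outQueue ) } 1 .. MAXTHREADS;

    # Writer thread: keeps draining on failure so workers never block on a full queue.
    my $writer = threads->create( sub {
        my $ok = open my $out, '>', $out_file;
        warn "Cannot open $out_file: $!" unless $ok;
        while ( defined( my $line = $outQueue->dequeue() ) ) {
            print {$out} "$line\n" if $ok;
        }
        close $out if $ok;
    } );

    $reader->join;
    $_->join for @workers;
    $outQueue->end();    # all workers are done, so no more output will arrive
    $writer->join;
    return;
}

sub worker {
    my ( $workQueue, $outQueue ) = @_;
    my @filters = (
        sub { $_[0] =~ /active/ },
        sub { $_[0] =~ /anotherfilter/ },
    );
    while ( defined( my $line = $workQueue->dequeue() ) ) {
        chomp $line;
        foreach my $filter (@filters) {
            next unless $filter->($line);
            $outQueue->enqueue($line);
            last;
        }
    }
    return;
}

# Run as: perl script.pl input.txt output.txt
filter_file(@ARGV) if @ARGV == 2;
```

Raising or lowering MAXTHREADS changes how many filter workers run, and QUEUE_LIMIT trades memory for how far the reader can run ahead of the workers.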
In reply to Re^2: Process large text data in array
by SimonPratt
in thread Process large text data in array
by hankcoder