perlquestion
karlgoethebier
<p>Hi all,</p>
<p>For learning purposes, I started thinking about how to <strike>parse</strike> search a very large file using Perl's multithreading capabilities.</p>
<p>Since I like trivial examples, I started with a simple one and first created a huge file:</p>
<c>
karls-mac-mini:monks karl$ ls -hl very_huge.file
-rw-r--r-- 1 karl karl 2,0G 23 Mai 19:38 very_huge.file
karls-mac-mini:monks karl$ tail very_huge.file
Lorem ipsum kizuaheli
Lorem ipsum kizuaheli
Lorem ipsum kizuaheli
Lorem ipsum kizuaheli
Lorem ipsum kizuaheli
Lorem ipsum kizuaheli
Lorem ipsum kizuaheli
Lorem ipsum kizuaheli
Lorem ipsum kizuaheli
nose cuke karl
karls-mac-mini:monks karl$ wc -l very_huge.file
100000001 very_huge.file
</c>
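<p>To reproduce the experiment, a file of the same shape can be generated with a short script. This is a minimal sketch, assuming every filler line is the <c>Lorem ipsum kizuaheli</c> line that <c>tail</c> shows; <c>make_file</c> and the script name <c>make_huge.pl</c> are hypothetical. With 100,000,000 filler lines it writes 100,000,001 lines (2,200,000,015 bytes).</p>
<c>
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical generator: $n_lines copies of the filler line,
# followed by a single line containing "karl".
sub make_file {
    my ( $path, $n_lines ) = @_;
    open( my $out, '>', $path ) or die "Cannot open $path: $!";
    my $filler = "Lorem ipsum kizuaheli\n";
    my $batch  = 100_000;    # print many lines per call instead of one by one
    my $left   = $n_lines;
    while ( $left > 0 ) {
        my $k = $left < $batch ? $left : $batch;
        print {$out} $filler x $k;
        $left -= $k;
    }
    print {$out} "nose cuke karl\n";
    close($out) or die "Cannot close $path: $!";
    return;
}

# Usage: perl make_huge.pl very_huge.file 100000000
make_file( $ARGV[0], $ARGV[1] // 100_000_000 ) if @ARGV;
</c>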
<p>After some RTFM, I came up with this using [mod://MCE::Grep]:</p>
<c>
#!/usr/bin/env perl
use strict;
use warnings;
use MCE::Grep;
use Data::Dump;
use Time::HiRes qw (time);
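MCE::Grep::init( { max_workers => 4 } );    # four worker processes
my $start = time;
open( my $fh, '<', 'very_huge.file' ) or die "Cannot open very_huge.file: $!";
my @result = mce_grep { /karl/ } $fh;      # workers grep chunks of lines in parallel
close $fh;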
printf "Took %.3f seconds\n", time - $start;
dd \@result;
__END__
karls-mac-mini:monks karl$ ./huge.pl
Took 29.690 seconds
["nose cuke karl\n"]
</c>
<p>Good old <c>grep</c> easily beats it, finishing in about 2.6 seconds, more than ten times faster:</p>
<c>
karls-mac-mini:monks karl$ time grep karl very_huge.file
nose cuke karl
real 0m2.563s
user 0m2.176s
sys 0m0.309s
</c>
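<p>Part of the gap to <c>grep</c> may be Perl's per-line overhead rather than a lack of parallelism. A single-process baseline that avoids it: read the file in large blocks, let the regex engine jump from match to match inside each block, and cut out only the lines that actually match. This is a sketch; <c>block_grep</c> and the 8 MiB block size are hypothetical choices, it has not been timed against the numbers above, and it assumes the pattern never matches a newline.</p>
<c>
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical single-process grep: read big blocks, let the regex engine
# skip through each block, and expand every hit to its enclosing line.
# Assumes the pattern never matches a newline.
sub block_grep {
    my ( $re, $path, $block_size ) = @_;
    $block_size //= 8 * 1024 * 1024;    # 8 MiB per read: an arbitrary choice
    open( my $fh, '<', $path ) or die "Cannot open $path: $!";
    my @hits;
    my $carry = '';    # incomplete last line of the previous block
    while (1) {
        my $n = read( $fh, my $buf, $block_size );
        die "Read error on $path: $!" unless defined $n;
        last if $n == 0;
        $buf = $carry . $buf;
        my $last_nl = rindex( $buf, "\n" );
        if ( $last_nl < 0 ) { $carry = $buf; next; }    # no complete line yet
        $carry = substr( $buf, $last_nl + 1 );
        my $chunk = substr( $buf, 0, $last_nl + 1 );    # complete lines only
        while ( $chunk =~ /$re/g ) {
            my $s = rindex( $chunk, "\n", $-[0] ) + 1;    # start of the line
            my $e = index( $chunk, "\n", $+[0] );         # end of the line
            push @hits, substr( $chunk, $s, $e - $s + 1 );
            pos($chunk) = $e + 1;    # report each line only once
        }
    }
    # A last line without a trailing newline is still in $carry
    push @hits, $carry if length($carry) && $carry =~ $re;
    close $fh;
    return @hits;
}

# Usage: perl block_grep.pl karl very_huge.file
if ( @ARGV == 2 ) {
    my ( $pat, $file ) = @ARGV;
    print for block_grep( qr/$pat/, $file );
}
</c>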
<p>I don't know whether this trivial exercise counts as <a href="http://en.wikipedia.org/wiki/Embarrassingly_parallel">embarrassingly parallel</a>, but I'm wondering how to:</p>
<ul>
<li>do this "by hand" (without using <c>MCE::Grep</c>)</li>
<li>...and improve the performance</li>
</ul>
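<p>One way to do it by hand, sketched with <c>fork</c> and pipes rather than ithreads (an assumption; the same split works with <c>threads</c>), and not necessarily how MCE works internally: cut the file into byte ranges, let each child process the lines that start inside its range, and collect the matches in the parent. <c>parallel_grep</c>, <c>_scan_range</c> and the default of 4 workers are hypothetical.</p>
<c>
#!/usr/bin/env perl
use strict;
use warnings;
use POSIX ();

# Child side (hypothetical helper): scan the lines that start in
# [$start, $end) and write the matching ones to $writer.
sub _scan_range {
    my ( $re, $path, $start, $end, $writer ) = @_;
    # Each child opens the file itself: a handle inherited across fork
    # would share its file offset with every other process.
    open( my $fh, '<', $path ) or die "Cannot open $path: $!";
    if ( $start > 0 ) {
        # Step back one byte and drop the rest of that line, so reading
        # resumes at the first line that starts at or after $start.
        seek( $fh, $start - 1, 0 ) or die "seek: $!";
        my $partial = <$fh>;
    }
    while ( tell($fh) < $end ) {
        my $line = <$fh>;
        last unless defined $line;
        print {$writer} $line if $line =~ $re;
    }
    close $fh;
    return;
}

# Parent side: one child per byte range, one pipe per child.
sub parallel_grep {
    my ( $re, $path, $workers ) = @_;
    $workers ||= 4;
    die "No such file: $path\n" unless -e $path;
    my $size = ( -s _ ) || 0;    # -s is false for an empty file
    my $step = int( $size / $workers ) + 1;
    my @kids;
    for my $w ( 0 .. $workers - 1 ) {
        my ( $start, $end ) = ( $w * $step, ( $w + 1 ) * $step );
        pipe( my $reader, my $writer ) or die "pipe: $!";
        my $pid = fork() // die "fork: $!";
        if ( $pid == 0 ) {    # child
            close $reader;
            my $ok = eval { _scan_range( $re, $path, $start, $end, $writer ); 1 };
            warn $@ unless $ok;
            close $writer;
            POSIX::_exit( $ok ? 0 : 1 );    # skip END blocks and parent's buffers
        }
        close $writer;
        push @kids, [ $pid, $reader ];
    }
    my ( @hits, $failed );
    for my $kid (@kids) {    # chunk order == file order
        my ( $pid, $reader ) = @$kid;
        push @hits, <$reader>;
        close $reader;
        waitpid( $pid, 0 );
        $failed++ if $?;
    }
    die "$failed worker(s) failed\n" if $failed;
    return @hits;
}

# Usage: perl pgrep.pl karl very_huge.file 4
if ( @ARGV >= 2 ) {
    my ( $pat, $file, $n ) = @ARGV;
    print for parallel_grep( qr/$pat/, $file, $n );
}
</c>
<p>Because the parent reads the pipes in chunk order, the matches come back in file order. A child with many matches may block on a full pipe until the parent reaches it, which costs parallelism but cannot deadlock. Each child still reads line by line, so scanning its range in large blocks is the obvious next step; whether extra workers help at all depends on whether the disk or the CPU is the bottleneck.</p>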
<p>Thank you very much for any hints, and best regards,</p>
<p><b>Update:</b></p>
<p><b>Edit: </b>Struck out nonsense.</p>
<strike><p>Ouch! Perhaps more RTFM would have helped:</p>
<c>
PID  Process name  User  % CPU  Physical mem  Virtual mem
1065 perl karl 12,7 10,3 MB 2,33 GB
1068 perl karl 83,7 3,9 MB 2,33 GB
1069 perl karl 84,6 3,9 MB 2,33 GB
1070 perl karl 83,5 3,9 MB 2,33 GB
1071 perl karl 84,0 3,9 MB 2,33 GB
</c></strike>
<p><b>Edit 2</b>: Renamed the thread</p>
<p><b>Update 3: </b>Many thanks to [marioroy] and [BrowserUk] for their patience and their contributions to this interesting thread.</p>
<p>Karl</p>
<div class="pmsig"><div class="pmsig-1001958">
<p>«The Crux of the Biscuit is the Apostrophe»</p>
</div></div>