Beefy Boxes and Bandwidth Generously Provided by pair Networks
Think about Loose Coupling
 
PerlMonks  

comment on

( [id://3333]=superdoc: print w/replies, xml ) Need Help??

Hi all,

for learning purposes i started to think about how to parse search a very huge file using the multithreading capabilities of Perl.

As i like trivial examples, i started out with something trivial and created some huge file at first:

karls-mac-mini:monks karl$ ls -hl very_huge.file -rw-r--r-- 1 karl karl 2,0G 23 Mai 19:38 very_huge.file karls-mac-mini:monks karl$ tail very_huge.file Lorem ipsum kizuaheli Lorem ipsum kizuaheli Lorem ipsum kizuaheli Lorem ipsum kizuaheli Lorem ipsum kizuaheli Lorem ipsum kizuaheli Lorem ipsum kizuaheli Lorem ipsum kizuaheli Lorem ipsum kizuaheli nose cuke karl karls-mac-mini:monks karl$ wc -l very_huge.file 100000001 very_huge.file

By RTFM i figured out this using MCE::Grep:

#!/usr/bin/env perl use strict; use warnings; use MCE::Grep; use Data::Dump; use Time::HiRes qw (time); MCE::Grep::init( { max_workers => 4 } ); my $start = time; open( my $fh, '<', 'very_huge.file' ); my @result = mce_grep { /karl/ } $fh; close $fh; printf "Took %.3f seconds\n", time - $start; dd \@result; __END__ karls-mac-mini:monks karl$ ./huge.pl Took 29.690 seconds ["nose cuke karl\n"]

Good old grep performs very much better easily:

karls-mac-mini:monks karl$ time grep karl very_huge.file nose cuke karl real 0m2.563s user 0m2.176s sys 0m0.309s

I don't know if this trivial exercise is peinlich parallel, but i'm wondering how to:

  • do this "by hand" (without using MCE::Grep)
  • ...and improve the performance

Thank you very much for any hint and best regards,

Update:

Edit: Striked out nonsense.

Ouch! Perhaps more RTFM would have helped:

PID Prozessname Benutzer % CPU Physikal. Speic Virt. S +peicher 1065 perl karl 12,7 10,3 MB 2, +33 GB 1068 perl karl 83,7 3,9 MB 2, +33 GB 1069 perl karl 84,6 3,9 MB 2, +33 GB 1070 perl karl 83,5 3,9 MB 2, +33 GB 1071 perl karl 84,0 3,9 MB 2, +33 GB

Edit 2: Renamed the thread

Update 3: Many thanks to marioroy and BrowserUk for their patience and their contributions to this interesting thread.

Karl

«The Crux of the Biscuit is the Apostrophe»


In reply to Threads From Hell #2: How To Search A Very Huge File [SOLVED] by karlgoethebier

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post; it's "PerlMonks-approved HTML":



  • Are you posting in the right place? Check out Where do I post X? to know for sure.
  • Posts may use any of the Perl Monks Approved HTML tags. Currently these include the following:
    <code> <a> <b> <big> <blockquote> <br /> <dd> <dl> <dt> <em> <font> <h1> <h2> <h3> <h4> <h5> <h6> <hr /> <i> <li> <nbsp> <ol> <p> <small> <strike> <strong> <sub> <sup> <table> <td> <th> <tr> <tt> <u> <ul>
  • Snippets of code should be wrapped in <code> tags not <pre> tags. In fact, <pre> tags should generally be avoided. If they must be used, extreme care should be taken to ensure that their contents do not have long lines (<70 chars), in order to prevent horizontal scrolling (and possible janitor intervention).
  • Want more info? How to link or How to display code and escape characters are good places to start.
Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others romping around the Monastery: (5)
As of 2024-03-29 08:39 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found