Multi threading a

by dnamonk (Acolyte)
on Sep 02, 2021 at 19:43 UTC ( [id://11136379] )

dnamonk has asked for the wisdom of the Perl Monks concerning the following question:

Dear Monks,

The program below is very slow at processing a file of a few GB. I was thinking of using multi-threading to make it faster. Any clues on how I can do that?

Btw, I am reading from and writing to a gzip file.

Thanks in advance.
#!/usr/bin/perl
use strict;
use warnings;

open( my $fh, '<', $ARGV[0] ) or die "Cannot open $ARGV[0]: $!";
#open( my $fh, '-|', "gunzip -c $ARGV[0]" ) or die "gunzip $ARGV[0]: $!";

while (<$fh>) {
    # some regex code
}

Replies are listed 'Best First'.
Re: Multi threading a
by Corion (Patriarch) on Sep 02, 2021 at 20:20 UTC

    First, you should compare the processing time of decompressing and recompressing the file without Perl in between:

    time (gunzip -c "$file" | gzip > /tmp/newfile.gz)

    If that is already slow, you might gain some time by using the pigz tool, which uses multiple cores for decompressing and compressing.

    If that is still fast, the processing you do in Perl is the bottleneck and you will need to find ways to make your Perl code faster.
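
    If pigz does help, one way to keep the regex work in Perl is to read and write through pigz pipes. A minimal sketch only (it assumes pigz is on the PATH; the output name is made up):

        #!/usr/bin/perl
        use strict;
        use warnings;

        my $in = $ARGV[0] or die "Usage: $0 file.gz\n";

        # pigz decompresses and recompresses on multiple cores;
        # Perl only does the per-line work in between.
        open my $rfh, '-|', "pigz -dc \Q$in\E"  or die "pigz -dc $in: $!";
        open my $wfh, '|-', "pigz -c > out.gz"  or die "pigz -c: $!";

        while ( my $line = <$rfh> ) {
            # ... regex work on $line ...
            print {$wfh} $line;
        }

        close $rfh or die "close reader: $!";
        close $wfh or die "close writer: $!";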

      Yeah, that's true. The issue is with the zipping module; otherwise the program is 20x faster.
Re: Multi threading a
by choroba (Cardinal) on Sep 02, 2021 at 19:52 UTC
    What exactly do you want to do in the threads? There's no heavy computation to parallelise in the code you've shown. Reading from a single file in multiple threads tends to be slower than reading it in a single thread (though I'm not sure about SSDs).

    map{substr$_->[0],$_->[1]||0,1}[\*||{},3],[[]],[ref qr-1,-,-1],[{}],[sub{}^*ARGV,3]
      Hello wiser choroba,

      > Reading from a single file in multiple threads tends to be slower

       Are you sure? Isn't that what MCE::Grep is for?

       ## File path, glob ref, IO::All::{ File, Pipe, STDIO } obj, or scalar ref
       ## Workers read directly and not involve the manager process
       my @e = mce_grep_f { /pattern/ } "/path/to/file";   # efficient

      Also: MCE::Grep#PARSING-HUGE-FILES
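
       A fuller version of that call, for reference (a sketch only: MCE::Grep must be installed, the worker count is arbitrary, and a gzipped input would need to be decompressed to a plain file first, since the workers open the path directly):

          use strict;
          use warnings;
          use MCE::Grep;

          # Example setting only; tune for your machine.
          MCE::Grep->init( max_workers => 4 );

          # Workers open and read the file themselves instead of having
          # the manager process feed them the input.
          my @matches = mce_grep_f { /pattern/ } $ARGV[0];

          print scalar(@matches), " matching lines\n";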

      L*

      There are no rules, there are no thumbs..
      Reinvent the wheel, then learn The Wheel; may be one day you reinvent one of THE WHEELS.
        Oh, so in each thread, you read a large chunk from the file and then process it line by line in memory! This might work if you're sure your pattern can't be split between two chunks.
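
        If the regex work really is strictly per line (MCE chunks on record boundaries, so individual lines aren't split), a chunked version of the original loop could look roughly like this with MCE::Loop. A sketch under assumptions: MCE is installed, the worker count is arbitrary, and the input has already been decompressed to a plain text file, e.g. by pigz:

            use strict;
            use warnings;
            use MCE::Loop;

            # Example setting only.
            MCE::Loop->init( max_workers => 4 );

            my $file = shift @ARGV or die "Usage: $0 input.txt\n";

            mce_loop_f {
                my ( $mce, $chunk_ref, $chunk_id ) = @_;
                my $out = '';
                for my $line ( @{$chunk_ref} ) {    # the lines of this chunk
                    # ... per-line regex work on $line ...
                    $out .= $line;
                }
                MCE->print($out);                   # serialized write to STDOUT
            } $file;

        Output goes to STDOUT here, so it could be piped through pigz again to recompress. Chunk order is not guaranteed; MCE's ordered-gather helpers (e.g. MCE::Candy) would be needed if order matters.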

        map{substr$_->[0],$_->[1]||0,1}[\*||{},3],[[]],[ref qr-1,-,-1],[{}],[sub{}^*ARGV,3]
Re: Multi threading a
by perlfan (Vicar) on Sep 09, 2021 at 00:49 UTC
    pigz might be just what you want.
