Moloch has asked for the wisdom of the Perl Monks concerning the following question:

Hi,

I am trying to find a way to utilise the full potential of my CPU cores and memory on my Windows machine.

Now, I am quite familiar with grep; however, running a Unix-based OS is not an option right now.

Unfortunately, with the 32-bit grep for Windows that I am running, I cannot run multiple instances or use the machine's full potential.

Therefore I am trying to create a Perl script, using Perl x64, that can do the equivalent of the following grep:

   grep -h %search_string% *.txt > retuned_Results.sl

Now, I am having trouble with opening multiple files, one at a time, and printing out the lines matching the search string.

Any pointers on how I should continue would be greatly appreciated.

Thanks for your assistance.

Replies are listed 'Best First'.
Re: Search through multiple files and output results to new file
by BrowserUk (Patriarch) on Aug 24, 2010 at 16:00 UTC

    Adjust $THREADS to suit your hardware:

    #! perl -sw
    use strict;
    use threads;
    use threads::shared;
    use Thread::Queue;

    # grep -h %search_string% *.txt > retuned_Results.sl

    our $THREADS //= 4;

    my $semSTDOUT :shared;
    my $Q = new Thread::Queue;

    my @threads = map async {
        my $tid = threads->tid;
        while( my $file = $Q->dequeue ) {
            open my $fh, '<', $file or warn( "$file:$!" ) and next;
            while( <$fh> ) {
                next unless m[$ARGV[0]];
                lock $semSTDOUT;
                print;
            }
            close $fh;
        }
    }, 1 .. $THREADS;

    $Q->enqueue( glob $ARGV[ 1 ] );
    $Q->enqueue( (undef) x $THREADS );

    $_->join for @threads;

    __END__
    c:\test>junk23 -THREADS=4 "Queue" *.pl >junk.dat

    c:\test>head junk.dat
    use Thread::Queue;
    my $input_queue = Thread::Queue->new();
    my $result_queue = Thread::Queue->new();
    use Thread::Queue;
    my $Q = Thread::Queue->new;
    use Thread::Queue;
    my $Q = new Thread::Queue;
    use Thread::Queue;
    my $Q = new Thread::Queue;
    use Thread::Queue;

    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
Re: Search through multiple files and output results to new file
by roboticus (Chancellor) on Aug 24, 2010 at 17:53 UTC

    Moloch:

    I applaud your intent of getting the most out of your machine, but usually file searches are I/O bound, rather than CPU bound. So I'd bet that spreading your I/O among multiple drives would be the best way to improve throughput. Such as: put your output file on one drive, and all your input files split among several other drives. CPUs are so much faster than disk drives that usually the jobs are waiting for a read or write to complete rather than any actual searching.

    ...roboticus

Re: Search through multiple files and output results to new file
by TomDLux (Vicar) on Aug 24, 2010 at 15:39 UTC

    I got a Perl 'grep' using:

    perl -n -e'print if /pod2usage/' *

    '-n' tells perl to process all the files on the command line, line by line, but not print them. '-p' does the same but prints each line as well ... useful for altering data, as you might with sed.

    '-e' specifies an expression to evaluate on each line. It prints the value of '$_' if '$_' matches the regex, /pod2usage/, i.e., if that string appears anywhere in the line.

    I'm running on Linux, command line adjustments for Windows are left as an exercise for the student :-)

    As Occam said: Entia non sunt multiplicanda praeter necessitatem.

Re: Search through multiple files and output results to new file
by JavaFan (Canon) on Aug 24, 2010 at 16:09 UTC
    Unfortunately, the 32 bit grep for windows that I am running, I cannot run multiple instances of it or use the full potential.
    I don't know much about Windows, but I'm baffled by this statement. Why can't you run multiple instances of it?
    Therefore I am trying to create a Perl script using Perl x64 can do the following grep:
    grep -h %search_string% *.txt > retuned_Results.sl
    Uhm, that's just one instance. Which, AFAIK, on Unix, doesn't use multiple cores either. And why doesn't it do what you want under Windows?

    Of course, the Perl equivalent of the above is:

    perl -ne 'print if /%search_string%/' *.txt > retuned_Results.sl
    but if the grep doesn't do what you want, I fear the above doesn't do what you want either.
      Why can't you run multiple instances of it?

      Two reasons:

      1. Each instance would be trying to process the same files.

        Which would mean lots of "file in use" errors and/or duplicated results.

      2. The multiple instances would be trying to redirect their output to the same file.

        Even if you use append (>>), Windows won't let you do that. Not sure if *nix will?


      Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
      "Science is about questioning the status quo. Questioning authority".
      In the absence of evidence, opinion is indistinguishable from prejudice.
        Each instance would be trying to process the same files.
        Oh. I never got the impression that the OP wanted this. I fail to see why that's a benefit.
        Which would mean lots of "file in use" errors and/or duplicated results.
        Windows doesn't allow two processes to open the same file? That sounds like an easy DoS.
        The multiple instances would be trying to redirect their output to the same file. Even if you use append (>>), Windows won't let you do that. Not sure if *nix will?
        Unix allows multiple processes to write to the same file. And if the file is opened in append mode, all write()s will go to the end of the file.
Re: Search through multiple files and output results to new file
by Anonymous Monk on Aug 24, 2010 at 15:34 UTC