Moloch has asked for the wisdom of the Perl Monks concerning the following question:

Hi,

I am trying to find a way to utilise the full potential of my CPU cores and memory on my Windows machine.

Now, I am quite familiar with grep; however, running a Unix-based OS is not an option right now.

Unfortunately, with the 32-bit grep for Windows that I am running, I cannot run multiple instances or use the machine's full potential.

Therefore I am trying to create a Perl script, using Perl x64, that can do the equivalent of the following grep:

   grep -h %search_string% *.txt > retuned_Results.sl

Now, I am having trouble with opening multiple files, one at a time, and printing out the lines matching the search string.

Any pointers on how I should continue would be greatly appreciated.

Thanks for your assistance.

Replies are listed 'Best First'.
Re: Search through multiple files and output results to new file
by BrowserUk (Patriarch) on Aug 24, 2010 at 16:00 UTC

    Adjust $THREADS to suit your hardware:

    #! perl -sw
    use strict;
    use threads;
    use threads::shared;
    use Thread::Queue;

    # grep -h %search_string% *.txt > retuned_Results.sl

    our $THREADS //= 4;

    my $semSTDOUT :shared;
    my $Q = new Thread::Queue;

    my @threads = map async {
        my $tid = threads->tid;
        while( my $file = $Q->dequeue ) {
            open my $fh, '<', $file or warn( "$file:$!" ) and next;
            while( <$fh> ) {
                next unless m[$ARGV[0]];
                lock $semSTDOUT;
                print;
            }
            close $fh;
        }
    }, 1 .. $THREADS;

    $Q->enqueue( glob $ARGV[ 1 ] );
    $Q->enqueue( (undef) x $THREADS );

    $_->join for @threads;

    __END__
    c:\test>junk23 -THREADS=4 "Queue" *.pl >junk.dat

    c:\test>head junk.dat
    use Thread::Queue;
    my $input_queue = Thread::Queue->new();
    my $result_queue = Thread::Queue->new();
    use Thread::Queue;
    my $Q = Thread::Queue->new;
    use Thread::Queue;
    my $Q = new Thread::Queue;
    use Thread::Queue;
    my $Q = new Thread::Queue;
    use Thread::Queue;

    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
Re: Search through multiple files and output results to new file
by roboticus (Chancellor) on Aug 24, 2010 at 17:53 UTC

    Moloch:

    I applaud your intent of getting the most out of your machine, but usually file searches are I/O bound, rather than CPU bound. So I'd bet that spreading your I/O among multiple drives would be the best way to improve throughput. Such as: put your output file on one drive, and all your input files split among several other drives. CPUs are so much faster than disk drives that usually the jobs are waiting for a read or write to complete rather than any actual searching.

    ...roboticus

Re: Search through multiple files and output results to new file
by TomDLux (Vicar) on Aug 24, 2010 at 15:39 UTC

    I got a Perl 'grep' using:

    perl -n -e'print if /pod2usage/' *

    '-n' tells perl to process all the files on the command line, line by line, but not print them. '-p' does the same but prints each line as well ... useful for altering data, as you might with sed.

    '-e' specifies an expression to evaluate on each line. It prints the value of '$_' if '$_' matches the regex, /pod2usage/, i.e., if that string appears anywhere in the line.

    I'm running on Linux, command line adjustments for Windows are left as an exercise for the student :-)

    As Occam said: Entia non sunt multiplicanda praeter necessitatem.

Re: Search through multiple files and output results to new file
by JavaFan (Canon) on Aug 24, 2010 at 16:09 UTC
    Unfortunately, the 32 bit grep for windows that I am running, I cannot run multiple instances of it or use the full potential.
    I don't know much about Windows, but I'm baffled by this statement. Why can't you run multiple instances of it?
    Therefore I am trying to create a Perl script using Perl x64 can do the following grep:
    grep -h %search_string% *.txt > retuned_Results.sl
    Uhm, that's just one instance. Which, AFAIK, on Unix, doesn't use multiple cores either. And why doesn't it do what you want under Windows?

    Of course, the Perl equivalent of the above is:

    perl -ne 'print if /%search_string%/' *.txt > retuned_Results.sl
    but if the grep doesn't do what you want, I fear the above doesn't do what you want either.
      Why can't you run multiple instances of it?

      Two reasons:

      1. Each instance would be trying to process the same files.

        Which would mean lots of "file in use" errors and/or duplicated results.

      2. The multiple instances would be trying to redirect their output to the same file.

        Even if you use append (>>), Windows won't let you do that. Not sure if *nix will?


      Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
      "Science is about questioning the status quo. Questioning authority".
      In the absence of evidence, opinion is indistinguishable from prejudice.
        Each instance would be trying to process the same files.
        Oh. I never got the impression that the OP wanted this. I fail to see why that's a benefit.
        Which would mean lots of "file in use" errors and/or duplicated results.
        Windows doesn't allow two processes to open the same file? That sounds like an easy DoS.
        The multiple instances would be trying to redirect their output to the same file. Even if you use append (>>), Windows won't let you do that. Not sure if *nix will?
        Unix allows multiple processes to write to the same file. And if the file is opened in append mode, all write()s will go to the end of the file.
Re: Search through multiple files and output results to new file
by Anonymous Monk on Aug 24, 2010 at 15:34 UTC