in reply to some forking help

What you need is to not invoke an external script to do what Perl does quickly. {grin}
my %hash_one = ('string_one'   => 0,
                'string_two'   => 0,
                'string_three' => 0,
                'string_four'  => 0,
                'string_five'  => 0,
                'string_six'   => 0,
                'string_seven' => 0);
# first, create an array ref, element 0 is a qr// of the key, and element 1 is the count:
for (keys %hash_one) {
    $hash_one{$_} = [qr/$_/, 0];
}
# then walk the data, trying all the regexen:
@ARGV = qw(file.txt);
close ARGV;
while (<>) {
    for (keys %hash_one) {
        $hash_one{$_}[1]++ if qr/$hash_one{$_}[0]/;
    }
}
# finally, replace the arrayref with just the count:
$_ = $_->[1] for values %hash_one; # works in perl 5.5 and greater

-- Randal L. Schwartz, Perl hacker

Replies are listed 'Best First'.
Re: Re: some forking help
by mstone (Deacon) on Dec 24, 2001 at 23:21 UTC

    hmm.. I smell a chance to test my understanding. Creating and storing qr// expressions takes extra work, but beats this simpler form:

    my %hash_one = ('string_one'   => 0,
                    'string_two'   => 0,
                    'string_three' => 0);
    @ARGV = qw(file.txt);
    close ARGV;
    while (<>) {
        for my $key (keys %hash_one) {
            $hash_one{$key}++ if (/$key/);
        }
    }

    because qr// lets perl precompile the regexp. That would pay off in cases like this, where we'll be looping through the same set of regexps over and over again, yes?
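    That intuition can be checked with the core Benchmark module. A rough sketch with made-up data (pattern and line counts are arbitrary, chosen small so it runs quickly); the interpolated form forces a pattern recompile every time the key changes, while the precompiled form reuses the stored Regexp objects:

    ```perl
    #!/usr/bin/perl
    use strict;
    use warnings;
    use Benchmark qw(timethese);

    # Hypothetical data, just to exercise both approaches.
    my @keys  = map { "string_$_" } 1 .. 20;
    my @lines = map { "noise $_ string_7 noise" } 1 .. 200;

    # Precompile each pattern once, outside the scanning loop.
    my %compiled = map { $_ => qr/\Q$_\E/ } @keys;

    sub count_interpolated {
        my $hits = 0;
        for my $line (@lines) {
            # pattern text changes each iteration, so perl recompiles it
            for my $k (@keys) { $hits++ if $line =~ /\Q$k\E/ }
        }
        return $hits;
    }

    sub count_precompiled {
        my $hits = 0;
        for my $line (@lines) {
            # reuses the already-compiled Regexp objects
            for my $k (@keys) { $hits++ if $line =~ $compiled{$k} }
        }
        return $hits;
    }

    timethese(10, {
        interpolated => \&count_interpolated,
        precompiled  => \&count_precompiled,
    });
    ```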

Re(2): some forking help
by dmmiller2k (Chaplain) on Dec 24, 2001 at 23:50 UTC

    With a 100Mb file and 50+ strings to search for, there could be some speed advantage to forking a separate process for each search string and letting them run in parallel, especially if the regexen are precompiled before forking.

    Of course, the sheer simplicity of merlyn's solution probably outweighs any overall time saved through parallelism, once you realize that the tricky task of gathering up the individual counts from each of the child processes is not as straightforward as it may at first appear.
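    A minimal sketch of that fork-per-pattern idea, with the count-gathering done over pipes (the file name, patterns, and sample data here are all made up; it writes a small sample file so it can run standalone):

    ```perl
    #!/usr/bin/perl
    use strict;
    use warnings;

    # Hypothetical file name and patterns -- illustration only.
    my $file     = 'file.txt';
    my @patterns = ('string_one', 'string_two', 'string_three');

    # Write a small sample file so the sketch is self-contained.
    open(my $out, '>', $file) or die "can't write $file: $!";
    print $out "string_one is here\n", "and string_two here\n", "nothing\n";
    close $out;

    my %fh_for;                          # pattern => read end of that child's pipe
    for my $pat (@patterns) {
        my $pid = open(my $fh, '-|');    # fork; the child's STDOUT feeds $fh
        die "fork failed: $!" unless defined $pid;
        if ($pid == 0) {                 # child: scan the file for one pattern
            my $re    = qr/\Q$pat\E/;    # precompile before scanning
            my $count = 0;
            open(my $in, '<', $file) or exit 1;
            while (<$in>) { $count++ if /$re/ }
            print $count, "\n";          # report back through the pipe
            exit 0;
        }
        $fh_for{$pat} = $fh;             # parent: remember the pipe
    }

    # The tricky part: gathering the counts. Reading to EOF and
    # closing the pipe also reaps each child.
    my %count;
    for my $pat (@patterns) {
        my $fh = $fh_for{$pat};
        my $n  = <$fh>;
        if (defined $n) { chomp $n } else { $n = 0 }
        close $fh;
        $count{$pat} = $n;
    }
    unlink $file;
    print "$_ => $count{$_}\n" for sort keys %count;
    ```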

    dmm

    You can give a man a fish and feed him for a day ...
    Or, you can
    teach him to fish and feed him for a lifetime
      The only way a fork()ing solution would be faster than the solutions posted so far is on an MP machine, where each process could scan the file separately. This assumes the file fits within the buffer cache.

      Otherwise, the price of the context switches will make this solution run slower.

      Just my $0.02 :)

      Merry Christmas to all the fellow monks!

        I'm not sure. My gut feeling is that searching a file is fairly I/O bound, and therefore would involve a significant amount of waiting for the disk regardless; why not capitalize on that by waiting in parallel?

        dmm

        You can give a man a fish and feed him for a day ...
        Or, you can
        teach him to fish and feed him for a lifetime
Re: Re: some forking help
by blakem (Monsignor) on Jan 15, 2002 at 17:39 UTC
    merlyn, I hate to critique code that was written on Christmas Eve, but this looks to have three separate bugs.

    There are two major issues in the while(<>) loop. First, $_ plays a dual role in the inner for loop, with the looping value clobbering the data from the file. Adding an inner loop var (i.e. for my $key) will avoid clobbering $_.
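    A tiny illustration of the clobbering (the input line is made up):

    ```perl
    #!/usr/bin/perl
    use strict;
    use warnings;

    $_ = "a line that contains string_two";   # stand-in for a line read via <>

    my @seen;
    for (qw(string_one string_two)) {
        push @seen, $_;    # inside this loop $_ is the key, not the input line
    }

    # With a named loop variable, the input line stays in $_ for the match:
    my $matches = 0;
    for my $key (qw(string_one string_two)) {
        $matches++ if /$key/;   # matches the input line against each key
    }
    print "matches: $matches\n";   # 1 -- only string_two is in the line
    ```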

    The second bug involves the if qr/$hash_one{$_}[0]/ construct. This doesn't execute the match; it just compiles the pattern (again??) and returns a Regexp object, which is always true. You can either drop the qr, leaving /$hash_one{$_}[0]/, explicitly bind it with $_ =~ qr/$hash_one{$_}[0]/, or simply use $_ =~ $hash_one{$_}[0].
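    The always-true behavior is easy to demonstrate (input text made up):

    ```perl
    #!/usr/bin/perl
    use strict;
    use warnings;

    $_ = "no match here";

    my $always = 0;
    $always++ if qr/something_absent/;      # qr// yields a Regexp object: always true
    print "bare qr: $always\n";             # 1 -- counted even though nothing matched

    my $bound = 0;
    $bound++ if $_ =~ qr/something_absent/; # binding with =~ actually runs the match
    print "bound: $bound\n";                # 0
    ```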

    The third issue is more subtle, but still a bug. You aren't quoting special chars when compiling regexes for literal strings... qr/$_/ really should be qr/\Q$_\E/
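    To see why the quoting matters, here is what happens with a key containing metacharacters (key and input line made up, echoing the [[[string_three test below):

    ```perl
    #!/usr/bin/perl
    use strict;
    use warnings;

    my $key  = '[[[string_three';              # literal string with regex metachars
    my $line = 'found [[[string_three here';

    my $raw = eval { $line =~ /$key/ };        # unmatched [ -- dies at run time
    print defined $raw ? "raw: $raw\n" : "raw pattern dies: $@";

    my $quoted = $line =~ /\Q$key\E/;          # \Q...\E quotes the metacharacters
    print "quoted: $quoted\n";                 # 1
    ```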

    With those three issues out of the way we have:

    #!/usr/bin/perl -wT
    use strict;
    my %hash_one = ('string_one'      => 0,
                    'string_two'      => 0,
                    '[[[string_three' => 0, # test special chars behavior
                    'string_four'     => 0,
                    'string_five'     => 0,
                    'string_six'      => 0,
                    'string_seven'    => 0);
    # first, create an array ref, element 0 is a qr// of the key, and element 1 is the count:
    for (keys %hash_one) {
        $hash_one{$_} = [qr/\Q$_\E/, 0];
    }
    # then walk the data, trying all the regexen:
    # Replaced with <DATA> - blakem
    # @ARGV = qw(file.txt);
    # close ARGV;
    while (<DATA>) {
        for my $key (keys %hash_one) {
            $hash_one{$key}[1]++ if $_ =~ $hash_one{$key}[0];
        }
    }
    # finally, replace the arrayref with just the count:
    $_ = $_->[1] for values %hash_one; # works in perl 5.5 and greater
    print "$_ => $hash_one{$_}\n" for keys %hash_one;
    __DATA__
    1 string_one
    string_two
    2 string_two
    [[[string_three
    [[[string_three
    3 [[[string_three
    string_four
    string_four
    string_four
    4 string_four
    doesn'tmatchanything
    Which works correctly and outputs:
    string_four => 4
    string_six => 0
    string_five => 0
    string_one => 1
    string_seven => 0
    [[[string_three => 3
    string_two => 2
    Those bugs make me think you coded that whole thing right here in the pm form box w/o running it through any sample data.... in a perverse sort of way, that's more impressive than if it had been totally clean the first time out. ;-)

    -Blake

      Those bugs make me think you coded that whole thing right here in the pm form box w/o running it through any sample data.... in a perverse sort of way, that's more impressive than if it had been totally clean the first time out. ;-)
      I've often written code for replies here and on Usenet without testing, right in the reply buffer. And I take my licks when I guess wrong. Thank you for debugging my code.

      -- Randal L. Schwartz, Perl hacker