Re: Can you write a faster code to perform this task?
by BrowserUk (Patriarch) on Sep 28, 2014 at 23:06 UTC
|
#! perl -slw
use strict;
use Time::HiRes qw[ time ];
my $start = time;
my $count=0;
while( <DATA> ) {
while( /(M+)/g ) {
$count++;
}
}
print $count."\n";
printf "Took %9f secs\n", time() - $start;
__DATA__
iiiiiiiiMMMMMMMMMMMooooooooooooMMMMMMMMMMiiiiiMMMMMMMMoooo
iiiiiiiiMMMMMMMMMMMooooooooooooMMMMMMMMMMiiiiiMMMMMMMMoooo
... 79998 more lines as above ...
Resulted in this output: C:\test>junk
240000
Took 0.199589 secs
0.2 of a second to count the quarter million occurrences of your search term in 80,000 lines doesn't seem excessive to me. How much time are you looking to save?
With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
| [reply] [d/l] [select] |
Re: Can you write a faster code to perform this task?
by AnomalousMonk (Archbishop) on Sep 28, 2014 at 23:03 UTC
|
You'll have to Benchmark it yourself to see if it's any faster, but maybe:
c:\@Work\Perl\monks>perl -wMstrict -le
"$_ = 'iiiiiiiiMMMMMMMMMMMooooooooooooMMMMMMMMMMiiiiiMMMMMMMMoooo';
my $count =()= $_ =~ m{ M+ }xmsg;
print $count;
"
3
| [reply] [d/l] |
Re: Can you write a faster code to perform this task?
by davido (Cardinal) on Sep 29, 2014 at 01:48 UTC
|
How complex does your pattern need to become? In the case of a pattern that consists of (M+), it's unnecessary to rev up the regexp engine.
General solutions work pretty well for the general case. Optimizations work by knowing more about the specific case, possibly allowing you to design the code with shortcuts, bypassing work that the general case might need, but that are unnecessary in the specific case. The more we know about the specifics, the better we can do at figuring out what shortcuts can be applied here.
| [reply] [d/l] |
Re: Can you write a faster code to perform this task?
by Anonymous Monk on Sep 28, 2014 at 23:14 UTC
|
use Benchmark qw(:all);
my $str = 'iiiiiiiiMMMMMMMMMMMooooooooooooMMMMMMMMMMiiiiiMMMMMMMMoooo'
+;
my $these = {
'count' => sub {
my $count = 0;
$count++ while $str =~ /(M+)/g;
$count;
},
'regex' => sub {
my $count = () = $str =~ /(M+)/g;
},
'split' => sub {
$#{[split /M+/, $str, -1]};
},
};
printf "%-10s %d\n", $_, $these->{$_}->() for sort keys %$these;
print "\n";
cmpthese(-1, $these);
Got these:
count 3
regex 3
split 3
Rate split regex count
split 335814/s -- -3% -30%
regex 345739/s 3% -- -27%
count 476625/s 42% 38% --
| [reply] [d/l] [select] |
Re: Can you write a faster code to perform this task?
by Anonymous Monk on Sep 28, 2014 at 23:09 UTC
|
I just generated a file like the line you showed containing 80000 80-char lines; your code processed it in 0.3s. Even when I increased the line length to 1000 and the number of lines to 100k, the execution time stayed under a second. What kind of a speed gain are you looking for? Is the input and code you showed perhaps a simplified version of what you're actually doing? These kind of details do matter in optimizations.
| [reply] |
Re: Can you write a faster code to perform this task?
by Laurent_R (Canon) on Sep 29, 2014 at 06:13 UTC
|
I can't benchmark right now, but I would think that the tr/// operator is likely to be faster than a regex. Having said that, 80,000 lines is not a very large input, I am not sure that there is really a need to optimize further what you have and you're going, at best, to shave off a split second.
| [reply] [d/l] |
|
|
tr counts single letters while the OP wants to count sequences of identical letters.
Cheers Rolf
(addicted to the Perl Programming Language and ☆☆☆☆ :)
| [reply] |
|
|
What about something like this?
use 5.014;
say 'iiiiiiiiMMMMMMMMMMMooooooooooooMMMMMMMMMMiiiiiMMMMMMMMoooo'
=~ tr/M/~/sr=~tr/~//; # => prints: 3
Update: just to be sure, it would be safer to replace 'M' with something much exotic, like '\0', which should be OK, assuming that you're reading text files. (tr/M/\0/sr =~ tr/\0//) | [reply] [d/l] [select] |
|
|
|
|
|
|
|
|
|
|
|
True, Rolf, I mis-read the OP's requirement. I thought he wanted the number of "M". But, OTOH, trizen has found a way to use tr/// which seems to be very fast.
| [reply] [d/l] |
Re: Can you write a faster code to perform this task?
by karlgoethebier (Abbot) on Sep 29, 2014 at 13:32 UTC
|
Not the fastest but nice: scalar map { /(M+)/g } ($str);
Compared to the anon monk's benchmark below:
count 3
map 3
regex 3
split 3
Rate split regex map count
split 207677/s -- -7% -28% -47%
regex 223050/s 7% -- -23% -43%
map 288533/s 39% 29% -- -27%
count 394162/s 90% 77% 37% --
Regards, Karl
«The Crux of the Biscuit is the Apostrophe»
| [reply] [d/l] [select] |