Can you write a faster code to perform this task?

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: Can you write a faster code to perform this task? by BrowserUk (Patriarch) on Sep 28, 2014 at 23:06 UTC
Running this slightly modified version of your code: `#! perl -slw use strict; use Time::HiRes qw[ time ]; my $start = time; my $count=0; while( <DATA> ) { while( /(M+)/g ) { $count++; } } print $count."\n"; printf "Took %9f secs\n", time() - $start; __DATA__ iiiiiiiiMMMMMMMMMMMooooooooooooMMMMMMMMMMiiiiiMMMMMMMMoooo iiiiiiiiMMMMMMMMMMMooooooooooooMMMMMMMMMMiiiiiMMMMMMMMoooo ... 79998 more lines as above ...` [download] Resulted in this output: `C:\test>junk 240000 Took 0.199589 secs` [download] 0.2 of a second to count the quarter million occurrences of your search term in 80,000 lines doesn't seem excessive to me. How much time are you looking to save? With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday' Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error. "Science is about questioning the status quo. Questioning authority". In the absence of evidence, opinion is indistinguishable from prejudice.	[reply] [d/l] [select]
Re: Can you write a faster code to perform this task? by AnomalousMonk (Archbishop) on Sep 28, 2014 at 23:03 UTC
You'll have to Benchmark it yourself to see if it's any faster, but maybe: `c:\@Work\Perl\monks>perl -wMstrict -le "$_ = 'iiiiiiiiMMMMMMMMMMMooooooooooooMMMMMMMMMMiiiiiMMMMMMMMoooo'; my $count =()= $_ =~ m{ M+ }xmsg; print $count; " 3` [download]	[reply] [d/l]
Re: Can you write a faster code to perform this task? by davido (Cardinal) on Sep 29, 2014 at 01:48 UTC
How complex does your pattern need to become? In the case of a pattern that consists of `(M+)`, it's unnecessary to rev up the regexp engine. General solutions work pretty well for the general case. Optimizations work by knowing more about the specific case, possibly allowing you to design the code with shortcuts, bypassing work that the general case might need, but that are unnecessary in the specific case. The more we know about the specifics, the better we can do at figuring out what shortcuts can be applied here. Dave	[reply] [d/l]
Re: Can you write a faster code to perform this task? by Anonymous Monk on Sep 28, 2014 at 23:14 UTC
Benchmarked a few solutions: `use Benchmark qw(:all); my $str = 'iiiiiiiiMMMMMMMMMMMooooooooooooMMMMMMMMMMiiiiiMMMMMMMMoooo' +; my $these = { 'count' => sub { my $count = 0; $count++ while $str =~ /(M+)/g; $count; }, 'regex' => sub { my $count = () = $str =~ /(M+)/g; }, 'split' => sub { $#{[split /M+/, $str, -1]}; }, }; printf "%-10s %d\n", $_, $these->{$_}->() for sort keys %$these; print "\n"; cmpthese(-1, $these);` [download] Got these: `count 3 regex 3 split 3 Rate split regex count split 335814/s -- -3% -30% regex 345739/s 3% -- -27% count 476625/s 42% 38% --` [download]	[reply] [d/l] [select]
Re: Can you write a faster code to perform this task? by Anonymous Monk on Sep 28, 2014 at 23:09 UTC
I just generated a file like the line you showed containing 80000 80-char lines; your code processed it in 0.3s. Even when I increased the line length to 1000 and the number of lines to 100k, the execution time stayed under a second. What kind of a speed gain are you looking for? Is the input and code you showed perhaps a simplified version of what you're actually doing? These kind of details do matter in optimizations.	[reply]
Re: Can you write a faster code to perform this task? by Laurent_R (Canon) on Sep 29, 2014 at 06:13 UTC
I can't benchmark right now, but I would think that the `tr///` operator is likely to be faster than a regex. Having said that, 80,000 lines is not a very large input, I am not sure that there is really a need to optimize further what you have and you're going, at best, to shave off a split second.	[reply] [d/l]
Re^2: Can you write a faster code to perform this task? by LanX (Saint) on Sep 29, 2014 at 08:17 UTC
`tr` counts single letters while the OP wants to count sequences of identical letters. Cheers Rolf _{(addicted to the Perl Programming Language and ☆☆☆☆ :)}	[reply]
Re^3: Can you write a faster code to perform this task? by trizen (Hermit) on Sep 29, 2014 at 11:50 UTC
What about something like this? `use 5.014; say 'iiiiiiiiMMMMMMMMMMMooooooooooooMMMMMMMMMMiiiiiMMMMMMMMoooo' =~ tr/M/~/sr=~tr/~//; # => prints: 3` [download] Update: just to be sure, it would be safer to replace 'M' with something much exotic, like '\0', which should be OK, assuming that you're reading text files. (`tr/M/\0/sr =~ tr/\0//`)	[reply] [d/l] [select]
Re^4: Can you write a faster code to perform this task? by Lotus1 (Vicar) on Sep 29, 2014 at 18:30 UTC
Re^5: Can you write a faster code to perform this task? by BrowserUk (Patriarch) on Sep 29, 2014 at 19:38 UTC
Re^4: Can you write a faster code to perform this task? by LanX (Saint) on Sep 29, 2014 at 12:07 UTC
Re^5: Can you write a faster code to perform this task? by Your Mother (Archbishop) on Sep 29, 2014 at 16:02 UTC
Some notes below your chosen depth have not been shown here
Re^3: Can you write a faster code to perform this task? by Laurent_R (Canon) on Sep 29, 2014 at 17:25 UTC
True, Rolf, I mis-read the OP's requirement. I thought he wanted the number of "M". But, OTOH, trizen has found a way to use `tr///` which seems to be very fast.	[reply] [d/l]
Re: Can you write a faster code to perform this task? by karlgoethebier (Abbot) on Sep 29, 2014 at 13:32 UTC
Not the fastest but nice: `scalar map { /(M+)/g } ($str);` Compared to the anon monk's benchmark below: `count 3 map 3 regex 3 split 3 Rate split regex map count split 207677/s -- -7% -28% -47% regex 223050/s 7% -- -23% -43% map 288533/s 39% 29% -- -27% count 394162/s 90% 77% 37% --` [download] Regards, Karl �The Crux of the Biscuit is the Apostrophe�	[reply] [d/l] [select]