counting regex hits

Becky has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: counting regex hits by davis (Vicar) on Mar 18, 2003 at 11:01 UTC
`shell$ perldoc perlfaq4` is pretty useful for this sort of stuff (abuse of code tags follows):

`How can I count the number of occurrences of a substring within a string? There are a number of ways, with varying efficiency. If you want a count of a certain single character (X) within a string, you can use the "tr///" function like so: $string = "ThisXlineXhasXsomeXx'sXinXit"; $count = ($string =~ tr/X//); print "There are $count X characters in the string";`

That should get you started.

davis
Is this going out live? No, Homer, very few cartoons are broadcast live - it's a terrible strain on the animator's wrist
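The FAQ entry quoted above handles a single character. As a minimal sketch, the same entry's follow-up advice for multi-character patterns is to loop over a global match; the sub names `count_char_x` and `count_negative_ints` are made up here for illustration.

```perl
use strict;
use warnings;

# Count one character with tr///. With an empty replacement list and no
# /d flag, tr only counts and leaves the string unchanged.
sub count_char_x {
    my ($string) = @_;
    return ($string =~ tr/X//);
}

# tr/// cannot count multi-character patterns, so step through the
# string with a global match in scalar context instead.
sub count_negative_ints {
    my ($string) = @_;
    my $count = 0;
    $count++ while $string =~ /-\d+/g;
    return $count;
}

print count_char_x("ThisXlineXhasXsomeXx'sXinXit"), "\n";   # 6 (the lowercase x is not counted)
print count_negative_ints("-9 55 48 -2 23 -76 4 14 -44"), "\n";   # 4
```

Note that tr/// works on character lists, not regular expressions, so it only fits when the thing being counted is a single character or a set of characters.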
Re: counting regex hits by broquaint (Abbot) on Mar 18, 2003 at 11:02 UTC
Your best bet is `tr`, e.g. `my $str = "ARKL---MNRD--SET"; print "$str [- count]: ", $str =~ tr/-//, $/; __output__ ARKL---MNRD--SET [- count]: 5`

HTH

broquaint
Re: counting regex hits by robartes (Priest) on Mar 18, 2003 at 11:03 UTC
In list context, a regexp will return the matches. You can use that fact to count the number of matches: `my $string="ARKL---MNRD--SET"; my $count = () = $string =~/-/g; print $count; __END__ 5`

CU
Robartes-
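A short sketch of why the `() =` trick works, and one caveat worth knowing. The variable names below are illustrative only.

```perl
use strict;
use warnings;

my $string = "ARKL---MNRD--SET";

# Assigning to an empty list forces list context on the match; the outer
# scalar assignment then receives the number of elements on the right.
my $dashes = () = $string =~ /-/g;

# Caveat: with capturing groups, a /g match in list context returns every
# captured group of every match, so the total is matches * groups.
my $captured = () = $string =~ /(-)(-)/g;

# Without captures, each element is one whole match: here, runs of dashes.
my $runs = () = $string =~ /-+/g;

print "$dashes $captured $runs\n";   # 5 4 2
```

If the pattern needs grouping but you only want the number of matches, non-capturing groups `(?:...)` keep the count at one element per match.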
Re: counting regex hits Benchamark by Tomte (Priest) on Mar 18, 2003 at 12:16 UTC
I know that it is well-known that `tr` is faster than `m//`, nevertheless I thought I'd write my first Benchmark.pm-using, aehmm, well, benchmark :-)

I'd be glad to hear if I did something considerably stupid here :-), under this proviso the (well known!) results:

`__END__`
`Benchmark: timing 1000000 iterations of m//, tr...`
`m//: 57 wallclock secs (54.53 usr + 0.07 sys = 54.60 CPU) @ 18315.02/s (n=1000000)`
`tr: 1 wallclock secs ( 1.92 usr + 0.00 sys = 1.92 CPU) @ 520833.33/s (n=1000000)`
`Rate m// tr`
`m// 18315/s -- -96%`
`tr 520833/s 2744% --`

Quite impressive difference, isn't it?

regards,
tomte
Re: Re: counting regex hits Benchamark by cees (Curate) on Mar 18, 2003 at 15:35 UTC
The only thing that I would change is the 'random feature' you put in the benchmark. This will throw off your results, since you are not providing the same value to each function. In the worst case the 'tr' function could get the shortest string and the 'm//' function the longest string every time. Of course this imbalance would be evened out since you do a million iterations, and I'll bet your results will not change much by fixing this. But I would replace the random function and just loop through each item in every iteration.

`cmpthese(100000, { 'tr' => sub { foreach (@rands) { my $x = tr/-//; } }, 'm//' => sub { foreach (@rands) { my $x = () = $_ =~ /-/g; } }, } );`

This way you are guaranteed an even distribution of your sample data. I agree with you that the results are impressive. Definitely something to keep in the bin of useful Perl knowledge...
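For anyone wanting to try this, here is a self-contained sketch of a benchmark that feeds every sample string to both approaches. The contents of `@samples` are an assumption (the strings used in the original benchmark are not shown in the thread), as are the sub names.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical fixed sample data, not the strings from the original run.
my @samples = ("ARKL---MNRD--SET", "no dashes here", "-a-b-c-d-e-", "-" x 50);

# Total dash count over all samples using tr/// on $_.
sub count_tr {
    my $n = 0;
    $n += tr/-// for @samples;
    return $n;
}

# Same total using a global match in list context.
sub count_match {
    my $n = 0;
    for (@samples) {
        my $c = () = /-/g;
        $n += $c;
    }
    return $n;
}

# Sanity check before timing: both approaches must agree.
die "counts differ" unless count_tr() == count_match();

# A negative count asks Benchmark to run each sub for at least that many
# CPU seconds instead of a fixed number of iterations.
cmpthese(-1, { 'tr' => \&count_tr, 'm//' => \&count_match });
```

Checking that both subs return the same total before timing them guards against benchmarking two pieces of code that do not actually do the same work.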
Re: Re: Re: counting regex hits Benchamark by Doc Technical (Initiate) on Mar 18, 2003 at 22:48 UTC
No regex required: `$strg = qq[This has- some- d-a-she-s -in it]; %cnt = (); @chrs = split('',$strg); foreach (@chrs) { $cnt{$_}++; } print qq[$cnt{'-'}\n];`
Re*4: counting regex hits Benchamark by Tomte (Priest) on Mar 19, 2003 at 09:17 UTC
Re: Re*4: counting regex hits Benchamark by Doc Technical (Initiate) on Mar 19, 2003 at 20:05 UTC
Re^2: counting regex hits Benchamark (except) by tye (Sage) on Mar 19, 2003 at 00:13 UTC
"I know that it is well-known that tr is faster than m//"

Except when it is not.

"Quite impressive difference, isn't it?"

No. The difference is 1/18982.5th of a second, which isn't impressive at all. q-: The quotient looks impressive, but Benchmark has to go to a lot of work to be able to guess at that, so you won't see a quotient anything close to that in practice.

Congratulations, you've now prematurely optimized this nano-operation.

I find that the difference is usually more indicative of the practical value of the optimization than the quotient.

- tye (cheap philosophy shouldn't cost a lot)
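The 1/18982.5 figure appears to come straight from the two rates in the benchmark output earlier in the thread. A small sketch of that arithmetic, assuming per-call time is simply the reciprocal of the reported rate:

```perl
use strict;
use warnings;

# Rates reported in the benchmark output (calls per second).
my %rate = ('m//' => 18315.02, 'tr' => 520833.33);

# Time per call in seconds is the reciprocal of the rate.
my $t_match = 1 / $rate{'m//'};
my $t_tr    = 1 / $rate{'tr'};

my $difference = $t_match - $t_tr;   # absolute time saved per call
my $quotient   = $t_match / $t_tr;   # relative speed-up

printf "difference: %.2f microseconds per call (1/%.1f of a second)\n",
    $difference * 1e6, 1 / $difference;
printf "quotient:   %.1fx\n", $quotient;
```

The quotient is about 28x, but the absolute saving is roughly 53 microseconds per call, which only matters if the count sits inside a very hot loop.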
Re: Re^2: counting regex hits Benchamark (except) by Tomte (Priest) on Mar 19, 2003 at 07:38 UTC
"I know that it is well-known that tr is faster than m//"
"Except when it is not."

Point taken: I amend my original proposition as follows: if the problem is as simple as the OP's, that is, if a simple transliteration (or a side effect of one, such as the count) is your goal, then it seems to be a well-known fact that you may as well use the transliteration operator. :-)

"Congratulations, you've now prematurely optimized this nano-operation."

Do not fall prey to false conclusions: I was interested in the abstract comparison of two nano-operations and in how to use Benchmark.pm myself. I didn't optimize anything, how could I? I don't have code lying around using any of these operators to actually count anything. So as this wasn't optimization, how could it be premature? ;-p

"I find that the difference is usually more indicative of the practical value of the optimization than the quotient."

This point taken again.

kind regards,
tomte