Re: counting regex hits
by davis (Vicar) on Mar 18, 2003 at 11:01 UTC
|
How can I count the number of occurrences of a substring within a string?
There are a number of ways, with varying efficiency. If you want a
count of a certain single character (X) within a string, you can use
the "tr///" operator like so:
$string = "ThisXlineXhasXsomeXx'sXinXit";
$count = ($string =~ tr/X//);
print "There are $count X characters in the string";
That should get you started
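Note that tr/// only counts individual characters. For a multi-character substring, a rough sketch along the following lines should work; the sample string and the $needle variable are made up for illustration, and the count is of non-overlapping matches:

```perl
use strict;
use warnings;

my $string = "abcXYZabcXYZabc";   # hypothetical sample data
my $needle = "abc";               # hypothetical substring to count

# \Q...\E quotes any regex metacharacters in $needle. Assigning the
# match list to an empty list forces list context, and the outer
# scalar assignment receives the number of matches.
my $count = () = $string =~ /\Q$needle\E/g;

print "There are $count occurrences of '$needle'\n";
```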
davis
Is this going out live?
No, Homer, very few cartoons are broadcast live - it's a terrible strain on the animator's wrist
Re: counting regex hits
by broquaint (Abbot) on Mar 18, 2003 at 11:02 UTC
|
my $str = "ARKL---MNRD--SET";
print "$str [- count]: ", $str =~ tr/-//, $/;
__output__
ARKL---MNRD--SET [- count]: 5
HTH
_________ broquaint
Re: counting regex hits
by robartes (Priest) on Mar 18, 2003 at 11:03 UTC
|
In list context, a regexp will return the matches. You can use that fact to count the number of matches:
my $string="ARKL---MNRD--SET";
my $count = () = $string =~/-/g;
print $count;
__END__
5
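For anyone puzzled by the "= () =" construct, here is a small sketch that puts it next to a longer equivalent. The variable names are mine, not from the post above:

```perl
use strict;
use warnings;

my $string = "ARKL---MNRD--SET";

# Idiom from the post: assign the list of matches to an empty list,
# then evaluate that list assignment in scalar context, which yields
# the number of elements on its right-hand side.
my $count_list = () = $string =~ /-/g;

# Long form: in scalar context, //g finds one match per evaluation
# and returns false once no further match is found.
my $count_loop = 0;
$count_loop++ while $string =~ /-/g;

print "$count_list $count_loop\n";
```

Both variables should end up holding the same count.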
CU Robartes-
Re: counting regex hits Benchmark
by Tomte (Priest) on Mar 18, 2003 at 12:16 UTC
|
I know that it is well known that tr is faster than m//; nevertheless
I thought I'd write my first Benchmark.pm-using, ahem, well, benchmark :-)
I'd be glad to hear if I did something considerably stupid here :-). With that proviso, the (well known!) results:
__END__
Benchmark: timing 1000000 iterations of m//, tr...
m//: 57 wallclock secs (54.53 usr + 0.07 sys = 54.60 CPU) @ 18315.02/s (n=1000000)
 tr:  1 wallclock secs ( 1.92 usr + 0.00 sys =  1.92 CPU) @ 520833.33/s (n=1000000)
        Rate   m//    tr
m//  18315/s    --  -96%
tr  520833/s 2744%    --
Quite impressive difference, isn't it?
regards,
tomte
|
The only thing that I would change is the 'random feature' you put in the benchmark. This will throw off your results, since you are not providing the same value to each function. In the worst case the 'tr' function could get the shortest string and the 'm//' function gets the longest string every time.
Of course this ambiguity would be evened out since you do a million iterations, and I'll bet your results will not change much by fixing this. But, I would replace the random function and just loop through each item in every iteration.
use Benchmark qw(cmpthese);   # @rands holds the test strings from the parent post
cmpthese(100000, {
    'tr' => sub {
        foreach (@rands) {
            my $x = tr/-//;
        }
    },
    'm//' => sub {
        foreach (@rands) {
            my $x = () = $_ =~ /-/g;
        }
    },
});
This way you are guaranteed an even distribution of your sample data.
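In case someone wants to run this standalone, here is a self-contained sketch. The contents of @rands are an assumption on my part (random 20-character strings of letters and dashes), since the original data set is not shown in the thread, and I use a negative count so Benchmark runs each sub for about one CPU second instead of a fixed number of iterations:

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical sample data: 50 random strings of 20 characters each,
# drawn from A..Z plus '-'. Not the data used in the parent post.
my @alphabet = ('A' .. 'Z', '-');
my @rands = map {
    join '', map { $alphabet[rand @alphabet] } 1 .. 20
} 1 .. 50;

# Sanity check: both techniques must agree before we time them.
for my $s (@rands) {
    my $by_tr = ($s =~ tr/-//);
    my $by_m  = () = $s =~ /-/g;
    die "count mismatch for '$s'" unless $by_tr == $by_m;
}

# A negative count means "run each sub for at least that many CPU seconds".
cmpthese(-1, {
    'tr'  => sub { for (@rands) { my $x = tr/-//; } },
    'm//' => sub { for (@rands) { my $x = () = /-/g; } },
});
```

Every sub sees every string on every iteration, so neither side can get lucky with short inputs.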
I agree with you that the results are impressive. Definitely something to keep in the bin of useful Perl knowledge...
|
$strg = qq[This has- some- d-a-she-s -in it];
%cnt = ();
@chrs = split('',$strg);
foreach (@chrs) {
$cnt{$_}++;
}
print qq[$cnt{'-'}\n];
|
I know that it is well-known that tr is faster than m//
Except when it is not.
Quite impressive difference, isn't it?
No. The difference is 1/18982.5th of a second per iteration, which isn't impressive at all. q-: The quotient looks impressive, but Benchmark has to go to a lot of work to be able to guess at it, so you won't see a quotient anything close to that in practice.
Congratulations, you've now prematurely optimized this nano-operation.
I find that the difference is usually more indicative of the practical value of the optimization than the quotient.
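The arithmetic behind the difference and the quotient can be reproduced from the CPU times in the Benchmark output quoted earlier in the thread. This is only a sketch; the variable names are made up:

```perl
use strict;
use warnings;

my $n      = 1_000_000;   # iterations reported by Benchmark
my $cpu_m  = 54.60;       # CPU seconds for m//
my $cpu_tr = 1.92;        # CPU seconds for tr

# Difference: seconds saved per single call.
my $diff_per_call = ($cpu_m - $cpu_tr) / $n;

# Quotient: how many times longer m// took overall.
my $quotient = $cpu_m / $cpu_tr;

printf "difference: %.2f microseconds per call (1/%.1f of a second)\n",
    $diff_per_call * 1e6, 1 / $diff_per_call;
printf "quotient:   %.1fx\n", $quotient;
```

The difference comes to roughly 53 microseconds per call, which only matters if the operation runs a very large number of times.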
- tye (cheap philosophy shouldn't cost a lot)
|
I know that it is well-known that tr is faster than m//
Except when it is not.
Point taken: I amend my original proposition with:
If the problem is as simple as the OP's, that is, if a simple transliteration (or a side effect of one) is your goal, then it seems to be a well-known fact that you may as well use the transliteration operator.
:-)
Congratulations, you've now prematurely optimized this nano-operation.
Do not fall prey to false conclusions:
I was interested in the abstract comparison of two nano-operations and in how to use Benchmark.pm myself. I didn't optimize anything; how could I? I don't have code lying around that uses either of these operators to actually count anything. So as this wasn't optimization, how could it be premature? ;-p
I find that the difference is usually more indicative of the practical value of the optimization than the quotient.
this point taken again
kind regards,
tomte