Since I'm not an expert with Perl regexes, I started digging into haukex's code by commenting his original code and by looking for a simpler loop.
Here we go:
I have 3 questions.

```perl
use warnings;
use strict;
use Test::More tests=>2;

my $str = "iowq john stepy andy anne alic bert stepy anne bert andy step alic andy";
my %names;

=for comment
pos
Returns the offset of where the last m//g search left off for the variable
in question ($_ is used when the variable is not specified). Note that 0 is
a valid match offset. undef indicates that the search position is reset
(usually due to match failure, but can also be because no match has yet
been run on the scalar).

=cut

pos($str)=undef;

=for comment
https://www.regular-expressions.info/continue.html
The position where the last match ended is a "magical" value that is
remembered separately for each string variable. The position is not
associated with any regular expression. This means that you can use \G to
make a regex continue in a subject string where another regex left off.
If a match attempt fails, the stored position for \G is reset to the start
of the string. To avoid this, specify the continuation modifier /c.

=cut

while ($str=~/\G     # start where the last match ended
    \s*              # match 0 to n whitespace chars
    (\S+)            # capture 1 to n non-whitespace chars, followed by
    (?:              # start of non-capturing group
        \s+|\z       # 1 to n whitespace chars or the end of the string
    )                # end of group
    /gcx) {
    $names{$1}++;
}
die "failed to parse \$str" unless pos($str)==length($str);
test_it (\%names);

%names = ();
# Either take a new variable
#my $str2 = "iowq john stepy andy anne alic bert stepy anne bert andy step alic andy";
# or reset pos for the original variable
pos($str)=undef;
my $last;
while ($str=~/(\w+)/g) {
    #print $1, " ", pos $str, "\n";
    $names{$1}++;
    $last = pos $str;
}
die "failed to parse \$str" unless $last==length($str);
test_it(\%names);

sub test_it {
    my $hr_names = shift;
    is_deeply $hr_names, { alic => 2, andy => 3, anne => 2, bert => 2,
        iowq => 1, john => 1, step => 1, stepy => 2 };
}
```
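To see the effect of `/c` described in the regular-expressions.info quote in isolation, here is a minimal sketch. The string `$s` and the variable names are made up for illustration; it only relies on the documented behaviour of `pos()` after a failed scalar-context `m//g` with and without `/c`.

```perl
use strict;
use warnings;

my $s = "ab cd";    # hypothetical sample string

# Plain /g: a failed match resets pos() to undef.
$s =~ /\w+/g or die "expected a match";    # matches "ab", pos($s) is now 2
my $after_match = pos($s);
$s =~ /\d/g and die "no digit expected";   # fails: there is no digit
my $after_fail = pos($s);                  # undef, so the next //g restarts at 0

# /gc: a failed match leaves pos() where the last success left it.
pos($s) = undef;                           # start over
$s =~ /\w+/gc or die "expected a match";   # matches "ab", pos($s) is now 2
$s =~ /\d/gc and die "no digit expected";  # fails again
my $after_fail_c = pos($s);                # still 2

print "after match:              $after_match\n";
print "after failure without /c: ", (defined $after_fail ? $after_fail : 'undef'), "\n";
print "after failure with /c:    $after_fail_c\n";
```

This is why the first loop in the code above uses `/gc`: when the loop stops on a failed match, `pos($str)` still points just past the last word, so `pos($str)==length($str)` confirms the whole string was consumed. The second loop uses plain `/g`, so `pos($str)` is undef once the loop exits, and the last position has to be saved in `$last` inside the loop.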
Cheers
François
Re^3: counting words in string
by haukex (Archbishop) on Aug 10, 2018 at 21:44 UTC