
Since I'm not an expert with Perl regexes, I started digging into haukex's code, commenting his original code and looking for a simpler loop.

Here we go:

use warnings;
use strict;
use Test::More tests=>2;

my $str = "iowq john stepy andy anne alic bert stepy anne bert andy step alic andy";

my %names;
=for comment
pos 
Returns the offset of where the last m//g search left off for the variable in question ($_ is used when the variable is not specified).
Note that 0 is a valid match offset.
undef indicates that the search position is reset (usually due to match failure, but can also be because no match has yet been run on the scalar).
=cut

pos($str)=undef;

=for comment
https://www.regular-expressions.info/continue.html
The position where the last match ended is a "magical" value that is remembered separately for each string variable.
The position is not associated with any regular expression.
This means that you can use \G to make a regex continue in a subject string where another regex left off.
If a match attempt fails, the stored position for \G is reset to the start of the string. To avoid this, specify the continuation modifier /c.
=cut

while ($str=~/\G  # start where the last match ended
    \s*           # skip zero or more whitespace characters
    (\S+)         # capture one or more non-whitespace characters, followed by
    (?:           # start of a non-capturing group
    \s+|\z        # one or more whitespace characters, or the end of the string
    )             # end of the group
    /gcx) {
    $names{$1}++;
}
die "failed to parse \$str" unless pos($str)==length($str);

test_it (\%names);

%names = ();
# Either use a new variable:
#my $str2 = "iowq john stepy andy anne alic bert stepy anne bert andy step alic andy";
# or reset pos for the original variable:
pos($str)=undef;
my $last;
while ($str=~/(\w+)/g) {
    #print $1, " ", pos $str, "\n";
    $names{$1}++;
    $last = pos $str;
}

die "failed to parse \$str" unless $last ==length($str);

test_it(\%names);

sub test_it {
    my $hr_names = shift;
    is_deeply $hr_names, { alic => 2, andy => 3, anne => 2, bert => 2,
    iowq => 1, john => 1, step => 1, stepy => 2 };
}

I have three questions:

Where is the /c modifier documented in perldoc? It is mentioned at the end of the entry for pos, but I could not find any other description.
Why do I have to remember pos $str in the second loop? It is undef after the second loop, but not after the first one.
Are the two loops equivalent, or will the second one fail in some situations?
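
To make the last two questions concrete, here is a minimal sketch of what I think is going on. The strings and variable names are made up for illustration and are not from haukex's code. It compares pos() after a /g loop with and without /c, and shows one input where \w+ and \S+ split differently.

```perl
use strict;
use warnings;

my $str = "andy anne";

# Without /c: the final, failing match attempt resets pos() to undef.
1 while $str =~ /(\w+)/g;
my $pos_plain = pos($str);

# With /c: the failing attempt leaves pos() where the last success ended.
pos($str) = undef;
1 while $str =~ /(\w+)/gc;
my $pos_keep = pos($str);

# \w+ and \S+ disagree as soon as a token contains non-word characters.
my $odd = "anne-marie o'neil";
my @w = $odd =~ /(\w+)/g;    # splits at the hyphen and the apostrophe
my @s = $odd =~ /(\S+)/g;    # splits at whitespace only

printf "plain: %s, with /c: %s\n",
    defined $pos_plain ? $pos_plain : 'undef', $pos_keep;
print "\\w+: @w\n";
print "\\S+: @s\n";
```

If this is right, the second loop only needs $last because it lacks /c, and the two loops count the same words only as long as every token consists purely of word characters.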

Cheers

François

