ysreenu has asked for the wisdom of the Perl Monks concerning the following question:

Hi,
This is my first question on Perl Monks.
I tried to follow all the guidelines, but if you spot any deviations, please correct me.

I was trying some sample code just to understand the basics of the match operator (=~),
but could not figure out the reason for the unexpected output in different parts of the program below.
#!/usr/bin/perl -wl
use strict;
use warnings;

=head
This is perl 5, version 14, subversion 2 (v5.14.2) built for i686-linux-gnu-thread-multi-64int
(with 57 registered patches, see perl -V for more detail)
=cut

my $string = "blink mink chink sink wink dink nn kkkk";

print "Phase-1:";
my @cnt_arr = $string =~ /ink/g;
print "count array = @cnt_arr";
print "substring is found ", scalar(@cnt_arr), " times";    #prints 6

print "\nPhase-2:";
#attempts to do the same without the use of @cnt_arr
print "substring is found ", scalar($string =~ /ink/g), " times";    #prints only 1
print "after 1, string = $string";
print "substring is found ", scalar(($string =~ /ink/g)), " times";
    #an attempt to convert to array context, but still prints 1
print "after 2, string = $string";

print "\nPhase-3:";
@cnt_arr = $string =~ /ink/g;
print "count array = @cnt_arr";
print "substring is found ", scalar(@cnt_arr), " times";    #now prints 4 !!
print "after 3, string = $string";
# $string content remains the same throughout but match operator
#is giving different results in phase-1 and phase-3

Phase-1:
count array = ink ink ink ink ink ink
substring is found 6 times

Phase-2:
substring is found 1 times
after 1, string = blink mink chink sink wink dink nn kkkk
substring is found 1 times
after 2, string = blink mink chink sink wink dink nn kkkk

Phase-3:
count array = ink ink ink ink
substring is found 4 times
after 3, string = blink mink chink sink wink dink nn kkkk

In Phase-3, the match operator finds only 4 matches of the substring, even though the content of the variable $string has not changed.

Can you please explain why this is happening, and how to overcome it so that the number of substring matches is the same in Phase-1 and Phase-3?

Re: Match operator giving unexpected output
by Athanasius (Archbishop) on Jan 13, 2015 at 06:18 UTC

    Hello ysreenu, and welcome to the Monastery!

    First, when scalar is applied to an array, it returns the number of elements in the array. But when scalar is applied to a list, it returns the last element in that list:

    16:12 >perl -wE "say scalar('a', 'e', 'i', 'o', 'u');"
    Useless use of a constant ("a") in void context at -e line 1.
    Useless use of a constant ("e") in void context at -e line 1.
    Useless use of a constant ("i") in void context at -e line 1.
    Useless use of a constant ("o") in void context at -e line 1.
    u

    16:13 >
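    The same distinction explains why the extra parentheses in Phase-2 did not help. The following is a minimal sketch (assuming perl 5 and reusing the original string; the variable names are illustrative only): scalar() imposes scalar context on whatever is inside it, and parentheses alone do not create list context, so m//g still returns just a true value (1) instead of a list of matches.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Variable names below are illustrative only.
my $string = "blink mink chink sink wink dink nn kkkk";

# List context: every match is returned, so the array holds 6 elements.
my @matches   = $string =~ /ink/g;
my $array_cnt = scalar(@matches);    # an array in scalar context gives its size

# Parentheses do NOT create list context: scalar() still evaluates the
# match in scalar context, so m//g returns 1 (success), not a count.
my $paren_res = scalar(($string =~ /ink/g));

print "array count   = $array_cnt\n";
print "scalar((...)) = $paren_res\n";
```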

    Second, calling m/.../g in scalar context finds only one match, and advances the string's match position (the "pointer", which you can read with pos) to the position just after that match. So your two calls in Phase-2 eat up the first two occurrences of “ink” in the string. Then in Phase-3, since the pointer has not been reset, the call to m/.../g in list context finds only the remaining matches, which is why you get 4 and not 6.
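    To make the pointer visible, here is a rough trace of the three phases using pos (a sketch assuming perl 5 and the original string; variable names are illustrative). It relies on two documented behaviours: a scalar-context m//g leaves pos just past the match, and a list-context m//g resets pos once its final attempt fails.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Variable names below are illustrative only.
my $string = "blink mink chink sink wink dink nn kkkk";

# Phase-1: list-context m//g returns every match; the final, failing
# attempt resets pos($string) to undef.
my @phase1 = $string =~ /ink/g;

# Phase-2: each scalar-context m//g finds ONE match and leaves
# pos($string) just after it.
$string =~ /ink/g;
my $pos_after_first = pos($string);     # end of "blink"
$string =~ /ink/g;
my $pos_after_second = pos($string);    # end of "mink"

# Phase-3: list-context m//g resumes from pos($string), so only the
# remaining matches are returned.
my @phase3 = $string =~ /ink/g;

printf "Phase-1: %d matches\n", scalar @phase1;
printf "pos after the Phase-2 calls: %d, %d\n", $pos_after_first, $pos_after_second;
printf "Phase-3: %d matches\n", scalar @phase3;
```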

    Update: See “Global matching” in Using regular expressions in Perl.

    Hope that helps,

    Athanasius <°(((><contra mundum Iustus alius egestas vitae, eros Piratica,

      >  when scalar is applied to a list,

      Here "list" is to be read as comma separated (literal) list. (see scalar comma separator)

      Sorry for nitpicking but there is too much confusion. ..

      Cheers Rolf

      PS: Je suis Charlie!

        LanX> when scalar is applied to a list,

        Here "list" is to be read as comma separated (literal) list. (see scalar comma separator)

        Sorry for nitpicking but there is too much confusion. .

        Where is the confusion? And why is this distinction important, in this case?

        I think it's unimportant, especially in this case: a distinction without a difference. "List literal" or "literal list" is not even in perlglossary.

        A different demo of the "null list"

        ## a literal list in list context
        $ perl -le " print( qw/ a b c d / )"
        abcd

        ## a literal list in scalar context
        $ perl -le " print( scalar qw/ a b c d / )"
        d

        ## a literal list in lvalue list context
        ## a literal list in list context ( the left hand side list also happens to be an lvalue)
        ## a literal list in list assignment (list context )
        $ perl -le " print( ( $q ) = qw/ a b c d / )"
        a

        ## null list in scalar context counts, counting a list literal
        $ perl -le " print( $q = () = qw/ a b c d / )"
        4

        ## null list in scalar context counts, counting a "list"
        $ perl -le " @f = qw/ a b c d /; print( $q = () = @f )"
        4

        ## null list in scalar context counts
        $ perl -le " print( scalar( () = qw/ a b c d / ) )"
        4

        ## null list in list context discards (using literal list)
        $ perl -le " print( () = qw/ a b c d / )"

        ## null list in list context discards (the array kind of list )
        $ perl -le " @f = qw/ a b c d /; print( () = @f )"
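        The "null list in scalar context counts" case above is also the usual way to do what Phase-2 attempted, counting matches without a named array. A minimal sketch, assuming perl 5 and the original string (the variable name $count is illustrative): the empty list () forces list context on the match, and the list assignment, evaluated in scalar context, yields the number of elements on its right-hand side.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Variable names below are illustrative only.
my $string = "blink mink chink sink wink dink nn kkkk";

# Assigning to () puts the match in list context; the list assignment
# itself, in scalar context, returns how many matches were produced.
my $count = () = $string =~ /ink/g;

print "substring is found $count times\n";
```

        Because the match runs in list context, its final failing attempt also resets pos($string), so this form leaves no leftover pointer behind.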
      Hi Athanasius,
      Thanks for the clarification.
      My doubt arises from this statement from phase-3 of the quoted program

      @cnt_arr = $string =~ /ink/g;

      I thought the content of the left operand ($string) is not changed, so which pointer is the =~ operator working on?

      I found how to reset the pointer from the link you suggested:
      pos($string)=0; just before Phase-3 does the job.
      On another note, this behavior resembles the C standard library function strtok(), which updates an internal pointer that is
      not exposed, so the caller has to keep track of it!
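      The reset can be sketched like this (assuming perl 5 and the original string; variable names are illustrative). Unlike strtok()'s hidden pointer, pos is stored per variable and can be both read and assigned; assigning undef instead of 0 also resets it.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Variable names below are illustrative only.
my $string = "blink mink chink sink wink dink nn kkkk";

# Phase-2 style: two scalar-context matches move pos($string) forward.
$string =~ /ink/g;
$string =~ /ink/g;
my $before_reset = pos($string);

# Reset the match position before Phase-3 (pos($string) = undef also works).
pos($string) = 0;

my @cnt_arr = $string =~ /ink/g;
printf "pos before reset: %d\n", $before_reset;
printf "after reset: %d matches\n", scalar @cnt_arr;
```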
Re: Match operator giving unexpected output (source)
by Anonymous Monk on Jan 13, 2015 at 08:27 UTC
      Hi Anonymous Monk,
      Thanks for asking.
      My expectation was naive: since the variable's value is unchanged, I expected the
      match operator to match all occurrences of the substring from the beginning of the string.
      But according to the link provided by Athanasius, the pos() function either returns the current match offset or
      can be assigned to, setting the offset for the next matching operation.
      Thanks to this discussion, I learned a new function.
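      Both uses of pos() can be seen in a short sketch (assuming perl 5 and the original string; variable names and the offset 17, chosen as the start of "sink", are illustrative). Reading pos inside a scalar-context while loop gives the offset just past each match; assigning to pos makes the next m//g start searching from there.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Variable names and the offset 17 are illustrative only.
my $string = "blink mink chink sink wink dink nn kkkk";

# Reading pos(): each scalar-context m//g in the loop condition advances
# pos($string) to just past the current match. The final failed match
# resets pos to undef and ends the loop.
my @offsets;
while ($string =~ /ink/g) {
    push @offsets, pos($string);
}
print "end offsets: @offsets\n";

# Setting pos(): the next m//g starts searching at this offset.
pos($string) = 17;    # index of the "s" in "sink"
my @rest = $string =~ /ink/g;
print "matches from offset 17: ", scalar(@rest), "\n";
```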