http://qs1969.pair.com?node_id=11139917

perl_boy has asked for the wisdom of the Perl Monks concerning the following question:

I'm trying to print the start and end positions (LAST_MATCH_START/@- and LAST_MATCH_END/@+)
of every place where the regex /\t+/ matches within the input string $_.
For example, the following input file
foo bar baz
foo bar foo	baz ccc ddd
foo bar foo	baz ccc ddd	foo	bar	baz
foo bar foo		baz ccc ddd			foo
should print
-1 -1
11 12
11 12 23 24 27 28 31 32
11 13 24 27
that is, the 0-based positions where TABs (\t) occur in the input file, each run of TABs having a start and an end position

NOTE:
    line 1 prints "-1 -1" as there are no TABs (\t) on that line
    line 2 prints "11 12" as these are the 0-based start/end positions of the only TAB (\t) on that line
    line 3 prints "11 12 23 24 27 28 31 32" as these are the 0-based start/end positions of the 4 TABs (\t) on that line
    line 3 has no adjacent TABs, that is, there is only 1 TAB between the words foo and baz, ddd and foo, foo and bar, bar and baz
    line 4 prints "11 13 24 27" as these are the 0-based start/end positions of the 2 runs of TABs (\t) on that line
    line 4 has adjacent TABs, that is, there is more than 1 TAB between the words foo and baz, and between ddd and foo
but when I uncomment line 4 (/\t+/g;) it prints
11
12
11
12
11
13
NOTE:
    there are 2 empty lines (1 and 2) at the beginning of the output; it only prints the start/end position of the first TAB, and the start ($-[]) and end ($+[]) are printed on 2 separate lines
and when I uncomment line 5 (s/\t+/9/g;) it prints

11 12 31 32 37 42

NOTE:
    there are 2 empty lines (1 and 2) at the beginning of the output; it only prints the start/end position of the last TAB, and the start ($-[]) and end ($+[]) are printed on 2 separate lines
I'm trying to print the content of the LAST_MATCH_START/@- and LAST_MATCH_END/@+ arrays, which contain the positions
where the regex matched within the input string $_, using $-[] and $+[] (lines 6 and 7).
Here is the code:
open(F0, $ARGV[0]);
while(<F0>)
{
# /\t+/g;
# s/\t+/9/g;
print $-[0], " ",$-[1], " ", $-[2], "\n" ;
print $+[0], " ",$+[1], " ", $+[2], "\n" ;
}
close F0;
NOTE:
    I used the g modifier at the end of the regexes (lines 4 and 5)
This is really strange, as /\t+/g; prints the first match and s/\t+/9/g; prints the last.
I then searched Google for "perl regex LAST_MATCH_START/@- LAST_MATCH_END/@+ bug" and found
https://github.com/Perl/perl5/issues/16109, which says it's not a bug.
It would be very helpful if someone could explain what's going on and how to print the right positions where a TAB (\t) occurs.
Thanks

Replies are listed 'Best First'.
Re: printing LAST_MATCH_START/@- LAST_MATCH_END/@+ array where regex match begin/end
by choroba (Cardinal) on Dec 26, 2021 at 21:26 UTC
    Move the match with /g into a condition of a while loop.

    As there are no capture groups in the regex (i.e. no parentheses), using any other index than 0 makes no sense. Under warnings, you'd even get lots of Use of uninitialized value in print.
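To illustrate the point about indexes above 0, here is a minimal sketch, assuming a pattern without parentheses. The helper name match_offsets is hypothetical and not from the thread.

```perl
use strict;
use warnings;

# Hypothetical helper: run a group-free pattern once and report what
# @- and @+ contain afterwards.
sub match_offsets {
    my ($text) = @_;
    return unless $text =~ /\t+/;    # no match: return an empty list
    # Copy the arrays right away; a later match would overwrite them.
    my @starts = @-;
    my @ends   = @+;
    return (\@starts, \@ends);
}

my ($starts, $ends) = match_offsets("foo bar foo\tbaz");
print "elements in \@-: ", scalar(@$starts), "\n";    # 1: only index 0 exists
print "whole match: ", $starts->[0], " ", $ends->[0], "\n";    # 11 12
```

With no capture groups, only element 0 is populated, so $-[1] and $-[2] are undefined and trigger the warning mentioned above.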

    Moreover, a failed match does not put -1 into the @- and @+ arrays (returning -1 is what index does). You have to handle that case yourself.

    The following should work directly as shown and print the expected output:

    #!/usr/bin/perl
    use strict;
    use warnings;

    my $string = << "__EOS__";
    foo bar baz
    foo bar foo\tbaz ccc ddd
    foo bar foo\tbaz ccc ddd\tfoo\tbar\tbaz
    foo bar foo\t\tbaz ccc ddd\t\t\tfoo
    __EOS__

    open my $F0, '<', \$string or die $!;

    while (<$F0>) {
        my $match;
        while (/\t+/g) {
            $match = 1;
            print $-[0], ' ', $+[0], ' ';
        }
        print '-1 -1' unless $match;
        print "\n";
    }
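As a sketch of why a single /\t+/g statement reported only the first TAB: in scalar context each /g match finds just the next occurrence and resumes from pos() on the following call. The sample line is taken from the question; the helper name tab_spans is hypothetical.

```perl
use strict;
use warnings;

my $line = "foo bar foo\tbaz ccc ddd\tfoo\tbar\tbaz";

# One /g match in scalar context finds only the next occurrence,
# so @- and @+ describe the first TAB.
if ($line =~ /\t+/g) {
    printf "first call:  %d %d\n", $-[0], $+[0];    # 11 12
}

# The same match again resumes from pos($line) and finds the second TAB.
if ($line =~ /\t+/g) {
    printf "second call: %d %d\n", $-[0], $+[0];    # 23 24
}

# Hypothetical helper: loop until the match fails and collect every pair.
sub tab_spans {
    my ($text) = @_;
    my @spans;
    while ($text =~ /\t+/g) {
        push @spans, [ $-[0], $+[0] ];
    }
    return @spans;
}

print join(' ', map { "@$_" } tab_spans($line)), "\n";    # 11 12 23 24 27 28 31 32
```

Putting the match in the condition of a while loop, as suggested above, is what visits every occurrence.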

    map{substr$_->[0],$_->[1]||0,1}[\*||{},3],[[]],[ref qr-1,-,-1],[{}],[sub{}^*ARGV,3]
      Thanks for the tip. I have replaced the code inside the while loop with the code you provided:
      open(F0, $ARGV[0]);
      while(<F0>)
      {
          my $match;
          while (/\t+/g) {
              $match = 1;
              print $-[0], ' ', $+[0], ' ';
          }
          print '-1 -1' unless $match;
          print "\n";
      }
      close F0;
      It works fine and gives the expected results. Thanks :)))

      However, I find that the perldoc page https://perldoc.perl.org/perlvar#Variables-related-to-regular-expressions
      is a bit misleading when it says that the positions before and after a regex match can be found in the LAST_MATCH_START/@- and LAST_MATCH_END/@+ arrays by printing
      print $+[0], " ",$+[1], " ", $+[2], "\n" ;

      Again, thanks for the code :)))
Re: printing LAST_MATCH_START/@- LAST_MATCH_END/@+ array where regex match begin/end
by tybalt89 (Monsignor) on Dec 26, 2021 at 23:10 UTC
    #!/usr/bin/perl

    use strict; # https://perlmonks.org/?node_id=11139917
    use warnings;

    while( <DATA> )
      {
      my @pos;
      push @pos, @-, @+ while /\t+/g;
      @pos or @pos = (-1) x 2;
      print "@pos\n";
      }

    __DATA__
    foo bar baz
    foo bar foo	baz ccc ddd
    foo bar foo	baz ccc ddd	foo	bar	baz
    foo bar foo		baz ccc ddd			foo
      Thanks for the tip. It works fine and gives the expected results. Thanks :)))

      However, I find that the perldoc page https://perldoc.perl.org/perlvar#Variables-related-to-regular-expressions
      is a bit misleading when it says that the positions before and after a regex match can be found in the LAST_MATCH_START/@- and LAST_MATCH_END/@+ arrays by printing
      print $+[0], " ",$+[1], " ", $+[2], "\n" ;

      Again, thanks for the tip :)))

        Makes sense to me...

        'abcdef' =~ /b(.)(.)./ and print
          "end of entire match ", $+[0],
          " end of first group ", $+[1],
          " end of second group ", $+[2], "\n";

        outputs:

        end of entire match 5 end of first group 3 end of second group 4

        This is exactly as expected.
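The offsets in that example line up with the captured text. A minimal sketch, relying on the relationship documented in perlvar that $n equals substr($var, $-[n], $+[n] - $-[n]); the helper name group_spans is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: after a successful match, return one
# [start, end, text] triple for the whole match and for each group.
sub group_spans {
    my ($text, $re) = @_;
    return unless $text =~ $re;
    my @spans;
    for my $n (0 .. $#+) {
        next unless defined $-[$n];    # group did not take part in the match
        push @spans,
            [ $-[$n], $+[$n], substr($text, $-[$n], $+[$n] - $-[$n]) ];
    }
    return @spans;
}

for my $span (group_spans('abcdef', qr/b(.)(.)./)) {
    print join(' ', @$span), "\n";
}
# prints:
# 1 5 bcde
# 2 3 c
# 3 4 d
```

Indexes 1 and 2 are meaningful here only because the pattern has two pairs of parentheses; for /\t+/ only index 0 exists.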