nachumk has asked for the wisdom of the Perl Monks concerning the following question:

How can I tell that $ has already matched at the end of a string? Consider this example:
my $str = "abc";
while ($str =~ m/.|$/gc) {
    printf("1: %d\n", pos($str));
}
while ($str =~ m/.|$/gc) {
    printf("2: %d\n", pos($str));
}

Output:
1: 1
1: 2
1: 3
1: 3
The second loop matches nothing, which is what I want, but more importantly I'd like something that tells me the regex engine has already matched $. pos is 3 after both the last character match and the $ match, so what other function can tell me that a string has already matched $? I need to use /gc because I don't want a failed match to reset pos().

Replies are listed 'Best First'.
Re: perl indication of end of string already matched
by AnomalousMonk (Archbishop) on Jun 08, 2020 at 15:35 UTC

    This sounds very much like an XY Problem. Can you give us a Short, Self-Contained, Correct Example to illustrate your immediate problem? In any event, is the following something like what you would want?

    c:\@Work\Perl\monks>perl -wMstrict -le
    "my $str = 'abc';
     while ($str =~ m/./gc) {
         printf qq{1: %d \n}, pos $str;
     }
     ;;
     printf qq{2: %d \n}, pos $str;
     print 'pos at end' if pos $str == length $str;
    "
    1: 1
    1: 2
    1: 3
    2: 3
    pos at end


    Give a man a fish:  <%-{-{-{-<

      print 'pos at end' if pos $str == length $str;

      This if statement is true whether or not the previous regex matched $. And the regex engine will only match $ once. How does the regex engine know that it matched $? And can I get access to that information? I prefer to not call the regex engine again.

      Using pos == length is sufficient. I was hoping there was a simpler call, something like pos() but for identifying whether $ was already matched. That would allow me to avoid two calls (length and pos) and instead call one function (perhaps eos($str)). I'm very sensitive to performance during parsing.
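A minimal sketch of the eos() idea from the post above, in plain Perl. The name eos is hypothetical (it is not a Perl builtin), and the helper still calls pos and length internally, so it is a convenience and not a speedup. It reports only where pos sits, not whether $ itself has already matched there.

```perl
use strict;
use warnings;

# Hypothetical helper, not a builtin: true when pos() of the caller's
# string is at the end of that string. $_[0] is an alias to the caller's
# variable, so its pos() is visible here. pos() is undef before any
# match, so treat that as 0.
sub eos {
    my $p = pos($_[0]) // 0;
    return $p == length $_[0];
}

my $str = 'abc';
1 while $str =~ m/./gc;    # consume every character; /c keeps pos on failure
print "pos at end\n" if eos($str);
```

Passing the variable itself (and not a copy) matters here, since pos is attached to the scalar that was matched against.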

        print 'pos at end' if pos $str == length $str;

        This if statement is true whether or not the previous regex matched $.

        I don't understand this. Can you give an example of a non-lookahead regex that matches to the end of a string and does not match at the end of the string, i.e. does not leave pos sitting beyond the end of the string (or pos == length)?

        Using pos == length is sufficient. ... a simpler call ... avoid two calls ... and instead call one function ... I'm very sensitive to performance during parsing.

        It sounds as if you may have an answer (even though I'm still a bit confused about the question). I imagine that Inline::C would allow you to define a single function to examine the internals of a string scalar and return info on pos versus length. Good luck :)


        Give a man a fish:  <%-{-{-{-<

        >>How does the regex engine know that it matched $?

        Hi.

        I think the engine doesn't know. It only knows, for example, that it has already matched a zero-length branch once at this position, and it refuses to match a second time at the same place in order to avoid matching forever.
        I believe you can get similar results with regexes like these: m/.|(?:)/gc, m/.|(?=)|(?:)/gc...
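The zero-length rule described above can be checked with a small sketch. The helper name match_positions is made up for illustration; it records pos after every successful /gc match so that the $ pattern and an always-empty alternative can be compared side by side.

```perl
use strict;
use warnings;

# Hypothetical helper: return pos() after each successful //gc match
# of $re against a private copy of $str.
sub match_positions {
    my ($str, $re) = @_;
    my @seen;
    push @seen, pos $str while $str =~ m/$re/gc;
    return @seen;
}

# Compare the original pattern with one whose second branch is simply empty.
for my $re (qr/.|$/, qr/.|(?:)/) {
    printf "%s => %s\n", $re, join ' ', match_positions('abc', $re);
}
```

If both patterns report the same positions, that supports the point that the engine tracks "an empty match already happened at this pos", not "$ matched".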
Re: perl indication of end of string already matched
by Tux (Canon) on Jun 08, 2020 at 15:41 UTC

    Maybe I am simplistic, but this works:

    my $str = "abc";
    while ($str =~ m/.(?=.|$)/gc) {
        say "1: ", pos $str;
    }
    while ($str =~ m/.(?=.|$)/gc) {
        say "2: ", pos $str;
    }

    Enjoy, Have FUN! H.Merijn
Re: perl indication of end of string already matched
by Marshall (Canon) on Jun 08, 2020 at 21:41 UTC
    I am confused also. This appears to be a very contrived example. Can you show some code closer to your actual application?

    For your regex, just keeping track of last vs curr pos would seem to do it. $ matches when pos does not advance.

    use warnings;
    use strict;

    my $str = "abc";
    my $last_pos = 0;
    while ($str =~ m/.|$/gc) {
        my $curr_pos = pos($str);
        printf("1: %d", $curr_pos);
        ($curr_pos == $last_pos) ? print " EOString\n" : print "\n";
        $last_pos = $curr_pos;
    }
    __END__
    1: 1
    1: 2
    1: 3
    1: 3 EOString
    For character-by-character processing of a string, substr() and its buddies are appropriate, not a regex. An example showing more of what you are really trying to accomplish would be helpful.
Re: perl indication of end of string already matched
by perlfan (Parson) on Jun 09, 2020 at 03:27 UTC
    Smells like this section of perlretut.
Re: perl indication of end of string already matched
by rsFalse (Chaplain) on Jun 16, 2020 at 11:38 UTC
    I believe you can enclose the piece of the regex you are interested in inside capturing parentheses, e.g. "m/.|($ )/gcx", and ask if $1 is defined.
    Also I believe this can help: m/.|$ (?{ $matched = 1; })/gcx
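A short sketch of the capture-group suggestion, assuming the same "abc" input as the original question. The sub name scan_with_capture and the 'EOS' marker are invented for illustration. When the . branch matches, the group does not participate and $1 is undef; when the $ branch matches, $1 is the empty string, which is defined.

```perl
use strict;
use warnings;

# Hypothetical helper: walk the string with //gc and log either the
# current pos() or the marker 'EOS' when the $ branch was the one that
# matched. The /x flag allows the spaces around $ inside the group.
sub scan_with_capture {
    my ($str) = @_;
    my @log;
    while ($str =~ m/ . | ( $ ) /gcx) {
        push @log, defined $1 ? 'EOS' : pos $str;
    }
    return @log;
}

print join(', ', scan_with_capture('abc')), "\n";
```

This keeps the check inside the same match, so no second call into the regex engine is needed to learn which branch succeeded.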