http://qs1969.pair.com?node_id=11127038

rsFalse has asked for the wisdom of the Perl Monks concerning the following question:

Hello.

Today I found a strange behaviour in my sandbox program after I tried to soft-code a variable. Here is my code:
    #!/usr/bin/perl
    use warnings;
    use strict;

    $\ = $/;

    my $A = "1112"
    #    . ''
        ;

    @_ = ( 'a' .. 'i' );

    $A =~ /1+
        (??{
            print "[$&]";
            $_ .= shift @_;
            s!\B!shift@_!e;
            print "\$_:[$_]";
        })
        $
        /x;
OUTPUT:
    [111]
    $_:[1b112a]
    [11]
    $_:[1db112ac]
    [1]
    $_:[1fdb112ace]
    [11]
    $_:[1hfdb112aceg]
    [1]
    Use of uninitialized value within @_ in substitution iterator at ./strange_ARG_inside_regex.pl line 18.
    $_:[1hfdb112acegi]
    [1]
    Use of uninitialized value in concatenation (.) or string at ./strange_ARG_inside_regex.pl line 17.
    Use of uninitialized value within @_ in substitution iterator at ./strange_ARG_inside_regex.pl line 18.
    $_:[1hfdb112acegi]
The same code, except with one line uncommented:
    #!/usr/bin/perl
    use warnings;
    use strict;

    $\ = $/;

    my $A = "1112"
        . ''
        ;

    @_ = ( 'a' .. 'i' );

    $A =~ /1+
        (??{
            print "[$&]";
            $_ .= shift @_;
            s!\B!shift@_!e;
            print "\$_:[$_]";
        })
        $
        /x;
OUTPUT (is variable):
    [111]
    $_:[1b112a]
    [11]
    $_:[1db112ac]
    [�]
    $_:[1fdb112ace]
The output varies, but when I ran similar code inside a for loop with ~1e4 iterations, I got a similarly small number of hash values.
Perl v5.30.0
I tried to minimize the code, but I can't find a way to minimize it further.

I don't know why this strange behaviour occurs. It looks kind of like a bug? Of course, I was following a very bad practice: I was changing my regex on the fly.
Is the problem with... backtracking? Lexical $_ bound to $A?

Re: Strange behaviour when regex variable is modified during match?
by dave_the_m (Monsignor) on Jan 17, 2021 at 19:03 UTC
    So I see a bug whereby the 3rd iteration of $& is random garbage when $A has '' appended. Is this the bug you're reporting, or is there any other behaviour in the output you've shown which you also consider a bug?

    Dave.

      Only this one.
Re: Strange behaviour when regex variable is modified during match?
by Anonymous Monk on Jan 17, 2021 at 19:08 UTC

    You have not said what you are trying to do, or what behavior you expect, so all I can do is make some observations:

    • The documentation for (??{ *code* }) begins: "WARNING: Using this feature safely requires that you understand its limitations." For me this would be a red flag strongly suggesting that whatever you are trying to do you should choose a different implementation.
    • The aforesaid documentation says that whatever the code returns is treated as a fragment of your regular expression. Absent an explicit return, your code returns whatever its last statement (a print) returns. This is documented as a true value, which means pretty much anything but undef, 0, or '', and may well vary from execution to execution.
    • Modifying any Perl variable resets its pos(), which is the regexp engine's notion of where it is in the string it is matching. I suspect it would take someone with deep knowledge of the regex engine to say what it will do if you modify pos() (or the variable itself) while it is being matched, and I further suspect that whatever it does may vary from release to release.
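To illustrate the second point above, here is a minimal sketch of how the return value of a (??{ ... }) block is used as a pattern fragment. The variable names ($text, $want, $explicit, $implicit) are made up for this example, and the value returned by print is documented only as "true"; the second match assumes it stringifies to 1, which is what I would expect but is not guaranteed.

```perl
use strict;
use warnings;

my $text = "aXb aYb";
my $want = 'Y';

# The block ends with an explicit value, so that value becomes the
# sub-pattern matched between 'a' and 'b'.
my $explicit = $text =~ / a (??{ quotemeta $want }) b /x;
my $explicit_match = $explicit ? $& : undef;

# Here the last statement is a print, so whatever print returns
# (assumed to be 1 on success) becomes the sub-pattern instead.
my $implicit = "a1b" =~ / a (??{ print "" }) b /x;
```

If the block is only there for its side effects, ending it with an explicit fragment (for example an empty string) makes the intent clear.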

    The usual way to modify a string is to use the substitution operator s/regexp/replacement/.
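As a rough sketch of that suggestion, and assuming the goal is something like interleaving the characters of a string with items from a list, a plain s///e can do it without modifying the string from inside a regex code block. The names $str, @fill and $meshed are made up for this example.

```perl
use strict;
use warnings;

my $str  = "1112";
my @fill = ('a' .. 'i');

# Match the empty position between every two adjacent characters and
# insert the next item from @fill there. The /r flag returns the
# modified copy and leaves $str itself untouched.
my $meshed = $str =~ s/(?<=.)(?=.)/shift @fill/ger;

print "$meshed\n";   # prints 1a1b1c2
```

Because the substitution operator owns the modification, the regex engine never sees its target string change underneath it.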

      I was only playing with regex and eval inside. I tried to generate some substitution patterns, e.g. zipping or meshing chars.

      Honestly, earlier today I found another "bug" (= feature): $_ inside (?{ ... }) is bound to the LHS, which I noticed after I tried to manipulate $_ as a global. But I found in perlre that this is the desired behaviour. So then I tried to play with this feature.
      Definitely you are playing too close to the underlying implementation of regex and need to find a different way to do what you're trying.
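The $_ binding mentioned above can be observed safely as long as the block only reads it. A minimal sketch, with variable names ($seen, $where) invented for this example:

```perl
use strict;
use warnings;

my ($seen, $where);

# Inside the code block, $_ refers to the string being matched and
# pos() gives the current match position. This block only reads them.
"abc" =~ / b (?{ $seen = $_; $where = pos() }) /x;

print "target: $seen, position: $where\n";   # prints target: abc, position: 2
```

Reading $_ and pos() this way is described in perlre; assigning to $_ during the match is where the trouble in this thread starts.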
Re: Strange behaviour when regex variable is modified during match?
by ikegami (Patriarch) on Jan 18, 2021 at 15:57 UTC

    This is apparently a bug/limitation related to the copy-on-write (COW) mechanism.


    In the first snippet, $A and the constant initially share the same string buffer. Once $A is modified, the COW mechanism comes into play making an unshared copy of the string buffer for $A. Something goes wrong at this point.

    The workaround causes $A to have an unshared buffer going in.

    Note that simply appending to a string can cause the string's buffer to be replaced with another larger one. This apparently doesn't trigger the bug. If you use Devel::Peek's Dump on the scalar before and after the m//, you'll see the address of the string buffer (PV) has changed. Even when the workaround is in play. So it seems to be a problem specific to the COW mechanism (added to Perl in version 5.20).
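A minimal sketch of that kind of inspection, using a plain append rather than the problematic match. Dump writes to STDERR; on a COW-enabled perl I would expect the first dump to show an IsCOW flag and the PV address to differ between the two dumps, but the exact output depends on the perl version and build, so treat that as an assumption to check rather than a guarantee.

```perl
use strict;
use warnings;
use Devel::Peek qw(Dump);

my $s = "1112";   # assigned from a string constant

Dump($s);         # note FLAGS and the PV address

$s .= "x";        # modifying the scalar means it needs its own buffer

Dump($s);         # compare FLAGS and the PV address with the first dump
```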


    Finally, you shouldn't be surprised to have problems when modifying a variable on which an operation is being performed. for (@a) { say; pop(@a); } will give "interesting" results too, for example.
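For anyone who wants to try the array example, here is a small sketch that records what each loop visits instead of printing it. The array names are invented for this example. The first loop is the risky pattern; the second iterates over a snapshot, so removing elements from the original array cannot disturb the loop.

```perl
use strict;
use warnings;

# Risky: the body shrinks the array that the loop is walking, so not
# every element is necessarily visited.
my @a = (1 .. 6);
my @seen_risky;
for (@a) {
    push @seen_risky, $_;
    pop @a;
}

# Safer: walk a snapshot and modify the original freely.
my @b = (1 .. 6);
my @snapshot = @b;
my @seen_safe;
for my $item (@snapshot) {
    push @seen_safe, $item;
    pop @b;
}

print "risky loop visited: @seen_risky\n";
print "safe loop visited:  @seen_safe\n";
```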

Re: Strange behaviour when regex variable is modified during match?
by betmatt (Scribe) on Jan 20, 2021 at 11:25 UTC
    My first thoughts are:

  • Please define what you mean when you 'soft code' a variable. Please give an absolutely clear definition, first in general terms and then in the specific context of Perl. Two definitions are required.

  • Secondly, define 'sandbox program'. What is this?

  • If necessary, use a footnote section at the bottom of your question.

    I don't know the answer to your question. However, I have a feeling that your problem might relate to multithreading and associated problems. If I am at all right in this 'feeling' (i.e. I don't know), then you might need to tell us what processor chip you're using.
      Hi.
      While I was playing with this regex, I wanted to have: $A = '1' x $n . '2'; for simpler use (this was my soft-coding).
      By 'sandbox' I mean an experimental script, a draft... No multithreading in it. I was simply learning regexes and trying to find a variety of ways to manipulate strings.
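For reference, a short sketch of how that soft-coded expression evaluates. The repetition operator x binds more tightly than the concatenation operator ., so no parentheses are needed; the value of $n is chosen here only to reproduce the string from the original post.

```perl
use strict;
use warnings;

my $n = 3;

# Parsed as ('1' x $n) . '2'
my $A = '1' x $n . '2';

print "$A\n";   # prints 1112
```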