
Hi there,

Thanks for your reply. I had tried something using the /g (global) modifier earlier, but I discounted it because it changed every number every time. Here is what I had:
    #!/usr/bin/perl
    #subs4.plx
    use warnings;
    use strict;

    #try using /g global to remember where I'm up to in a match
    my $pattern;
    $_ = "Three, Four, One, Two";
    print ("\t\tCounting Program\n\n", $_, "\n\n");
    my $correct;
    print "Is this sequence correct?(yes/no)\n";
    $correct = <STDIN>;
    chomp ($correct);
    while ($correct ne "yes"){
        print "Is the first number correct?\n";
        my $first = <STDIN>;
        chomp ($first);
        if ($first ne "yes"){
            print"What should it be?\n";
            $first = <STDIN>;
            chomp ($first);
            /([A-Z][a-z]+)/g;
            s/$1/$first/g;
        }
        print "Is the second number correct?\n";
        my $second = <STDIN>;
        chomp ($second);
        if ($second ne "yes"){
            print"What should it be?\n";
            $second = <STDIN>;
            chomp ($second);
            /([A-Z][a-z]+)/g;
            s/$2/$second/g;
        }
        print "Is the third number correct?\n";
        my $third = <STDIN>;
        chomp ($third);
        if ($third ne "yes"){
            print"What should it be?\n";
            $third = <STDIN>;
            chomp ($third);
            /([A-Z][a-z]+)/g;
            s/$3/$third/g;
        }
        print "Is the fourth number correct?\n";
        my $fourth = <STDIN>;
        chomp ($fourth);
        if ($fourth ne "yes"){
            print"What should it be?\n";
            $fourth = <STDIN>;
            chomp ($fourth);
            /([A-Z][a-z]+)/g;
            s/$4/$fourth/g;
        }
        #Final print
        print ($_, "\n\n");
        print "Is this sequence correct now?(yes/no)\n";
        $correct = <STDIN>;
        chomp ($correct);
    }
After running through each of the <STDIN> prompts, the final print is "Four, Four, Four, Four". In retrospect this is possibly the closest I got to an actual solution! This is what prompted me to ask the question, wherein I was looking for a way to ignore the first match of a RegEx the second time it's run.

Again, thank you for your help; I feel I am close to a solution.

--
Just had a thought pre-posting: it is possible (but perhaps not elegant) to run the RegEx /[A-Za-z]+/, save the result to a variable and substitute the match with whitespace, then call the variable later. But thinking about it, this is just a cheat/hack and not really using the substitute function of a RegEx.

Regards,
Keystone

Re^5: RegExp substitution
by AnomalousMonk (Archbishop) on Apr 11, 2014 at 00:01 UTC
    ...
    s/$1/$first/g;
    ...
    s/$2/$second/g;
    ...
    s/$3/$third/g;
    ...
    s/$4/$fourth/g;
    ...

    The critical thing to realize about this code is that the capture variables $2, $3 and $4 have never been set to any meaningful value: the match pattern has only one capture group, so only $1 is ever set. I.e., the others have the undefined value undef. When the undefined value is interpolated into a string or a regex, it interpolates as '' (the empty string), or, in the case of a regex, // (the empty regex).
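    A minimal sketch of that point, assuming the same input string as the original script. The helper name capture_report is hypothetical and exists only for this illustration; it runs the single-group pattern once and reports which of $1 to $4 are defined afterwards.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original script): match once with the
# single-group pattern, then describe the state of $1 .. $4.
sub capture_report {
    my ($string) = @_;
    return unless $string =~ /([A-Z][a-z]+)/;
    # Only one pair of capturing parentheses exists, so only $1 can be set.
    return map { defined($_) ? "'$_'" : 'undef' } ($1, $2, $3, $4);
}

print join(', ', capture_report('Three, Four, One, Two')), "\n";
```

    With this input only the first slot should hold a value ('Three'); the remaining three should be reported as undef.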

    ...
    /([A-Z][a-z]+)/g;
    s/$2/$second/g;
    ...

    This pair of statements, and each corresponding pair that follows it, is very interesting. I strongly recommend you insert the statement
        print qq{=== '$_' \n};  # FOR DEBUG
    or its equivalent after each of the s/// substitution statements to monitor what's going on with the progressive 'correction' of the initial string.

    Here's a narrative. As you can see from the newly-added debug print statement, the first
        /([A-Z][a-z]+)/g;
        s/$1/$first/g;
    statement pair actually does something expected and useful: it replaces the first number with 'One'. The output from the debug print statement is
        === 'One, Four, One, Two'

    The second
        /([A-Z][a-z]+)/g;
        s/$2/$second/g;
    statement pair replaces all numbers with 'Two'! The output from the debug print statement is
        === 'Two, Two, Two, Two'

    The reason for this odd behavior is that when $2 with an undefined value interpolates into s/$2/$second/g; it produces the // empty regex match pattern. This pattern is special: it uses the last successful regex match pattern for matching. The last successful match pattern was in the /([A-Z][a-z]+)/g; statement immediately before the s/// substitution statement. Therefore,
        s/$2/$second/g;
    interpolates (ignoring, as you do, the warning message) as if it were
        s//$second/g;
    which matches as if it were
        s/([A-Z][a-z]+)/$second/g;
    which replaces each and every match (because of the /g modifier) against the ([A-Z][a-z]+) pattern (i.e., something that looks like a number) with, in this case, 'Two'. Whew!
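    The empty-pattern behavior described above can be reproduced in isolation. This is only a sketch: the sub name replace_via_empty_pattern is hypothetical, and an explicitly empty string stands in for the undefined $2 so that no "uninitialized value" warning gets in the way.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original script).
sub replace_via_empty_pattern {
    my ($string, $replacement) = @_;

    # This successful match becomes the "last successful match pattern".
    $string =~ /([A-Z][a-z]+)/;

    # An empty string stands in for the undefined $2. After interpolation
    # the pattern is empty, so Perl reuses the pattern matched just above.
    my $empty = '';
    $string =~ s/$empty/$replacement/g;

    return $string;
}

print replace_via_empty_pattern('One, Four, One, Two', 'Two'), "\n";
```

    Run against the string as it stands after the first 'correction', this should print 'Two, Two, Two, Two', matching the debug output shown earlier.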

    And similarly for each subsequent //; s///; statement pair.

    That ought to give you something to think about while you're reviewing the regex documentation.

    (BTW: The /g modifier in the /([A-Z][a-z]+)/g; statement is at best useless and at worst confusing and corrupting. You cannot use the /g modifier in this way to "keep track" of match positions in successive matches. (The /c modifier in conjunction with the /g modifier does something like this in certain cases, but I don't really see how it could be adapted to serve here.) You will have to think of some other way to query the user about successive numbers in the original string so that they may be 'corrected' one by one.)
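    One possible "other way", offered purely as a sketch and not as the approach anyone in this thread settled on: split the string into its words, correct each word by position, and join the result back together. The names correct_sequence and %answers are hypothetical, and the canned answers stand in for the <STDIN> prompts of the original script.

```perl
use strict;
use warnings;

# Hypothetical helper: $ask is a code ref that receives (position, current
# word) and returns a replacement, or undef to keep the word unchanged.
sub correct_sequence {
    my ($sequence, $ask) = @_;
    my @words = split /,\s*/, $sequence;
    for my $i (0 .. $#words) {
        my $fix = $ask->($i + 1, $words[$i]);
        $words[$i] = $fix if defined $fix;
    }
    return join ', ', @words;
}

# Canned answers keyed by position; an interactive version would prompt
# the user inside the callback instead.
my %answers = (1 => 'One', 2 => 'Two', 3 => 'Three', 4 => 'Four');
print correct_sequence('Three, Four, One, Two', sub { $answers{ $_[0] } }), "\n";
```

    Because each word is addressed by its index in the list, no substitution can spill over onto the other numbers, which sidesteps both the undefined capture variables and the empty-pattern surprise.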

      Thank you. I realise it's perhaps not the norm, but the way you've broken down the code here matches the way I'm currently thinking while programming in Perl, and it was very easy to understand. Furthermore, having seen your other post recommending I take a step back and look at basic RegEx, I'm now going through those tutorials.