I've noticed a discrepancy between the return value behavior of substr as an lvalue and as an rvalue. Consider,

$ perl -e's//foobar/;my $foo=substr($_,2,2,"rt");print $foo,$/'
ob
$ perl -e's//foobar/;my $foo=substr($_,2,2)="rt"; print $foo,$/'
rt
$

The four-argument form, used as an rvalue, evaluates to the replaced string (the old contents), which is what we usually expect of it. With the lvalue form, the assignments associate right to left (as we also expect), but the substr call appears to be evaluated twice: first as an lvalue in the assignment, then as an rvalue to produce the return value of that expression.
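
A minimal sketch of that "evaluated twice" reading, for anyone who wants to poke at it. It deliberately avoids lvalue substr and spells out the two steps by hand, so it only models the behavior observed here and says nothing about what pp.c really does. The helper name assign_then_read is made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: models "assign, then read back" with two explicit
# calls, without relying on how a given perl implements lvalue substr.
sub assign_then_read {
    my ($str, $off, $len, $new) = @_;
    substr($str, $off, $len, $new);     # step 1: replace within the copy
    return substr($str, $off, $len);    # step 2: re-read the same offset/length
}

print assign_then_read('foobar', 2, 2, 'rt'), "\n";   # rt
print assign_then_read('foobar', 1, 4, 'a'),  "\n";   # ar
```

If the real lvalue form gives different answers on your perl, that would suggest the implementation has changed since the version I tested with.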

It seems that the rvalue behavior is more useful, because it allows the old data to be kept on the fly. That sort of entropy suppression is useful in functions like select.
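
To make "kept on the fly" concrete, here is a small sketch using only the four-argument form. The sub name replace_keeping_old is hypothetical; the point is just that one call both performs the replacement and hands back what was there before.

```perl
use strict;
use warnings;

# Hypothetical helper: replace part of a string and return both the old
# contents of that part and the modified string.
sub replace_keeping_old {
    my ($str, $off, $len, $new) = @_;
    my $old = substr($str, $off, $len, $new);   # returns the replaced text
    return ($old, $str);
}

my ($old, $now) = replace_keeping_old('foobar', 2, 2, 'rt');
print "$old $now\n";   # ob fortar
```

That is the same shape as select, which returns the previously selected handle so it can be restored later.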

The observed behavior may be inherent to perl's design for lvalue subs. The implementation of substr in pp.c is complicated, and I confess to not yet understanding it.

I wonder whether this is a bug, or a feature, or just a fact of life. What do you think?

After Compline,
Zaxo

Re: An Oddity of substr
by BrowserUk (Patriarch) on Oct 10, 2003 at 07:09 UTC

    I think this is the same anomaly as I noticed at [substr] anomaly or mine?.

    I raised a perlbug (I can't find the ticket number off hand). The response I got (nearly a year later:) was that it was, at best, a documentation bug, and not something that needed to be fixed. I disagreed quite strongly with this.

    I wouldn't have a problem with the two versions doing something different if it were clearly documented; you could then choose which form to use depending on which result fits your needs.

    My concern was more that the result of the lvalue version produces a result that is neither one thing nor t'other.

    $s='foobar';
    print '4-arg: ', substr( $t=$s,1,4,$_) , ' 3-arg: ', substr($t=$s,1,4)=$_
        for qw[ a ab abc abcd abcde abcdef ];

    4-arg: ooba 3-arg: ar
    4-arg: ooba 3-arg: abr
    4-arg: ooba 3-arg: abcr
    4-arg: ooba 3-arg: abcd
    4-arg: ooba 3-arg: abcd
    4-arg: ooba 3-arg: abcd

    As you can see, the 4-arg results are consistent, but the 3-arg version varies wildly.

    1. If the value being assigned is shorter than the section being replaced, the result is the length of the assignment, but its value is
      1. the value of the assigned value
      2. PLUS a random character from the unaffected part of the original value...in this case the last character.
    2. If the value being assigned is the same length as the section being replaced then the result is the value being assigned.
    3. If the value being assigned is longer than the section being replaced, then the result is the first N characters of the value being assigned, where N is the length of the section being replaced.

    No one is ever going to convince me that simply documenting this behaviour will make it logical, predictable or useful......still. "The problem space is a mess" :)


    Examine what is said, not who speaks.
    "Efficiency is intelligent laziness." -David Dunham
    "Think for yourself!" - Abigail

      The character from the unaffected part of the original value isn't random. It's actually very determinate, as you can see from the example below.

      $s='foobarstuv';
      foreach $n (3 .. 5) {
          for $v (qw[ a ab abc abcd abcde abcdef ]) {
              $d = substr( $c=$s, 1, $n, $v);
              $f = substr( $e=$s, 1, $n) = $v;
              printf "4-arg: '%13s' (%5s) ... 3-arg: '%13s' (%5s)\n", $c, $d, +$e, $f;
          }
          print $/;
      }
      --------------
      4-arg: '     faarstuv' (  oob) ... 3-arg: '     faarstuv' (  aar)
      4-arg: '    fabarstuv' (  oob) ... 3-arg: '    fabarstuv' (  aba)
      4-arg: '   fabcarstuv' (  oob) ... 3-arg: '   fabcarstuv' (  abc)
      4-arg: '  fabcdarstuv' (  oob) ... 3-arg: '  fabcdarstuv' (  abc)
      4-arg: ' fabcdearstuv' (  oob) ... 3-arg: ' fabcdearstuv' (  abc)
      4-arg: 'fabcdefarstuv' (  oob) ... 3-arg: 'fabcdefarstuv' (  abc)

      4-arg: '      farstuv' ( ooba) ... 3-arg: '      farstuv' ( arst)
      4-arg: '     fabrstuv' ( ooba) ... 3-arg: '     fabrstuv' ( abrs)
      4-arg: '    fabcrstuv' ( ooba) ... 3-arg: '    fabcrstuv' ( abcr)
      4-arg: '   fabcdrstuv' ( ooba) ... 3-arg: '   fabcdrstuv' ( abcd)
      4-arg: '  fabcderstuv' ( ooba) ... 3-arg: '  fabcderstuv' ( abcd)
      4-arg: ' fabcdefrstuv' ( ooba) ... 3-arg: ' fabcdefrstuv' ( abcd)

      4-arg: '       fastuv' (oobar) ... 3-arg: '       fastuv' (astuv)
      4-arg: '      fabstuv' (oobar) ... 3-arg: '      fabstuv' (abstu)
      4-arg: '     fabcstuv' (oobar) ... 3-arg: '     fabcstuv' (abcst)
      4-arg: '    fabcdstuv' (oobar) ... 3-arg: '    fabcdstuv' (abcds)
      4-arg: '   fabcdestuv' (oobar) ... 3-arg: '   fabcdestuv' (abcde)
      4-arg: '  fabcdefstuv' (oobar) ... 3-arg: '  fabcdefstuv' (abcde)

      The extra characters come from the string after substitution: the 3-arg form returns whatever now sits at the original offset, up to the length of the replaced section (the third argument). So the behavior is quite determinate, and very consistent as well. It's just a little counter-intuitive. *shrugs*

      ------
      We are the carpenters and bricklayers of the Information Age.

      The idea is a little like C++ templates, except not quite so brain-meltingly complicated. -- TheDamian, Exegesis 6

      Please remember that I'm crufty and crochety. All opinions are purely mine and all code is untested, unless otherwise specified.

        Well damn! It's so obvious, I should'a recognised the pattern straight away. The left half of my brain musta been asleep:)

        Perhaps between us we could come up with a nice, simple, generic description for inclusion in the docs?



      Thanks, that's both a better illustration of the problem and an authoritative answer to my question. Not that I like the answer!

      After Compline,
      Zaxo

Re: An Oddity of substr (old news :)
by tye (Sage) on Oct 10, 2003 at 06:40 UTC