throop has asked for the wisdom of the Perl Monks concerning the following question:

Brethren

Looking yesterday at a listing of revision dates I saw

3-AUG-2007 8-AUG-2007 1-OCT-2007 1-OCT-2007-RevA 1 2
Hmm, that 1 and 2 look odd...

After sleuthing, I ran (something like)

($foo, $bar, $date) = qw(fooa bar7 1-OCT-2007-RevA); print "++$foo ++$bar ++$date\n"
and saw
foob bar8 1
Which puzzled me greatly until I found the auto-increment docs
The auto-increment operator has a little extra builtin magic to it. If you increment a variable that is numeric, or that has ever been used in a numeric context, you get a normal increment. If, however, the variable has been used in only string contexts since it was set, and has a value that is not the empty string and matches the pattern /^[a-zA-Z]*[0-9]*\z/, the increment is done as a string, preserving each character within its range, with carry:
OK. Problem explained. But here I ask the Perl Monks for explanation. Why? Why limit the magic to strings that match the pattern? When is it ever better to return '1' than the incremented string?
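For anyone who wants to reproduce this, here is a minimal sketch of the rule quoted above. The variable names are made up for illustration, and the `no warnings 'numeric'` block is only there to silence the "isn't numeric" warning that `use warnings` would otherwise emit.

```perl
use strict;
use warnings;

# Strings matching /^[a-zA-Z]*[0-9]*\z/ get the magic string increment,
# with carry from digits into letters and from letters into a new character.
my @magic = map { my $s = $_; ++$s } qw(fooa bar7 Az zz a9);
print "@magic\n";    # foob bar8 Ba aaa b0

# A string that fails the pattern is converted to a number first,
# so only its leading numeric part survives the increment.
my $date = '1-OCT-2007-RevA';
{
    no warnings 'numeric';
    ++$date;
}
print "$date\n";     # 2
```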

throop

updated with minor cleanup

Re: String increment - reasoning
by shmem (Chancellor) on Oct 02, 2007 at 14:22 UTC
    Ahem.
    ($foo, $bar, $date) = qw(fooa bar7 1-OCT-2007-RevA); print "++foo ++bar ++date\n"
    and saw
    foob bar8 1

    Darn clever version of perl have you do, says Yoda. I get ++foo ++bar ++date. You mean

    print join(" ",++$foo,++$bar,++$date)

    which yields foob bar8 2.

    Why limit the magic to strings that match the pattern. When is it ever better to return '1' than the incremented string?

    That magic is a perl dwimmery for special cases, like incrementing filehandles of the form FH000. It is also limited to increment; string decrement isn't implemented.

    It is arguably better to return 15 when incrementing 14 floz instead of returning 14 flpa. Here perl rightly (imho) guesses that the number is more important than the unit, which would become nonsensical anyways if incremented.
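A short sketch of the three cases just mentioned, assuming a reasonably modern perl. The variable names are illustrative only.

```perl
use strict;
use warnings;

my $fh = 'FH099';
++$fh;                     # magic increment, carrying through the digits
print "$fh\n";             # FH100

my $name = 'ab';
{
    no warnings 'numeric';
    --$name;               # no magic: 'ab' numifies to 0, then 0 - 1
}
print "$name\n";           # -1

my $volume = '14 floz';
{
    no warnings 'numeric';
    ++$volume;             # fails the pattern, so the leading number is used
}
print "$volume\n";         # 15
```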

    --shmem

    _($_=" "x(1<<5)."?\n".q·/)Oo.  G°\        /
                                  /\_¯/(q    /
    ----------------------------  \__(m.====·.(_("always off the crowd"))."·
    ");sub _{s./.($e="'Itrs `mnsgdq Gdbj O`qkdq")=~y/"-y/#-z/;$e.e && print}
      While I agree with the portion of your "arguably better" about incrementing the number rather than the unit [1], I would still say that returning 15 rather than 15 floz is broken.

      I don't really see the benefit to any sort of 'increment one portion of the string and throw the rest away' scheme unless you're actively trying to confuse people by sometimes incrementing and returning the whole value and other times only incrementing and returning part of it. Since there's no real way to get it to reliably do the Right Thing without adding more syntax (e.g., something like ++[1]$foo for first part, ++[-1]$foo for the last part), it should at least be immediately clear what happened when it does a Wrong Thing. Incrementing 14 floz to 15 floz and 1-OCT-2007-RevA to 2-OCT-2007-RevA makes it much clearer what's going on than incrementing them to just 15 and 2, even aside from the minor detail that the non-truncated results are sometimes correct.

      [1] ...although it's really just incrementing the first segment of the value which matches the pattern, which is correct in your example, but incorrect in others

        Incrementing "14 floz" to 15 is sane behavior when compared with Perl's behavior elsewhere when strings are used as numbers (which is if the leading part looks like a number, convert that part to a number and ignore the rest). Not dropping the not-a-number part for increments would be less regular.

        *shrug* if you don't want perl to dwim, be explicit, as always. For number operations, there's the rule "it's a number as long as it looks like a number (from left to right)".
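The left-to-right rule can be seen with a plain numeric conversion. This is only a sketch; `leading_number` is a hypothetical helper that does nothing more than force numeric context.

```perl
use strict;
use warnings;

# Hypothetical helper: numify a string the way perl does for arithmetic.
sub leading_number {
    my ($string) = @_;
    no warnings 'numeric';    # suppress "Argument ... isn't numeric"
    return $string + 0;
}

print leading_number('14 floz'), "\n";        # 14
print leading_number('3.5e2 volts'), "\n";    # 350
print leading_number('floz 14'), "\n";        # 0
```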

        When incrementing 31-OCT-2007-RevA, what should perl do? 31-OCT-2007-RevB, 32-OCT-2007-RevA or 1-NOV-2007-RevA? The latter with or without leading zero? How could that decision be reliably coded?
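One way of being explicit is to say which field is meant. The sketch below assumes the fields are separated by dashes; `bump_field` is a hypothetical helper, not anything perl provides.

```perl
use strict;
use warnings;

# Hypothetical helper: increment one dash-separated field, leave the rest.
sub bump_field {
    my ($string, $index) = @_;
    my @fields = split /-/, $string, -1;
    ++$fields[$index];    # each field is a plain string, so ++ is magic
    return join '-', @fields;
}

print bump_field('1-OCT-2007-RevA', 0),  "\n";    # 2-OCT-2007-RevA
print bump_field('1-OCT-2007-RevA', -1), "\n";    # 1-OCT-2007-RevB
print bump_field('31-OCT-2007-RevA', 0), "\n";    # 32-OCT-2007-RevA
```

The last line shows the limit of the approach: the increment knows nothing about calendars, so anything date-aware has to be coded separately.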

        --shmem

Re: String increment - reasoning
by BrowserUk (Patriarch) on Oct 02, 2007 at 14:05 UTC
    • Why limit the magic to strings that match the pattern.

      What would be the result of incrementing ')(~'?

    • When is it ever better to return '1' than the incremented string?

      IMO, never.

      String increment should fail loudly when the conditions for it are not met.


    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
      String increment should fail loudly when the conditions for it are not met.

      Given how "++" is, in a sense, intrinsically overloaded, and given Perl's intrinsic flexibility about interpreting the "data type" (string? int? float?) of a scalar value, I would be hesitant to ask for a whole lot of special condition checks on this operator, for fear of slowing down all the scripts that use it so often. There's only a limited amount of idiot-proofing that the language can do for you and still be deemed suitable for effective use.

      Even when strings do "meet the conditions", it's still no guarantee that you're going to end up with what you intended:

      $ perl -le '$x="12345678901234567890"; print $x; ++$x; print $x'
      12345678901234567890
      12345678901234567891

      $ perl -e '$x="12345678901234567890";printf "%20.0f\n",$x; ++$x;printf "%20.0f\n",$x'
      12345678901234567168
      12345678901234567168

      # or, just as bad:
      $ perl -le '$x="12345678901234567890"; printf "%20.0f\n",$x; ++$x; print $x'
      12345678901234567168
      1.23456789012346e+19

      I agree with the view expressed below: the current behavior is consistent, in a way that makes sense, with other related behaviors. (++Fletch ! ... or should I say "Fletci !") (updated for grammar correction)
        I would be hesitant to ask for a whole lot of special condition checks on this operator, for fear of slowing down all the scripts that use it so often.

        Maybe, but given that (assuming I understand correctly), the following is the code that implements the string increment:

        ...
        d = SvPVX(sv);
        while (isALPHA(*d)) d++;
        while (isDIGIT(*d)) d++;
        if (*d) {
        #ifdef PERL_PRESERVE_IVUV
            /* Got to punt this as an integer if needs be, but we don't issue
               warnings. Probably ought to make the sv_iv_please() that does
               the conversion if possible, and silently. */
            const int numtype = grok_number(SvPVX_const(sv), SvCUR(sv), NULL);
            if (numtype && !(numtype & IS_NUMBER_INFINITY)) {
                /* Need to try really hard to see if it's an integer.
                   9.22337203685478e+18 is an integer.
                   but "9.22337203685478e+18" + 0 is UV=9223372036854779904
                   so $a="9.22337203685478e+18"; $a+0; $a++
                   needs to be the same as $a="9.22337203685478e+18"; $a++
                   or we go insane. */

                (void) sv_2iv(sv);
                if (SvIOK(sv))
                    goto oops_its_int;

                /* sv_2iv *should* have made this an NV */
                if (flags & SVp_NOK) {
                    (void)SvNOK_only(sv);
                    SvNV_set(sv, SvNVX(sv) + 1.0);
                    return;
                }
                /* I don't think we can get here. Maybe I should assert this
                   And if we do get here I suspect that sv_setnv will croak. NWC
                   Fall through. */
        #if defined(USE_LONG_DOUBLE)
                DEBUG_c(PerlIO_printf(Perl_debug_log,"sv_inc punt failed to convert '%s' to IOK or NOKp, UV=0x%"UVxf" NV=%"PERL_PRIgldbl"\n",
                                      SvPVX_const(sv), SvIVX(sv), SvNVX(sv)));
        #else
                DEBUG_c(PerlIO_printf(Perl_debug_log,"sv_inc punt failed to convert '%s' to IOK or NOKp, UV=0x%"UVxf" NV=%"NVgf"\n",
                                      SvPVX_const(sv), SvIVX(sv), SvNVX(sv)));
        #endif
            }
        #endif /* PERL_PRESERVE_IVUV */
            sv_setnv(sv,Atof(SvPVX_const(sv)) + 1.0);
            return;
        }
        d--;
        while (d >= SvPVX_const(sv)) {
            if (isDIGIT(*d)) {
                if (++*d <= '9')
                    return;
                *(d--) = '0';
            }
            else {
        #ifdef EBCDIC
                /* MKS: The original code here died if letters weren't consecutive.
                 * at least it didn't have to worry about non-C locales. The
                 * new code assumes that ('z'-'a')==('Z'-'A'), letters are
                 * arranged in order (although not consecutively) and that only
                 * [A-Za-z] are accepted by isALPHA in the C locale.
                 */
                if (*d != 'z' && *d != 'Z') {
                    do { ++*d; } while (!isALPHA(*d));
                    return;
                }
                *(d--) -= 'z' - 'a';
        #else
                ++*d;
                if (isALPHA(*d))
                    return;
                *(d--) -= 'z' - 'a' + 1;
        #endif
            }
        }
        /* oh,oh, the number grew */
        SvGROW(sv, SvCUR(sv) + 2);
        SvCUR_set(sv, SvCUR(sv) + 1);
        for (d = SvPVX(sv) + SvCUR(sv); d > SvPVX_const(sv); d--)
            *d = d[-1];
        if (isDIGIT(d[1]))
            *d = '1';
        else
            *d = d[1];

        I think the cost of the test required to notice that the string doesn't meet the specified requirements would be minimal, as the code already has to scan the string from the beginning to find the end of the compliant part (if any):

        d = SvPVX(sv); while (isALPHA(*d)) d++; while (isDIGIT(*d)) d++;

        And it already follows that with a conditional check to detect if the scan found the end of the string:

        if (*d) {

        I think all it would take is the replacement of the line

        sv_setnv(sv,Atof(SvPVX_const(sv)) + 1.0);

        with something like:

        Perl_croak(aTHX_ "String increment invalid on string '%s'", SvPVX(sv));

        There's possibly a bit more to it than that, but would that have a huge impact upon performance?
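Until something like that exists in the core, the same "fail loudly" behaviour can be approximated in pure Perl. This is only a sketch: `strict_inc` is a hypothetical name, and the `looks_like_number` check is my assumption, added to mirror the way the C code still treats genuinely numeric strings such as "1.5" as numbers.

```perl
use strict;
use warnings;
use Carp qw(croak);
use Scalar::Util qw(looks_like_number);

# Hypothetical wrapper: increment, but refuse strings that are neither
# numeric nor of the form /^[a-zA-Z]*[0-9]*\z/.
sub strict_inc {
    my ($value) = @_;
    croak "String increment invalid on string '$value'"
        unless looks_like_number($value)
            || $value =~ /^[a-zA-Z]*[0-9]*\z/;
    return ++$value;
}

print strict_inc('bar7'), "\n";    # bar8
print strict_inc('1.5'),  "\n";    # 2.5

my $ok = eval { strict_inc('1-OCT-2007-RevA'); 1 };
print "refused: $@" unless $ok;
```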


Re: String increment - reasoning
by ikegami (Patriarch) on Oct 02, 2007 at 14:05 UTC

    Incrementing the portion matching /[a-zA-Z]*[0-9]*\z/ (^ dropped) would make sense. However, you can do that pretty easily.

    s/([a-zA-Z]*[0-9]*)\z/++(my$x=$1)/e;
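Applied to the original poster's data, the substitution behaves as follows. The wrapper name `inc_tail` is hypothetical and is only there to make the one-liner reusable.

```perl
use strict;
use warnings;

# Hypothetical wrapper around the substitution above: magic-increment only
# the trailing run of letters followed by digits.
sub inc_tail {
    my ($string) = @_;
    $string =~ s/([a-zA-Z]*[0-9]*)\z/++(my $x = $1)/e;
    return $string;
}

print inc_tail('1-OCT-2007-RevA'), "\n";    # 1-OCT-2007-RevB
print inc_tail('report-a9'), "\n";          # report-b0
```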