in reply to s/.// increases length - bug or badly documented feature

Very interesting. I remember reading in the Camel book that regexes treat characters and bytes consistently. Apparently there is still a grey area between the two.

Before you showed some code, I thought that you were referring to some eval trick:
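$_ = '\040';
# The eval string must be double-quoted so that $_ is interpolated into
# the code before it is compiled: length("\040") is 1, length("040") is 3.
print "before: ($_) <", eval "length(\"$_\")", ">\n";
s/.//;
print "after: ($_) <", eval "length(\"$_\")", ">\n";
__END__
output:
before: (\040) <1>
after: (040) <3>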
What you have found requires further investigation.
 _  _ _  _  
(_|| | |(_|><
 _|   

Re: Re: s/.// increases length - bug or badly documented feature
by Juerd (Abbot) on Mar 01, 2002 at 09:27 UTC
    Also interesting is that the deparsed code is not equal.
    print length chr 12345
    outputs "1", and deparses to:
    print length "\343\200\271";
    which outputs "3" :)

    Between chr and ord, things are consistent: ord chr 12345 is 12345 (maybe it should return 12345 % 256?)
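
The character-versus-byte difference described above can be reproduced explicitly. This is only a sketch, and it assumes a Perl that ships the Encode module (5.8 or later), which is not what the thread was run on. The variable names are illustrative.

```perl
use strict;
use warnings;
use Encode qw(encode);

# One character with code point 12345 (U+3039).
my $char = chr 12345;

# The same character as a UTF-8 encoded byte string.
my $bytes = encode('UTF-8', $char);

# Render each byte as an octal escape, the way the deparsed code shows it.
my $octal = join '', map { sprintf('\\%03o', ord($_)) } split //, $bytes;

printf "characters: %d\n", length $char;
printf "bytes:      %d\n", length $bytes;
printf "ord:        %d\n", ord $char;
printf "octal:      %s\n", $octal;
```

The character string has length 1 and the encoded byte string has length 3, which matches the two outputs quoted above. The octal rendering of the bytes comes out as \343\200\271, the same literal that Deparse printed.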

    Lbh ebgngrq guvf grkg naq abj lbh pna ernq vg. Fb jung? :) -- Whreq

      >bleadperl -MO=Deparse -e "print length chr 12345"
      print length "\x{3039}";
      -e syntax OK
      This apparently will be fixed in 5.8.0 (due out in May probably), and I imagine the change will be backported to 5.6.2 as well.

      =cut
      --Brent Dax
      There is no sig.