in reply to a farewell to chop

I can well understand them getting rid of chop, although I do think the main problem is having it named so similarly to chomp.

When I was beginning Perl I came across chop and chomp, and whilst I could remember that one wasn't fussy about what it removed and the other only removed end-of-line characters, it took me a remarkable amount of time to learn which was which. Caused some nasty bugs, too.

Since I learnt the names properly I don't think I've ever touched chop for anything. So far you've only said you've been able to find one example, and people can do it with substr (substr($foo, -1) = '', admittedly messy), or with a simple regexp, $foo =~ s/.$//;, which to me is perfectly readable. I really don't see the advantage of keeping chop paying off against the risk of having the confusing (and easily mis-typed) chomp/chop pair.
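
For what it's worth, here is a minimal sketch of the three spellings side by side. The variable names and the sample value 'foobar' are made up for illustration, and the string deliberately has no trailing newline, where all three agree.

```perl
use strict;
use warnings;

# Hypothetical sample value: no trailing newline, plain ASCII.
my $via_chop   = 'foobar';
my $via_substr = 'foobar';
my $via_regex  = 'foobar';

chop $via_chop;                  # the built-in
substr($via_substr, -1) = '';    # substr used as an lvalue
$via_regex =~ s/.$//;            # the regexp version

print "$via_chop $via_substr $via_regex\n";   # prints "fooba fooba fooba"
```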

I also can't see many people are going to go and write their own version of chop, to be honest. It's a simple enough thing to 'just do' and the function call imposes a much higher overhead than the operation itself.

Re^2: a farewell to chop
by particle (Vicar) on Sep 11, 2002 at 15:50 UTC
    or a simple regexp $foo =~ s/.$//; which to me is perfectly readable

    ...but will not handle multi-byte characters. chop will.

    ~Particle *accelerates*

      Which, fortunately, is why we've got the marvellous \X sequence: s/\X$// will do what you want.

      --
      Tommy
      Too stupid to live.
      Too stubborn to die.

        \X fixes the utf8 problem, but there's still a problem with this regex...

        #!/usr/bin/perl -w
        use strict;
        use utf8;

        my($a,$b,$c);
        $a=$b=$c="123\n456\n";
        print chop $a;        # prints "\n"
        print $b =~ s/\X$//;  # prints "1"
        print $a;             # prints "123\n456"
        print $b;             # prints "123\n45\n" ## OOPS!!!

        i believe s/\X\z// will do what you want, although it still won't return the character removed. instead, use substr EXPR,OFFSET,LEN,REPLACEMENT (i.e. substr $_,-1,1,'').
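
        a rough sketch of the difference, using the same test string as above (the variable names here are only for illustration): all three leave the same string behind, but only chop and four-argument substr hand back the removed character. note that substr works on single characters, so unlike \X it will not take a whole combining sequence.

        ```perl
        use strict;
        use warnings;

        my $text = "123\n456\n";
        my ($chopped, $anchored, $spliced) = ($text, $text, $text);

        # chop returns the character it removed.
        my $by_chop = chop $chopped;

        # s/// returns the number of substitutions, not the removed text.
        # \z anchors at the very end, so the final "\n" is what goes.
        my $count = ($anchored =~ s/\X\z//);

        # four-argument substr replaces the last character and returns it.
        my $by_substr = substr $spliced, -1, 1, '';

        print "all three agree\n"
            if $chopped eq $anchored and $anchored eq $spliced;
        ```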

        ~Particle *accelerates*

      Doesn't the dot handle multi-byte UTF-8 characters when the string is of the character persuasion and "use utf8" is in scope?

      Update: Perhaps you meant multiple codepoints used to "compose" one glyph, rather than multiple bytes to form one codepoint. The former is what \X does. Perl5 regex only does the latter; Perl6 is said to do the former too (u0, u1, and u2 levels if memory serves).
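
      A small sketch of that distinction, assuming a Perl recent enough to support \X. The sample word is made up: "e" followed by U+0301 COMBINING ACUTE ACCENT is two codepoints that display as one glyph. The dot removes one codepoint; \X removes the whole combining sequence.

      ```perl
      use strict;
      use warnings;

      # Hypothetical sample: "caf" + "e" + U+0301 (combining acute accent).
      # Five codepoints, four glyphs.
      my $word = "caf" . "e\x{301}";

      # The dot matches a single codepoint: only the accent is removed.
      (my $by_dot = $word) =~ s/.\z//s;

      # \X matches base character plus combining marks: "e" and accent both go.
      (my $by_cluster = $word) =~ s/\X\z//;

      print length($by_dot), " ", length($by_cluster), "\n";   # prints "4 3"
      ```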

        that depends on your version of perl5 (i wish i could give a specific example, but i don't have all those installs in front of me.)

        ~Particle *accelerates*

      ...but will not handle multi-byte characters.

      It will in Perl 6, and already does under the utf8 pragma in Perl 5.6+. Besides, as a Perl 5 regex, it doesn't make sense, because $ also matches before a trailing \n.

      - Yes, I reinvent wheels.
      - Spam: Visit eurotraQ.