I am getting a bit fed up with people telling me to use chomp instead of chop. I know the manual says it's "safer" but I beg to differ. (Update: for portability to e.g. Windows I've added something special at the end. Oops, no: chop is in fact portable.)

"So what is wrong with chomp()?" I can already hear many of you say.

If we try to simulate a functional description of chomp() in the common or garden context, it might begin to become clear:

"Read in each line of a file (line being delimited by "\n") - test whether there really is a "\n" there - if so chop it off else carry on regardless".

I can't imagine anyone asking for this, and if an analyst knew you were interpreting their spec. that way, you'd be apt to be corrected. There are two possibilities: a spec. either contains provisions for handling data quality or it doesn't, and 99 times out of 100 it doesn't. That's because systems analysts prefer (quite rightly) to focus on positive rather than negative functionality.
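To make the quoted description concrete, here is roughly what it amounts to in code. This is only a sketch for the default $/ of "\n"; chomp_by_hand is a made-up name for illustration, and the real chomp() additionally honours whatever $/ is set to.

```perl
use strict;
use warnings;

# Hypothetical helper: chomp() spelled out for the default $/ = "\n".
# Takes a scalar ref and returns the number of characters removed.
sub chomp_by_hand {
    my ($sref) = @_;
    return 0 unless $$sref =~ /\n\z/;    # test whether the "\n" is really there ...
    chop $$sref;                         # ... if so, chop it off
    return 1;
}

my $complete  = "record one\n";
my $truncated = "record tw";    # no terminator: something went wrong upstream

my $n1 = chomp_by_hand( \$complete );     # removes the "\n"
my $n2 = chomp_by_hand( \$truncated );    # carries on regardless

print "[$complete] removed $n1\n";
print "[$truncated] removed $n2\n";
```

The second call is the "carry on regardless" branch: nothing is removed and nothing is reported.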

The real reason chomp() is a no-no is that negative functionality tends to be introduced only to handle known or anticipated functional problems. In one mission-critical system, incomplete files were detected by checking whether a trailer record had been placed on the end. If a file passes that check, it's impossible for records between the header and trailer not to be terminated with "\n". A trailer is a proper functional remedy, because a file might accidentally break just after a "\n", so you can't use the presence of "\n" alone to test for file completeness.
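A trailer check of that kind might look something like the sketch below. The "HDR"/"TRL" record prefixes and the name file_is_complete are assumptions made up for this example, not the layout of the system mentioned above.

```perl
use strict;
use warnings;

# Assumed layout, for illustration only: the final record of a complete
# file starts with "TRL" and, like every other record, ends in "\n".
sub file_is_complete {
    my (@records) = @_;
    return 0 unless @records;
    return $records[-1] =~ /\ATRL.*\n\z/ ? 1 : 0;
}

my @good   = ( "HDR\n", "alpha\n", "beta\n", "TRL 2\n" );
my @broken = ( "HDR\n", "alpha\n" );    # cut off right after a "\n"

print "good:   ", file_is_complete(@good),   "\n";
print "broken: ", file_is_complete(@broken), "\n";
```

Note that the broken file still ends in "\n", which is exactly why the terminator alone says nothing about completeness.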

Having established completeness of a file, the only way a record can lose its "\n" unexpectedly is by means of a programming error, such as a conditional substitution that happens implicitly to remove it, but only where the pattern matches. Perhaps chomp() is popular with sloppy people who do that kind of thing, patch it up with chomp() first and ask questions never. Such patching up is grossly negligent because it confuses the testing process needed to find mistakes. Using chomp() can make it harder to detect the real fault that chomp() fails to patch up.

The advantage of chop() is that, when it does chop a real character off the end of a \w+, testing will show that a programming error has occurred and needs investigation, whereas chomp() is apt to hide the error until the system or acceptance testing phase. I'd hate to mistakenly hire people who allowed that to happen out of a bad programming habit!
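To illustrate the kind of slip meant here, a minimal sketch with a made-up substitution (buggy_cleanup and the sample records are hypothetical, not code from any real system):

```perl
use strict;
use warnings;

# Hypothetical bug: meant to drop a trailing comma, but \s* also eats the
# "\n", and only on the records where the pattern matches.
sub buggy_cleanup {
    my ($line) = @_;
    $line =~ s/,\s*\z//;
    return $line;
}

my @input = ( "apple,\n", "pear\n" );

my @chomped = map { my $l = buggy_cleanup($_); chomp $l; $l } @input;
my @chopped = map { my $l = buggy_cleanup($_); chop $l;  $l } @input;

print "chomp: @chomped\n";    # looks fine, so the bug stays hidden
print "chop:  @chopped\n";    # first record is visibly damaged
```

With chomp() both records come out looking right; with chop() the first one loses its final letter, which is what gets the substitution noticed in unit testing.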

If we modify $/, e.g. to ';' to parse a Perl program, then of course the last line won't generally terminate with $/ but with "\n". In that case, it is clearly wrong to chomp regardless because the presence of $/ is a syntax requirement. In such cases, chomp() is no good anyway because it returns the length rather than the content of what is chopped. Instead we need to do something like:

( chop() eq ';' ) or SomeErrorHandling();
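
In context, that check could be used along these lines. This is a sketch only: the in-memory sample text and the @statements/@errors arrays are invented for the example and stand in for real input and for SomeErrorHandling().

```perl
use strict;
use warnings;

# Illustrative input: two terminated statements and one that is not.
my $source = "my \$x = 1;print \$x;print 'unfinished'";

my ( @statements, @errors );
{
    local $/ = ';';    # read ';'-terminated records
    open my $fh, '<', \$source or die "cannot open in-memory file: $!";
    while ( my $record = <$fh> ) {
        if ( chop($record) eq ';' ) {    # chop returns the removed character
            push @statements, $record;
        }
        else {
            push @errors, sprintf( "record %d is not terminated by ';'", $. );
        }
    }
    close $fh;
}

print "ok:  $_\n" for @statements;
print "bad: $_\n" for @errors;
```

Because chop() hands back the character it removed, the terminator can be verified and stripped in one step, which chomp()'s return count does not allow.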

The greatest benefit of chomp() therefore is that it makes an easy test for sloppy programming -- ask a candidate to write a simple program that reads in a file which, the spec. tells them, always has "\n" on the end of every line including the last. If they use chomp(), you already know enough about how they work and what quality of unit testing they are capable of applying to their own code before it gets inflicted on others...

Update: Unless, of course, you are writing code that is also supposed to be portable, including to Windows? No: the only exception is setting $/ to some multi-character value. End-of-line is NOT one of those on Windows, where a line read in text mode still ends in a single "\n" -- test it! Only in such very isolated cases do you need a special version, e.g.:

    sub Chonk {    # $/-aware chop
        my $sref = shift() || \$_;    # parm by ref; default $_
        my $cut  = substr( $$sref, -length( $/ ) );    # save what gets removed
        $$sref   = substr( $$sref, 0, length( $$sref ) - length( $/ ) );
        return $cut;
    }

Hmm, chop @array returns only what was chopped off the last element, even in array context, but I haven't decided what to do with this Chonk() that only came about because of this topic, but which might survive, who knows. Suggestions?

I suppose I also expected someone to say: "chomp() or die; should take care of your woes." It would at least reduce some of my objections about lifecycle issues.

____________________________________________

^M Free your mind!


In reply to chop vs chomp by Moron
