
I am getting a bit fed up with people telling me to use chomp instead of chop. I know the manual says it's "safer", but I beg to differ. (Update: ~~for portability to e.g. Windows I've added something special at the end.~~ Oops, no: chop is in fact portable.)

"So what is wrong with chomp()?" I can already hear many of you asking.

If we try to spell out a functional description of chomp() in common-or-garden terms, the problem may begin to become clear:

"Read in each line of a file (a line being delimited by "\n"), test whether there really is a "\n" there, and if so chop it off, else carry on regardless."

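To make that description concrete, here is a small sketch contrasting what the two built-ins do to a line with and without its "\n" (the default $/ is assumed; the sample strings are arbitrary):

```perl
use strict;
use warnings;

# chomp(): removes a trailing $/ ("\n" by default) only if it is present,
#          and returns the number of characters removed.
# chop():  removes the last character unconditionally and returns it.
my @results;
for my $line ( "abc\n", "abc" ) {
    my ( $for_chomp, $for_chop ) = ( $line, $line );
    my $count = chomp $for_chomp;
    my $char  = chop $for_chop;
    push @results, [ $for_chomp, $count, $for_chop, $char ];

    # Show "\n" visibly in the printed report
    ( my $shown      = $line ) =~ s/\n/\\n/;
    ( my $char_shown = $char ) =~ s/\n/\\n/;
    print "'$shown': chomp leaves '$for_chomp' (returns $count); ",
          "chop leaves '$for_chop' (returns '$char_shown')\n";
}
```

On the line without "\n", chomp() quietly does nothing, while chop() removes a real character.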
I can't imagine anyone asking for this. If an analyst knew you were interpreting their spec that way, you would likely be corrected. There are two possibilities: a spec either contains provisions for handling data quality or it doesn't, and 99 times out of 100 it doesn't, because systems analysts (quite rightly) prefer to focus on positive rather than negative functionality.

The reason chomp() is a no-no is that negative functionality tends to be introduced to handle known and/or anticipated functional problems. In one mission-critical system, incomplete files were detected by checking whether a trailer record was present at the end. If a file passes that check, every record between the header and the trailer must be terminated with "\n". A trailer is a proper functional remedy, because a file might accidentally break just after a "\n", so you can't use the presence of "\n" alone to test for file completeness.
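A minimal sketch of such a trailer check, assuming a hypothetical layout in which a complete file ends with a record starting with "TRL" (the prefix and the in-memory sample files are illustrative, not taken from the system described). The truncated sample still ends in "\n", which is exactly why a newline test alone cannot catch it:

```perl
use strict;
use warnings;

# Hypothetical layout: a complete file ends with a trailer record that
# starts with "TRL" and, like every other record, ends in "\n".
sub file_is_complete {
    my ($fh) = @_;
    my $last;
    while ( my $line = <$fh> ) {
        $last = $line;    # remember only the final record read
    }
    return ( defined $last && $last =~ /\ATRL.*\n\z/ ) ? 1 : 0;
}

my %files = (
    good      => "HDR\nrec1\nrec2\nTRL\n",
    truncated => "HDR\nrec1\n",    # broke just after a "\n": still ends in "\n"
);

my %ok;
for my $name ( sort keys %files ) {
    open my $fh, '<', \$files{$name} or die "Cannot open $name: $!";
    $ok{$name} = file_is_complete($fh);
    close $fh;
    print "$name: ", ( $ok{$name} ? 'complete' : 'INCOMPLETE' ), "\n";
}
```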

Having established that a file is complete, the only way a record can unexpectedly lose its "\n" is through a programming error, such as a conditional substitution that implicitly removes it, but only where the pattern matches. Perhaps chomp() is popular with sloppy people who do that kind of thing and then patch it up with chomp() first and ask questions never. Such patching up is grossly negligent because it confuses the testing process needed to find mistakes: chomp() makes it harder to detect the real fault, which it does nothing to fix.
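A small illustration of that kind of bug, using a hypothetical clean-up step that strips a trailing comma with s/,\s*$//. Because \s* also matches "\n", the newline silently disappears, but only on records that had a comma:

```perl
use strict;
use warnings;

# Every input record is guaranteed (by the spec) to end in "\n".
my @input = ( "alpha,\n", "beta\n" );

# Buggy clean-up step: strip a trailing comma. \s* also matches "\n",
# so the newline is lost too, but only where the pattern matched.
my @cleaned = map { my $r = $_; $r =~ s/,\s*$//; $r } @input;

# chop() removes the last character unconditionally ...
my @via_chop  = map { my $r = $_; chop $r;  $r } @cleaned;
# ... whereas chomp() removes "\n" only if it is there.
my @via_chomp = map { my $r = $_; chomp $r; $r } @cleaned;

print "chop:  @via_chop\n";    # the first record is visibly damaged
print "chomp: @via_chomp\n";   # looks fine, so the bug goes unnoticed
```

With chop() the damaged record shows up as "alph" the first time anyone looks at the output; with chomp() both records look correct and the faulty substitution survives into later testing.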

The advantage of chop(), precisely because it might indeed chop a character off the end of a \w+, is that such damage shows up in testing as a programming error that needs investigation, whereas chomp() tends to hide the error until the system or acceptance testing phase. I'd hate to mistakenly hire people who allowed that to happen out of a bad programming habit!

If we set $/ to, say, ';' to parse a Perl program, then of course the last record generally won't end with $/ but with "\n". In that case it is clearly wrong to chomp regardless, because the presence of $/ is a syntax requirement. chomp() is no good here anyway, because it returns the number of characters removed rather than the characters themselves. Instead we need something like:

```perl
# $_ holds one ';'-terminated record; chop() returns the character it removed
( chop() eq ';' ) or SomeErrorHandling();
```
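A fuller sketch of the same idea, reading a hypothetical in-memory Perl snippet with $/ localised to ';'. The whitespace-only remainder after the final ';' is skipped, and any other record that does not end in ';' is treated as an error rather than silently accepted:

```perl
use strict;
use warnings;

# Hypothetical input: a tiny Perl program held in memory
my $source = "my \$x = 1;\nprint \$x;\nexit;\n";

open my $fh, '<', \$source or die "Cannot open in-memory source: $!";
my @statements;
{
    local $/ = ';';    # each record now ends at ';' instead of "\n"
    while ( my $record = <$fh> ) {
        # After the final ';' only the closing "\n" remains; skip it
        next if $record =~ /\A\s*\z/;
        # The ';' is a syntax requirement: verify it rather than chomp it
        ( chop($record) eq ';' )
            or die "Statement not terminated by ';': '$record'\n";
        $record =~ s/\A\s+//;    # drop the newline left from the previous line
        push @statements, $record;
    }
}
close $fh;

print "[$_]\n" for @statements;
```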

The greatest benefit of chomp(), therefore, is that it makes an easy test for sloppy programming: ask a candidate to write a simple program that reads a file which, you tell them in the spec, always has "\n" at the end of every line including the last. If they use chomp(), you already know enough about how they work, and about the quality of unit testing they are capable of applying to their own code before it gets inflicted on others...

Update: ~~Unless of course you are writing code that is supposed also to be portable, including to Windows.~~ The exception is setting $/ to a multi-character string. (EOL is NOT multi-character as Perl sees it on Windows, because the line ending is translated to a single "\n" on input; test it!) Only in such isolated cases do you need a special version, e.g.:

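```perl
sub Chonk {    # $/-aware chop
               # parameter passed by reference; defaults to \$_
    my $sref = @_ ? shift() : \$_;
    my $len  = length($/);
    # 4-argument substr removes the last length($/) characters in place
    # and returns exactly the characters it removed
    return substr( $$sref, -$len, $len, '' );
}
```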
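A usage sketch for a multi-character $/, assuming records terminated by the two-character string "%\n" (an arbitrary choice for illustration). The sub here is a reworked Chonk() written with four-argument substr so that it returns the characters it actually removed; that reading of the intended behaviour is an assumption:

```perl
use strict;
use warnings;

# Reworked Chonk(): removes length($/) characters from the end of the
# referenced scalar (default \$_) and returns exactly what was removed.
sub Chonk {
    my $sref = @_ ? shift() : \$_;
    my $len  = length($/);
    return substr( $$sref, -$len, $len, '' );
}

# Hypothetical input: records terminated by the two-character string "%\n"
my $data = "first%\nsecond%\nthird%\n";

open my $fh, '<', \$data or die "Cannot open in-memory data: $!";
my @records;
{
    local $/ = "%\n";
    while ( my $rec = <$fh> ) {
        # Check the whole terminator rather than silently dropping it
        Chonk( \$rec ) eq $/
            or die "Record not terminated by the separator: '$rec'\n";
        push @records, $rec;
    }
}
close $fh;

print "$_\n" for @records;
```

This is the multi-character analogue of ( chop() eq ';' ) or SomeErrorHandling(): the returned tail is compared with $/ instead of being thrown away.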

Hmm, chop @array returns only what was chopped off the last element, even in list context, and I haven't decided what to do about that in this Chonk(), which only came about because of this topic but might survive, who knows. Suggestions? I suppose I also expected someone to say that chomp() or die; would take care of my woes. That would at least reduce some of my objections about lifecycle issues.
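For comparison, a minimal sketch of the chomp() or die approach, relying on chomp() returning the number of characters removed: a record without its "\n" yields 0 and the program fails loudly instead of carrying on. The sub name read_records and the sample inputs are hypothetical:

```perl
use strict;
use warnings;

# Spec: every record, including the last, ends in "\n". chomp() returns the
# number of characters it removed, so a missing "\n" gives 0 and we die.
sub read_records {
    my ($fh) = @_;
    my @records;
    while ( my $line = <$fh> ) {
        chomp($line) or die "Record $. is missing its terminating newline\n";
        push @records, $line;
    }
    return @records;
}

my @inputs = ( "one\ntwo\n", "one\ntwo" );
my @outcomes;
for my $data (@inputs) {
    open my $fh, '<', \$data or die "Cannot open in-memory data: $!";
    my @recs = eval { read_records($fh) };
    my $msg  = $@ ? "rejected: $@" : "accepted: @recs\n";
    close $fh;
    push @outcomes, $msg;
    print $msg;
}
```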
