Re: chop vs chomp
by merlyn (Sage) on May 10, 2007 at 19:07 UTC
|
Having established completeness of a file, the only way a record can lose its "\n" unexpectedly is by means of a programming error, such as making a conditional substitution that happens implicitly to remove it but only where the pattern matches.
You are apparently unfamiliar with the idea that Unix files can have a missing newline at the end. As a programmer, I must deal with files like that.
Are you suggesting that instead of writing a simple chomp in each program that reads possibly-newline-terminated strings, I explicitly put the code in there for that? I hope not. If Perl were that way, I'd be quickly writing "randalchomp" that acts like chomp does now, and figure out a way to include it in every program.
Chomp is there because it perfectly fills a need. That's why we use it.
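A minimal sketch of that situation, assuming a hypothetical three-line input whose last line lacks a newline (the helper name read_lines and the sample data are made up for illustration). It reads from an in-memory filehandle so it runs without any external file:

```perl
use strict;
use warnings;

# Hypothetical sample: a "file" whose last line has no trailing newline.
my $data = "alpha\nbeta\ngamma";

# Read $text line by line, applying $strip to each line in place.
sub read_lines {
    my ( $text, $strip ) = @_;
    open my $fh, '<', \$text or die "cannot open in-memory file: $!";
    my @lines;
    while ( my $line = <$fh> ) {
        $strip->($line);    # @_ aliases $line, so the callback modifies it
        push @lines, $line;
    }
    close $fh;
    return @lines;
}

my @chomped = read_lines( $data, sub { chomp $_[0] } );
my @chopped = read_lines( $data, sub { chop $_[0] } );

print join( ',', @chomped ), "\n";    # alpha,beta,gamma
print join( ',', @chopped ), "\n";    # alpha,beta,gamm
```

chomp leaves the unterminated last line alone, while chop eats its final character.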
Re: chop vs chomp
by philcrow (Priest) on May 10, 2007 at 14:50 UTC
|
As Fletch alludes, there's a more important safety mechanism in chomp than silently skipping the action if there is nothing to chomp. It also allows operating system independence, since it eats the line ending for your OS. If you ask an applicant to write a script, they may choose chomp merely because they don't know which operating systems you'll eventually ask them to deploy on.
Sorry, but porting to a new OS should not require chop hacking on code when a chomp in the first place would have handled the problem.
|
It's more than just OS line ending issues: sometimes a logical record is more than one line. I've dealt with files in the past where the records consisted of several newline-terminated lines with a four character record separator along the lines of "EOR\n". chomp can handle removing this transparently (local $/ = "EOR\n"; while( <IN> ) { chomp; _handle_rec( $_ ) }), chop can't.
And that's the important distinction: chomp deals with removing the current logical record ending, chop deals with removing a single trailing character.
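A minimal sketch of the multi-line record case, assuming hypothetical sample data and a made-up helper name (read_records); it uses an in-memory filehandle instead of the IN handle so it is self-contained:

```perl
use strict;
use warnings;

# Hypothetical data: two records, each several lines long, ending in "EOR\n".
my $data = "name: a\nsize: 1\nEOR\nname: b\nsize: 2\nEOR\n";

sub read_records {
    my ($text) = @_;
    local $/ = "EOR\n";    # input record separator, restored on scope exit
    open my $fh, '<', \$text or die "cannot open in-memory file: $!";
    my @records;
    while ( my $rec = <$fh> ) {
        chomp $rec;        # removes the whole "EOR\n", not just "\n"
        push @records, $rec;
    }
    close $fh;
    return @records;
}

my @records = read_records($data);
print scalar(@records), " records\n";

# For contrast: chop removes only the final character.
my $raw = "name: a\nsize: 1\nEOR\n";
chop $raw;                 # "EOR" is still attached to the record
print "chop left the marker behind\n" if $raw =~ /EOR\z/;
```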
|
|
No, "\n" gets chopped or chomped whatever the OS thinks "\n" is.
__________________________________________________________________________________
^M Free your mind!
Re: chop vs chomp
by chromatic (Archbishop) on May 10, 2007 at 18:05 UTC
|
Chonk is ridiculous. I count at least three misfeatures in the first line alone. Meanwhile, chomp has worked for years without those issues.
You're spending a lot of time justifying a poor decision.
Re: chop vs chomp
by eric256 (Parson) on May 10, 2007 at 19:20 UTC
|
It would seem to me (and I don't know the history) that this is probably the best reason I've seen for why chomp returns what it does instead of the modified string. This way you can test its return value and decide whether it worked as expected or whether there is an error. Assuming that we all share your need and care whether there was a line ending is ridiculous. By your logic, if I wanted only to remove the line ending, I would have to first check whether the string had one and then proceed to remove it only if needed. It seems like you are putting the burden on the normal case instead of on your special case. I've worked with many data files and formats, and not having to care about the line endings is a blessing, not a curse. In addition, if you are counting on the presence of a line ending to signify a valid record instead of checking the actual structure of the record, I would think you are going to have more trouble, not less, in the future.
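A minimal sketch of testing chomp's return value, which is the number of characters removed. The helper name strip_ending and the sample strings are hypothetical:

```perl
use strict;
use warnings;

# Hypothetical helper: strip the record ending and report how much was removed.
sub strip_ending {
    my ($line) = @_;
    my $removed = chomp($line);    # 0 when there was no trailing $/
    return ( $line, $removed );
}

for my $input ( "complete\n", "truncated" ) {
    my ( $text, $removed ) = strip_ending($input);
    if ($removed) {
        print "$text: had a line ending\n";
    }
    else {
        print "$text: no line ending found\n";
    }
}
```

Callers who care about a missing ending can check the count; everyone else can ignore it.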
Re: chop vs chomp
by Fletch (Bishop) on May 10, 2007 at 14:42 UTC
|
Because I'm sure no one ever has to deal with a record separator more than one character long. chop handles those just fine, right?
Re: chop vs chomp
by graff (Chancellor) on May 11, 2007 at 07:52 UTC
|
If we modify $/, e.g. to ';' to parse a Perl program, then of course the last line won't generally terminate with $/ but with "\n". In that case, it is clearly wrong to chomp regardless because the presence of $/ is a syntax requirement. In such cases, chomp() is no good anyway because it returns the length rather than the content of what is chopped.
Either I'm not understanding what you are trying to say in that part, or else you don't understand what $/ and chomp are really doing.
Setting aside the issue of how foolish it would be to "parse" a perl script by setting $/ = ';' , let's look at what actually happens, given a perl script like this:
#!/usr/bin/perl
use strict;
for my $letter ( qw/h e l l o , w o r l d/ ) {
    $_ .= $letter;
}
print;
while (1) {
    last if ( /^o/ );
    $_ =~ s/.//
}
If some other perl script sets $/=";" and reads this one as data in a while(<>) loop, it will go through five iterations; the first four records will have ";" as the very last character, and chomp, if applied, will remove it. The fifth record will end in "}", possibly followed by any sort and amount of whitespace, and chomp will have no effect on that.
Do you think that's a problem? I don't. That is the documented behavior. To the best of my knowledge, and based on all my own experience, that is the most desirable behavior, when the objective is to simply remove the input record separator when one is present.
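The record count can be checked with a minimal sketch like the one below. It assumes the sample script is held in a string rather than read through while(<>), and the helper name count_records is made up for illustration:

```perl
use strict;
use warnings;

# The sample script from above, in a non-interpolating heredoc.
my $script = <<'END_SCRIPT';
#!/usr/bin/perl
use strict;
for my $letter ( qw/h e l l o , w o r l d/ ) {
    $_ .= $letter;
}
print;
while (1) {
    last if ( /^o/ );
    $_ =~ s/.//
}
END_SCRIPT

# Count the records read with $/ = ';' and how many of them chomp shortened.
sub count_records {
    my ($text) = @_;
    local $/ = ';';
    open my $fh, '<', \$text or die "cannot open in-memory file: $!";
    my ( $total, $chomped ) = ( 0, 0 );
    while ( my $rec = <$fh> ) {
        $total++;
        $chomped++ if chomp($rec);    # true only when a trailing ";" was removed
    }
    close $fh;
    return ( $total, $chomped );
}

my ( $total, $chomped ) = count_records($script);
print "$total records, $chomped ended in ';'\n";    # 5 records, 4 ended in ';'
```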
There are other, less common situations where the old "chop" (remove last character, no matter what it is) is still useful, and I'm glad it's still available for that purpose.
If you are complaining about the fact that many novice perl users don't understand $/ and chomp, I can understand your frustration, and I agree more people should know more about it. But if you are complaining that you're having trouble using chomp, I have to wonder why. It's not that complicated.
When there are issues like a single script meant to be used on any OS and having to handle a mixture of CRLF and LF line terminations (in different files or even in a single file), then I agree that chomp is maybe not the best approach, because $/ does not allow that sort of flexibility. For that I would do something like s/[\r\n]+$// instead.
(In fact, I've done that in a number of scripts, and it has served me quite well.)
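A minimal sketch of that substitution applied to mixed line endings. The helper name strip_any_eol and the sample strings are hypothetical:

```perl
use strict;
use warnings;

# Hypothetical helper wrapping the substitution s/[\r\n]+$//.
sub strip_any_eol {
    my ($line) = @_;
    $line =~ s/[\r\n]+$//;    # removes LF, CRLF, or nothing at all
    return $line;
}

for my $raw ( "unix line\n", "dos line\r\n", "no ending" ) {
    my $clean = strip_any_eol($raw);
    print "[$clean]\n";
}

# For contrast: with the default $/ of "\n", chomp leaves the "\r" behind.
my $dos = "dos line\r\n";
chomp $dos;
print "chomp left a carriage return\n" if $dos =~ /\r\z/;
```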
Re: chop vs chomp
by hardburn (Abbot) on May 10, 2007 at 20:03 UTC
|
Having the insight to see that a common piece of functionality is insufficient for the task at hand is good. It's unnecessary, though, to extend that to the general case of, say, parsing a random CSV file.
"There is no shame in being self-taught, only in not trying to learn in the first place." -- Atrus, Myst: The Book of D'ni.
Re: chop vs chomp
by shmem (Chancellor) on May 11, 2007 at 04:54 UTC
|
*shrug* nuts vs. bolts?
So what is wrong with chomp()?
Nothing is wrong with chomp, and there's nothing wrong with chop either.
Both work as documented, and both have their uses.
--shmem
_($_=" "x(1<<5)."?\n".q·/)Oo. G°\ /
/\_¯/(q /
---------------------------- \__(m.====·.(_("always off the crowd"))."·
");sub _{s./.($e="'Itrs `mnsgdq Gdbj O`qkdq")=~y/"-y/#-z/;$e.e && print}
Re: chop vs chomp
by samizdat (Vicar) on May 10, 2007 at 16:25 UTC
|
Re: chop vs chomp
by akho (Hermit) on May 10, 2007 at 14:25 UTC
|
Re: chop vs chomp
by DrHyde (Prior) on May 11, 2007 at 09:00 UTC
|
In all your drivel about records, you seem to have forgotten that perl is really really good at handling TEXT. Text doesn't have to finish with \n so it's perfectly acceptable to say "read all the lines in this file and if they have a trailing \n strip it off". chomp($line) is a rather handy way of expressing that. I use it most days, and always without error.
Re: chop vs chomp
by monarch (Priest) on May 11, 2007 at 00:39 UTC
|
I've never used chomp() or chop().
I usually put portability as one of my top priorities when I program. Thus, I fear mishandling of different end-of-line sequences. So I almost always use the following skeleton code:
while ( defined( my $line = <HANDLE> ) ) {
    $line =~ s/\s*[\n\r]*\Z//s;
    .
    .
}