Re: Redefining chomp()
by Limbic~Region (Chancellor) on Mar 24, 2004 at 14:09 UTC
Anonymous Monk,
As has already been pointed out, chomp probably does not work the way you think.
- It removes trailing $/ from the end of the line
- It returns the number of chars removed
- It can work on a list
- In the case of a list, it will return total number removed
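A minimal sketch of those four points, using only the built-in chomp. The variable names are made up for illustration.

```perl
use strict;
use warnings;

# With the default $/ of "\n", chomp removes at most one trailing copy.
my $line    = "foo\n\n";
my $removed = chomp $line;        # 1; $line is now "foo\n"

# On a list, the return value is the total number of characters removed.
my @lines = ("a\n", "b", "c\n");
my $total = chomp @lines;         # 2; "b" had nothing to remove

# With $/ set to "\r\n", a full CRLF counts as two characters.
my $crlf         = "bar\r\n";
my $crlf_removed = do { local $/ = "\r\n"; chomp $crlf };   # 2
```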
I do not believe just modifying $/ will work for you. For one, it will likely mess up reading in new files. Secondly, I am under the impression you want it to auto-detect if $/ should be "\n" or "\r\n" depending on what it is working on.
Here is a start:
BEGIN {
    *CORE::GLOBAL::chomp = sub {
        my $count;
        for ( @_ ) {
            $count += $_ =~ s/[\r\n]$//g;
        }
        return $count;
    }
}
This breaks in a lot of ways.
- It will remove all trailing newlines instead of just one in the case of my $string = "foo\n\n\n"
- It doesn't see where it is supposed to stop working (without helping parens) in the case of print chomp $foo, "\n";
- Probably a lot of others I didn't find
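The parenthesis point can be sketched with the built-in chomp. This assumes a plain scalar $foo and is only an illustration: chomp is a list operator, so without parentheses it claims every item to its right.

```perl
use strict;
use warnings;

my $foo = "foo\n";

# The parentheses stop chomp at $foo, so the "\n" literal
# is passed to print rather than to chomp.
my $removed = chomp($foo);
print $removed, "\n";
```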
Once fixed appropriately, this could be stuck in a module and then you could just use Chomp;
Cheers - L~R
As there is no quantifier on your character class, I have trouble understanding how this will remove multiple newlines as you say. Am I missing something?
Also, if the Windows newline is in fact "\r\n", this will not work, again because there is no quantifier.
It seems to me if you instead do
$count += $_ =~ s/\r?\n$//g;
... then you alleviate the problem of killing all trailing newlines and, assuming "\r\n" is what all Windows newlines are, you are matching both Windows and Unix newlines.
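For anyone who wants to try that regex outside of an override, here is a rough sketch as an ordinary sub. The name chomp_any is hypothetical, not a core function. It anchors with \z so only the very end of the string matches, and it counts characters rather than substitutions so that a CRLF reports 2, the way chomp would with $/ set to "\r\n".

```perl
use strict;
use warnings;

# Hypothetical helper: strips one trailing LF or CRLF from each argument
# in place and returns the total number of characters removed.
sub chomp_any {
    my $count = 0;
    for (@_) {                     # @_ aliases the caller's variables
        next unless defined;
        if (s/(\r?\n)\z//) {       # \z matches only at the very end
            $count += length $1;
        }
    }
    return $count;
}

my $foo = "foo\n\r\n";
my $n   = chomp_any($foo);         # removes the final "\r\n" only
```

Because it is a normal sub, it needs parentheses around its arguments just like any other user-defined function.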
ryantate,
My regex fu is non-existent as you can see. That does not really matter much as I said it would need to be fixed. I was on my way to a meeting so I didn't get to spend a lot of time on it. After thinking about it some more, I think the following would work a lot better.
package Chomp;
use Scalar::Readonly ':all';
BEGIN {
    *CORE::GLOBAL::chomp = sub {
        readonly_on( $/ );
        my ($count, $fix) = (0, '');
        local $/ = "\r\n";
        for ( @_ ) {
            my ($first, $second) = (0, 0);
            eval { $first = chomp };
            if ( $@ ) {
                die $@ if $_ !~ /^\r?\n$/;
                $fix = $_;
                last;
            }
            if ( ! $first ) {
                local $/ = "\n";
                $second = chomp;
            }
            $count += $first + $second;
        }
        readonly_off( $/ );
        return $fix ? ($count , $fix) : $count;
    };
}
42;
# Then a script that uses it
#!/usr/bin/perl
use strict;
use warnings;
use Chomp;
my $foo = "foo\n\r\n";
my $bar = "bar\n\n\n";
print chomp $foo, $/; # prints 2
print chomp $bar, "\n"; # prints 1
print chomp ($foo, $bar), "\n"; # prints 2
This does have the unfortunate side effect of not allowing someone to do:
chomp($/); # $/ = undef;
I know this is ugly and there are probably a few more gotchas in there, but it was kind of fun to work on. Note: This can be done without a module, but it is much uglier. Anyone wanting to see that should say so.
Cheers - L~R
This was exactly what I was looking for.
I don't anticipate that it will break anything that I am doing. Though, testing will be in order.
Modifying the global $/ would, indeed, be a very bad thing to do. I knew this was a possible solution, but I would rather have done it locally for each chomp().
If that's all you wanted to do, a simple perl -pi -e 's{chomp([^;]*);}{{local \$/="\\r\\n";chomp$1;}}g' <your files here> would have sufficed ... wouldn't it?
------
We are the carpenters and bricklayers of the Information Age.
Then there are Damian modules.... *sigh* ... that's not about being less-lazy -- that's about being on some really good drugs -- you know, there is no spoon. - flyingmoose
Wouldn't this be a good place to use local? Something like
{
local $/ = "\r\n";
chomp;
}
This would protect the global state of $/ and let you change its behavior right before the chomp. The scope could be increased until you've captured all of your chomp calls. Any new chomps you write, unless they're in the same scope as the local call, would use the default value of $/.
The code is untested.
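A slightly fuller sketch of the same idea, with made-up sample data, showing that the change to $/ ends at the closing brace. It also shows the limitation behind the original question: with $/ set to "\r\n", a Unix-style line is left untouched.

```perl
use strict;
use warnings;

my @records = ("alpha\r\n", "beta\r\n");
my $unix    = "gamma\n";
my $unix_removed;

{
    local $/ = "\r\n";              # in effect only until the closing brace
    chomp @records;                 # strips the full CRLF from each element
    $unix_removed = chomp $unix;    # 0: "gamma\n" does not end in "\r\n"
}

# Outside the block, $/ has its previous value again (normally "\n").
my $restored = ($/ eq "\n") ? 1 : 0;
```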
Re: Redefining chomp()
by dragonchild (Archbishop) on Mar 24, 2004 at 13:45 UTC
If you read the description of chomp, you will notice that ... it deletes the terminating string corresponding to the current value of $/ ....
Further down, it says
With version 5.6, the meaning of chomp changes slightly in that input disciplines are allowed to override the value of the $/ variable and mark strings as to how they should be chomped. This has the advantage that an input discipline can recognize more than one variety of line terminator ...
If you can, I'd look at that. If you can't, look at CORE::chomp().
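One way to get that effect with stock Perl is the :crlf I/O layer, which translates CRLF to LF as the file is read. Whether this is exactly what the quoted documentation has in mind is an assumption on my part, and the temporary file below is only there to keep the sketch self-contained.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Write a small file with Windows line endings (binmode, so nothing is translated).
my ($out, $path) = tempfile(UNLINK => 1);
binmode $out;
print {$out} "foo\r\nbar\r\n";
close $out or die "close failed: $!";

# Read it back through the :crlf layer, which maps CRLF to LF on input.
open my $in, '<:crlf', $path or die "Cannot open $path: $!";
my @lines = <$in>;
close $in;

chomp @lines;    # ordinary chomp with the default $/ is now enough
```

With the layer in place, the rest of the script never sees a "\r", so neither chomp nor $/ has to change.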
Re: Redefining chomp()
by jaa (Friar) on Mar 24, 2004 at 14:08 UTC
Why not pass your data files through dos2unix before letting them near your script, or transfer them into the *nix world with ASCII mode ftp?
Re: Redefining chomp()
by matija (Priest) on Mar 24, 2004 at 13:46 UTC
Chomp removes any trailing string that corresponds to $/ (or, with use English;, $INPUT_RECORD_SEPARATOR),
so my guess would be that you just need to modify that...
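A short sketch of the English spelling, assuming you want the change confined to one block. The sample string is made up.

```perl
use strict;
use warnings;
use English qw(-no_match_vars);

my $line = "foo\r\n";
my $removed;

{
    # $INPUT_RECORD_SEPARATOR is the English alias for $/.
    local $INPUT_RECORD_SEPARATOR = "\r\n";
    $removed = chomp $line;
}
```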