Re: Removing double carriage return
by GrandFather (Saint) on Aug 20, 2011 at 01:35 UTC
I can't see how that could even "kind of work". According to the perlrun description of the -p switch, the code you provide on the command line is wrapped in:
while (<>) {
    ...             # your program goes here
} continue {
    print or die "-p destination: $!\n";
}
which deals with the input a line at a time, so a regex looking for two consecutive newlines will never match. For such a multi-line regex to make sense you need to slurp the whole file into a string and run the regex on that. The easiest way to do that is to write a small script rather than try to shoehorn it into a "one liner". That looks like a lot more work, because you then have to handle the files one at a time instead of using Perl's -i and @ARGV magic in a simple fashion. However, with just a little work we can still use the magic:
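use strict;
use warnings;

my @files = @ARGV;
$^I = '.bak';            # edit in place, keeping a .bak backup (same as -i.bak)

for my $file (@files) {
    local $/;            # no record separator: <> returns the whole file at once
    @ARGV = $file;       # hand <> one file at a time
    while (<>) {
        s/\n\n/\n/gs;
        print;           # with $^I set, print writes to the edited file
    }
}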
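To see why the record separator matters, here is a minimal sketch that counts matches of /\n\n/ on the same text in line mode and in slurp mode. The helper name count_matches and the sample text are made up for illustration, and the text is read from an in-memory filehandle rather than a real file.

```perl
use strict;
use warnings;

# Hypothetical helper: count matches of /\n\n/ when $text is read
# with the given record separator.
sub count_matches {
    my ($text, $separator) = @_;
    local $/ = $separator;    # "\n" = line mode, undef = slurp mode
    open my $fh, '<', \$text or die "Cannot open in-memory file: $!";
    my $count = 0;
    while (my $record = <$fh>) {
        $count++ while $record =~ /\n\n/g;
    }
    close $fh;
    return $count;
}

my $sample = "one\n\ntwo\n\nthree\n";
printf "line mode:  %d\n", count_matches($sample, "\n");     # prints 0
printf "slurp mode: %d\n", count_matches($sample, undef);    # prints 2
```

In line mode every record holds at most one newline, so the pattern has nothing to match; in slurp mode the single record holds the whole text.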
True laziness is hard work
Re: Removing double carriage return
by Cristoforo (Curate) on Aug 20, 2011 at 02:03 UTC
Use the file slurp argument switch, '0'
perl -i.bak -0pe 's/\n\n/\n/g' inputtext
Update: I chose plain -0 instead of octal -0777 thinking it would be clearer, but GrandFather may have a good point: a byte can only hold values 0 to 255, and octal 0777 is 511 in base 10, so no byte can ever match it. I guess I was also thinking of double zero, -00, which is the switch for paragraph-mode reading.

perl -i.bak -0777pe 's/\n\n/\n/g' inputtext
may be slightly better, as no byte value matches octal 777, although that's rather nitpicking and I wouldn't have noticed if I hadn't needed to look up perlrun to find out what -0 (the digit 0, btw) actually does. ;)
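For anyone comparing the three switches, here is a rough sketch of what each does to the record separator $/, as I read perlrun: -0 sets it to the NUL character, -00 selects paragraph mode, and -0777 undefines it so the file is slurped whole. The helper count_records and the sample text (which deliberately contains a NUL byte) are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: how many records does <$fh> return
# for $text with the given record separator?
sub count_records {
    my ($text, $separator) = @_;
    local $/ = $separator;
    open my $fh, '<', \$text or die "Cannot open in-memory file: $!";
    my @records = <$fh>;
    close $fh;
    return scalar @records;
}

my $sample = "a\n\nb\n\nc\0d\n";
printf "-0    (NUL separator): %d\n", count_records($sample, "\0");    # prints 2
printf "-00   (paragraph):     %d\n", count_records($sample, "");      # prints 3
printf "-0777 (slurp):         %d\n", count_records($sample, undef);   # prints 1
```

So plain -0 only behaves like a slurp when the file contains no NUL bytes, which is true of most text files but is not guaranteed.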
True laziness is hard work
Re: Removing double carriage return
by pvaldes (Chaplain) on Aug 20, 2011 at 01:40 UTC
match 3 or more carriage returns... maybe like this?
=~ m/\n{3,}/
... so expanding the script provided by GrandFather, from
while (<>) {
    s/\n\n/\n/gs;
    print;
}
to
while (<>) {
    next if $_ =~ m/\n{1}/;
    if ($_ =~ m/\n{2}){s/\n{2}/\n/gs}
    elsif ($_ =~ m/\n{3,}){s/\n{3,}/\n\s/gs}
}
or something like this...

A program that reads a file, line by line, will never encounter more than one carriage return.

Thanks pvaldes!
This is a great addition to GrandFather's code. Unfortunately I can't get it to work exactly as you wrote.
Minor issues: missing closing slashes for m/\n{2} and m/\n{3,} per my Perl system.
My Perl system also could not interpret the \s escape char.
After fixing the minor issues, the resultant file turns out blank. However, this will jumpstart my attempts to get a working script. I want the following logic. If one carriage return, do nothing. If two carriage returns, substitute with one. If three or more carriage returns, substitute all with 2.
I'll post it when I can get it working.

while (<>) {
    next if $_ =~ m/\n{1}/;
    if ($_ =~ m/\n{2}){s/\n{2}/\n/gs}
    elsif ($_ =~ m/\n{3,}){s/\n{3,}/\n\s/gs}
}
Mine:
while (<>) {
    if ($_ =~ m/\n{1}/) {
    }
    if ($_ =~ m/\n{2}/) {
        s/\n{2}/\n/gs;
        print;
    }
    elsif ($_ =~ m/\n{3,}/) {
        s/\n{3,}/\n/gs;
        print;
    }
}
The above does not work. The version below now works (probably a crude way of doing it; please reply if you have a more elegant way):
while (<>) {
    if ($_ =~ m/\S\n{2}\S/) {
        s/(\S)\n{2}(\S)/$1\n$2/gs;
        print;
    }
    elsif ($_ =~ m/\n{3,}/) {
        s/\n{3,}/\n\n/gs;
        print;
    }
}

From the command line, this replaces 3 or more newlines with 2 newlines or replaces exactly 2 newlines with 1 newline.
perl -i.bak -0777 -pe 's/(\n{3,}|\n\n)/2 == length $1 ? "\n" : "\n\n"/eg' inputfile
Notice that the alternation tries the longer match first, so that the two-newline branch cannot match inside a run of 3 or more newlines.
Update: That could be simplified to:
perl -i.bak -0777 -pe 's/\n+/2 < length $& ? "\n\n" : "\n"/ge' inputfile
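To check the rule without touching a file, the simplified substitution can be wrapped in a small function and run on a sample string. This is only a sketch: squeeze_newlines is a made-up name, and it uses a capture group ($1) where the one-liner uses $&.

```perl
use strict;
use warnings;

# Hypothetical wrapper around the simplified substitution.
sub squeeze_newlines {
    my ($text) = @_;
    # Runs of 3 or more newlines become 2; runs of 1 or 2 become 1.
    $text =~ s/(\n+)/length($1) > 2 ? "\n\n" : "\n"/ge;
    return $text;
}

my $sample = "a\nb\n\nc\n\n\n\nd\n";
(my $shown = squeeze_newlines($sample)) =~ s/\n/\\n/g;    # make newlines visible
print "$shown\n";    # prints a\nb\nc\n\nd\n
```

The /e modifier is what lets the replacement side be a Perl expression, so one pattern covers both cases.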

That's better than my code, yes; this way you catch a last case that I was missing: a file without any \n. Your code is basically the same as this, but it probably doesn't hurt to add a small final else, just to catch those cases and to help in future reviews:
while (<>) {
    if ($_ =~ m/\S\n{2}\S/) {
        s/(\S)\n{2}(\S)/$1\n$2/gs;
        print;
    }
    elsif ($_ =~ m/\n{3,}/) {
        s/\n{3,}/\n\n/gs;
        print;
    }
    else { print; }    # do nothing (when you have 0 or 1 \n)
}
Re: Removing double carriage return
by parv (Parson) on Aug 20, 2011 at 14:04 UTC
A harder way would be to keep track of the types of consecutive lines seen while processing line by line. Print array contents only when n non-blank (however that is defined) consecutive lines are seen.
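A rough sketch of that line-by-line idea, with two assumptions of mine: "blank" means a completely empty line, and a counter of pending newlines stands in for the array mentioned above. It applies the rule asked for earlier in the thread (two newlines become one, three or more become two). The names squeeze_lines and collapse are made up for illustration, and the function returns the result as a string instead of printing it so it is easy to check.

```perl
use strict;
use warnings;

# Hypothetical helper: what a run of $n newlines should shrink to.
sub collapse {
    my ($n) = @_;
    return $n == 0 ? '' : $n <= 2 ? "\n" : "\n\n";
}

# Hypothetical helper: read $fh a line at a time and squeeze newline runs.
sub squeeze_lines {
    my ($fh) = @_;
    my $out     = '';
    my $pending = 0;    # newlines seen since the last non-empty text
    while (my $line = <$fh>) {
        my $had_newline = ($line =~ s/\n\z//) ? 1 : 0;
        if (length $line) {
            # Non-blank line: flush the pending run, then the text itself.
            $out .= collapse($pending) . $line;
            $pending = 0;
        }
        $pending += $had_newline;
    }
    return $out . collapse($pending);    # flush a trailing run, if any
}

my $sample = "a\n\nb\n\n\n\nc\n";
open my $fh, '<', \$sample or die "Cannot open in-memory file: $!";
(my $shown = squeeze_lines($fh)) =~ s/\n/\\n/g;    # make newlines visible
close $fh;
print "$shown\n";    # prints a\nb\n\nc\n
```

This never holds more than one line in memory, at the cost of more bookkeeping than the slurp-and-substitute versions.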