Removing double carriage return

dragooneye has asked for the wisdom of the Perl Monks concerning the following question:

Re: Removing double carriage return by GrandFather (Saint) on Aug 20, 2011 at 01:35 UTC
I can't see how that could even "kind of work". According to the perlrun description of the -p switch, the code you provide on the command line is wrapped in:

```perl
while (<>) {
    ...             # your program goes here
} continue {
    print or die "-p destination: $!\n";
}
```

which deals with one line at a time, so your regex will never match. For such a multi-line regex to make sense you need to slurp the whole file into a string and run the regex on that. The easiest way to do that is to write a small script rather than try to shoehorn it into a "one liner". However, that becomes a whole lot more work, because you then have to handle things a file at a time instead of using Perl's -i and @ARGV magic in a simple fashion. With just a little work, though, we can still use the magic:

```perl
use strict;
use warnings;

my @files = @ARGV;      # remember the files named on the command line
$^I = '.bak';           # edit in place, keeping a .bak backup (like -i.bak)

for my $file (@files) {
    local $/;           # undefine the record separator: slurp mode
    @ARGV = $file;      # let <> process just this one file
    while (<>) {        # with $/ undefined, this reads the whole file at once
        s/\n\n/\n/gs;
        print;          # written to the edited file because $^I is set
    }
}
```

True laziness is hard work
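The difference between line-at-a-time reading and slurping can be checked without touching any files. The following is only a minimal sketch: the sample text and the helper name `count_matches` are made up for illustration, and it reads from an in-memory filehandle rather than from `@ARGV`.

```perl
use strict;
use warnings;

# Hypothetical sample text: two lines separated by one blank line.
my $text = "first\n\nsecond\n";

# Read $text through an in-memory filehandle using the given record
# separator (undef means slurp) and count the records containing "\n\n".
sub count_matches {
    my ($separator) = @_;
    local $/ = $separator;
    open my $fh, '<', \$text or die "Cannot open in-memory handle: $!";
    my $hits = 0;
    while (defined(my $record = <$fh>)) {
        $hits++ if $record =~ /\n\n/;
    }
    close $fh;
    return $hits;
}

my $line_mode  = count_matches("\n");    # default: one line per record
my $slurp_mode = count_matches(undef);   # like "local $/;": whole text at once

print "line mode: $line_mode, slurp mode: $slurp_mode\n";
# prints "line mode: 0, slurp mode: 1"
```

In line mode each record holds at most one trailing newline, so the two-newline pattern has nothing to match against.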
Re^2: Removing double carriage return by dragooneye (Novice) on Aug 22, 2011 at 18:10 UTC
Thanks GrandFather. The code you posted works great! My "kind of works" statement was referring to the last line of code in my original post. I probably should have left this out as it is confusing.
Re: Removing double carriage return by Cristoforo (Curate) on Aug 20, 2011 at 02:03 UTC
Use the file slurp ~~argument~~ switch, '0':

`perl -i.bak -0pe 's/\n\n/\n/g' inputtext`

Update: I chose zero instead of octal 0777 thinking it would be clearer, but GrandFather may have a good point (a byte can hold values from 0 to 255, and octal 0777 is 511 in base 10). I guess I was also thinking that double zero, 00, is the switch for paragraph-mode reading.
Re^2: Removing double carriage return by GrandFather (Saint) on Aug 20, 2011 at 02:54 UTC
Actually:

`perl -i.bak -0777pe 's/\n\n/\n/g' inputtext`

may be slightly better, as no byte value matches octal 777, although that's rather nitpicking and I wouldn't have noticed if I hadn't needed to look up perlrun to find out what -0 (the digit 0, btw) actually does. ;)

True laziness is hard work
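To make the nitpick concrete: per perlrun, a bare -0 sets the input record separator to the null character, while -0777 undefines it. The sketch below mimics both by setting `$/` directly. The data string and the helper name `count_records` are invented for illustration.

```perl
use strict;
use warnings;

# Count how many records a string splits into for a given value of $/.
sub count_records {
    my ($text, $separator) = @_;
    local $/ = $separator;
    open my $fh, '<', \$text or die "Cannot open in-memory handle: $!";
    my $count = 0;
    while (defined(my $record = <$fh>)) {
        $count++;
    }
    close $fh;
    return $count;
}

# Hypothetical data containing a single NUL byte in the middle.
my $with_nul = "a\n\nb\0c\n\nd\n";

# $/ = "\0" corresponds to -0; $/ = undef corresponds to -0777.
print "NUL separator: ", count_records($with_nul, "\0"), " records\n";
print "undef (slurp): ", count_records($with_nul, undef), " record\n";
# prints "NUL separator: 2 records" then "undef (slurp): 1 record"
```

For ordinary text files with no NUL bytes the two switches behave the same, which is why -0 appears to slurp.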
Re: Removing double carriage return by pvaldes (Chaplain) on Aug 20, 2011 at 01:40 UTC
Match 3 or more carriage returns... maybe like this? `=~ m/\n{3,}/` ... so expanding the script provided by GrandFather, `while (<>) { s/\n\n/\n/gs; print; }`, into:

```perl
while (<>) {
    next if $_ =~ m/\n{1}/;
    if ($_ =~ m/\n{2}){s/\n{2}/\n/gs}
    elsif ($_ =~ m/\n{3,}){s/\n{3,}/\n\s/gs}
}
```

or something like this...
Re^2: Removing double carriage return by Anonymous Monk on Aug 20, 2011 at 01:48 UTC
A program that reads a file line by line will never encounter more than one carriage return in a single record.
Re^3: Removing double carriage return by i5513 (Pilgrim) on Aug 20, 2011 at 08:44 UTC
This is why GrandFather uses "local $/;" in his script, which pvaldes' reply references. Update: I didn't read pvaldes' comment before he updated it. Thanks, Anonymous, for clarifying this issue.
Re^4: Removing double carriage return by Anonymous Monk on Aug 21, 2011 at 06:21 UTC
Re^2: Removing double carriage return by dragooneye (Novice) on Aug 22, 2011 at 18:21 UTC
Thanks pvaldes! This is a great addition to GrandFather's code. Unfortunately I can't get it to work exactly as you wrote. Minor issues: missing closing slashes for m/\n{2} and m/\n{3,} per my Perl system. My Perl system also could not interpret the \s escape char. After fixing the minor issues, the resultant file turns out blank. However, this will jumpstart my attempts to get a working script. I want the following logic. If one carriage return, do nothing. If two carriage returns, substitute with one. If three or more carriage returns, substitute all with 2. I'll post it when I can get it working.
Re^2: Removing double carriage return by dragooneye (Novice) on Aug 22, 2011 at 18:35 UTC
My updated code that seems to work. Thanks again pvaldes for your help! pvaldes' code:

```perl
while (<>) {
    next if $_ =~ m/\n{1}/;
    if ($_ =~ m/\n{2}){s/\n{2}/\n/gs}
    elsif ($_ =~ m/\n{3,}){s/\n{3,}/\n\s/gs}
}
```

Mine:

```perl
while (<>) {
    if ($_ =~ m/\n{1}/) {
    }
    if ($_ =~ m/\n{2}/){
        s/\n{2}/\n/gs;
        print;
    }
    elsif ($_ =~ m/\n{3,}/){
        s/\n{3,}/\n/gs;
        print;
    }
}
```

Above does not work. Below now works (probably a crude way of doing it; please reply if you have a more elegant way):

```perl
while (<>) {
    if ($_ =~ m/\S\n{2}\S/){
        s/(\S)\n{2}(\S)/$1\n$2/gs;
        print;
    }
    elsif ($_ =~ m/\n{3,}/){
        s/\n{3,}/\n\n/gs;
        print;
    }
}
```
Re^3: Removing double carriage return by Cristoforo (Curate) on Aug 23, 2011 at 04:58 UTC
From the command line, this replaces 3 or more newlines with 2 newlines, and replaces exactly 2 newlines with 1 newline:

`perl -i.bak -0777 -pe 's/(\n{3,}|\n\n)/2 == length $1 ? "\n" : "\n\n"/eg' inputfile`

Notice that the alternation tries the longer match first, so that the 2-newline branch cannot match part of a run of 3 or more.

Update: That could be simplified to:

`perl -i.bak -0777 -pe 's/\n+/2 < length $& ? "\n\n" : "\n"/ge' inputfile`
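For anyone who wants to check the rule before running it with -i on real files, here is a small sketch of the same substitution wrapped in a function. The function name `squeeze_newlines` and the sample strings are made up, and a capture group is used in place of `$&`.

```perl
use strict;
use warnings;

# Collapse newline runs: 1 or 2 newlines become 1, 3 or more become 2.
sub squeeze_newlines {
    my ($text) = @_;
    $text =~ s/(\n+)/length($1) > 2 ? "\n\n" : "\n"/ge;
    return $text;
}

# Hypothetical inputs with runs of 1, 2, 3 and 5 newlines.
for my $input ("a\nb", "a\n\nb", "a\n\n\nb", "a\n\n\n\n\nb") {
    my $output = squeeze_newlines($input);
    # Show newlines as a visible "\n" so the results are easy to compare.
    (my $shown_in  = $input)  =~ s/\n/\\n/g;
    (my $shown_out = $output) =~ s/\n/\\n/g;
    print "$shown_in => $shown_out\n";
}
```

Because `\n+` is greedy, each run of newlines is matched as a whole, so no run is ever processed two characters at a time.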
Re^4: Removing double carriage return by dragooneye (Novice) on Aug 26, 2011 at 17:29 UTC
Re^3: Removing double carriage return by pvaldes (Chaplain) on Aug 22, 2011 at 20:31 UTC
That's better than my code, yep. This way you catch a last case that I was missing: a file without any \n. Your code is basically the same as this, but it probably wouldn't hurt to add a small final else just to catch these cases and help in future reviews:

```perl
while (<>) {
    if ($_ =~ m/\S\n{2}\S/){
        s/(\S)\n{2}(\S)/$1\n$2/gs;
        print;
    }
    elsif ($_ =~ m/\n{3,}/){
        s/\n{3,}/\n\n/gs;
        print;
    }
    else {print;}   # no substitution (when you have 0 or 1 \n): print unchanged
}
```
Re: Removing double carriage return by parv (Parson) on Aug 20, 2011 at 14:04 UTC
A harder way would be to keep track of the types of consecutive lines seen while processing line by line. Print array contents only when n non-blank (however that is defined) consecutive lines are seen.
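One possible reading of that idea, as a rough sketch only: count consecutive blank lines and decide what to emit when the next non-blank line arrives. The helper name `squeeze_lines` is hypothetical, and "blank" is assumed to mean a completely empty line. It follows the rule asked for earlier in the thread (one blank line is dropped, two or more become one), but blank lines at the very start of the input are treated like interior ones, so it may differ from the slurping versions there.

```perl
use strict;
use warnings;

# Hypothetical helper: takes a list of lines (each ending in "\n") and
# returns the lines to print. A single blank line between text lines is
# dropped; two or more consecutive blank lines collapse to one.
sub squeeze_lines {
    my @lines = @_;
    my @out;
    my $blank_run = 0;              # blank lines seen since the last text line

    for my $line (@lines) {
        if ($line eq "\n") {        # assumption: "blank" means completely empty
            $blank_run++;
            next;
        }
        push @out, "\n" if $blank_run >= 2;
        $blank_run = 0;
        push @out, $line;
    }
    push @out, "\n" if $blank_run >= 2;   # blank run at the end of the input
    return @out;
}

# Hypothetical input: one blank line after "one", three after "two".
my @input = ("one\n", "\n", "two\n", "\n", "\n", "\n", "three\n");
print squeeze_lines(@input);
```

This never needs the whole file in memory, which is the main reason to bother with the harder way.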