proactive1 has asked for the wisdom of the Perl Monks concerning the following question:

Hi, I have data with one field per line, and each set is separated by two blank lines. I need to change this to a single blank line. Here is a sample of the data:
Sweed Machinery, Inc.
653 Second Ave., P.O. Box 228
Gold Hill, OR 97525-9733
541-855-1512, 800-888-1352
541-855-1165
www.sweed.com


Olympic Wire & Equipment, Inc.
P.O. Box 3227
Newport Beach, CA 92659
949-646-9731
949-646-6465
www.olympicequipment.com/index.php


LCI Industrial Distributors
A-1 Country Club Rd.
East Rochester, NY 14445
585-385-1390, 800-282-3294
585-385-1362
www.lcifinishing.com/


Re: Double blank lines to single blank line
by liverpole (Monsignor) on May 10, 2007 at 23:42 UTC
    Hi proactive1,

    My first approach would be a lazy one: just apply a regex to the single line representing the concatenation of all lines from the file ...

    use strict;
    use warnings;
    use File::Basename;
    use IO::File;

    my $iam = basename $0;
    (my $fname = shift) or die "syntax: $iam <file>\n";
    my $fh = IO::File->new($fname) or die "$iam: can't read $fname ($!)\n";
    undef $/;                                       # Read file into a single line
    (my $line = <$fh>) =~ s/\n(\s*\n){2,}/\n\n/g;   # Delete extra blank lines
    printf "%s\n", $line;                           # Show the result

    You could also do it by reading a single line at a time, and only printing a single blank line for each group encountered; something like ...

    use strict;
    use warnings;
    use File::Basename;
    use IO::File;

    my $iam = basename $0;
    (my $fname = shift) or die "syntax: $iam <file>\n";
    my $fh = IO::File->new($fname) or die "$iam: can't read $fname ($!)\n";

    my $b_prev_blank = 0;   # Nonzero each time the previous line was blank
    my $b_this_blank = 0;   # Nonzero each time the current line is blank

    while (my $line = <$fh>) {
        chomp $line;
        $b_this_blank = ($line =~ /^\s*$/);
        (!$b_prev_blank or !$b_this_blank) and printf "%s\n", $line;
        $b_prev_blank = $b_this_blank;
    }

    s''(q.S:$/9=(T1';s;(..)(..);$..=substr+crypt($1,$2),2,3;eg;print$..$/
Re: Double blank lines to single blank line
by monarch (Priest) on May 11, 2007 at 00:27 UTC
    I like this regular expression which is platform-independent (well, as far as Windows, Linux, and Mac systems are concerned):
    my $lineend = qr/\n\r|\r\n|\n|\r/;
    $data =~ s/($lineend){3}/$1$1/sg;

    This defines a regular expression which detects the end of a line. It first checks for a newline - carriage return combination. If this is not found, it checks for a carriage return - newline combination. If neither of these two combinations is found, it checks for a lone newline or a lone carriage return.

    The next expression assumes you have your entire file in $data. It then replaces 3 end-of-line combinations (2 blank lines) with 2 of them (2 end-of-line combinations will produce a single blank line).

    Actually, if the blank lines might contain whitespace, then you could consider the following expression instead:

    $data =~ s/($lineend)\s*($lineend)\s*($lineend)/$1$1/sg;

    (Regexps tested OK on Linux.)
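As a rough check of the first substitution above, the sketch below runs it on Unix-style and DOS-style input. The helper name `collapse_double_blank` and the sample strings are made up for illustration; only `$lineend` and the substitution itself come from the post.

```perl
use strict;
use warnings;

# Line-ending pattern from the post: the two-character pairs are tried first.
my $lineend = qr/\n\r|\r\n|\n|\r/;

# Hypothetical helper: replace three consecutive line endings
# (two blank lines) with two line endings (one blank line).
sub collapse_double_blank {
    my ($data) = @_;
    $data =~ s/($lineend){3}/$1$1/g;
    return $data;
}

# Made-up sample input, one string per line-ending convention.
my $unix = collapse_double_blank("a\n\n\nb\n");
my $dos  = collapse_double_blank("a\r\n\r\n\r\nb\r\n");

print $unix eq "a\n\nb\n"       ? "unix ok\n" : "unix not ok\n";
print $dos  eq "a\r\n\r\nb\r\n" ? "dos ok\n"  : "dos not ok\n";
```

Because the capture group is repeated, `$1` holds the last line ending matched, so the replacement reuses whatever convention the file already has.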

Re: Double blank lines to single blank line
by shigetsu (Hermit) on May 10, 2007 at 23:45 UTC

    Dangerous (be sure to have an independent backup before you read on):

    If you want to edit the data in place, you could use

    perl -i.bak -we 'local $/ = "\n\n"; while (<>) { local $/ = "\n"; chomp; print }' ./data.dat

    Be aware that it applies the changes immediately, but creates a backup of the edited file with the extension .bak. Furthermore, this approach assumes that your _entire_ input file looks like the excerpt you provided; otherwise, chaos may ensue.

      Yet another quick method, via "paragraph mode":
      perl -00 -wpe 1 data.dat >data_singled.dat
      See perlrun for the meanings of the flags.
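Inside a larger script, paragraph mode can also be switched on by setting `$/` to the empty string, which is what `-00` does. The following is only a sketch: `squeeze_paragraphs` is a hypothetical helper, it reads from an in-memory filehandle to keep the example self-contained, and it assumes the blank lines are truly empty (a line holding only spaces does not end a paragraph).

```perl
use strict;
use warnings;

# Hypothetical helper: return $text with every run of empty lines
# between records reduced to a single empty line.
sub squeeze_paragraphs {
    my ($text) = @_;
    local $/ = "";    # paragraph mode, the in-script form of -00
    open my $in, '<', \$text or die "can't open in-memory handle: $!";
    my $out = '';
    while (my $para = <$in>) {
        # Each record arrives with its trailing newlines already
        # collapsed to at most two, so it can be copied unchanged.
        $out .= $para;
    }
    close $in;
    return $out;
}

# Made-up sample in the shape of the original data.
print squeeze_paragraphs("name 1\nphone 1\n\n\nname 2\nphone 2\n\n\nname 3\n");
```

To work on a real file, the in-memory handle would be swapped for one opened on that file.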
Re: Double blank lines to single blank line
by GrandFather (Saint) on May 10, 2007 at 23:15 UTC

    What does your input code look like currently? The appropriate solution depends a lot on how you are doing stuff now and to some extent on what gets done with the data subsequently.


    DWIM is Perl's answer to Gödel
      I'm a bit too deep into data massaging on this set of data to go back to the original code. Good point though. I think I can fix the original code, but I need to know how to work with the existing set that has the double blank lines. There must be a script, or a way to do this quickly in Word, etc.

        Something like:

        use strict;
        use warnings;

        my $run = 1;
        while (<DATA>) {
            next if $run && m/^\n/;
            print;
            $run = m/^\n/;
        }

        __DATA__
        data as per sample

        with i/o adjusted for your requirements should do what you need to clean up multiple line breaks.
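One way the I/O might be adjusted is sketched below, with the same loop moved into a subroutine that takes an input and an output filehandle. The name `squeeze_blank_lines` and the command-line handling are assumptions for illustration, not part of the reply above.

```perl
use strict;
use warnings;

# Hypothetical helper: copy $in to $out, dropping every empty line that
# directly follows another empty line (or that starts the input).
sub squeeze_blank_lines {
    my ($in, $out) = @_;
    my $prev_blank = 1;    # start true so leading blank lines are dropped
    while (my $line = <$in>) {
        my $blank = $line =~ /^\n/ ? 1 : 0;
        next if $prev_blank && $blank;
        print {$out} $line;
        $prev_blank = $blank;
    }
    return;
}

# Assumed usage: perl squeeze.pl input.txt > output.txt
if (@ARGV) {
    open my $in, '<', $ARGV[0] or die "can't read $ARGV[0]: $!";
    squeeze_blank_lines($in, \*STDOUT);
    close $in;
}
```

As in the original loop, only lines that are completely empty count as blank; a line containing spaces or a carriage return is passed through untouched.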

