Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hi, The following script removes all duplicate lines but leaves blank lines where the duplicates were. Any way of improving the script to eliminate these blank lines without opening the file again?
open (DUPLICATE, "datafiles/linked_file.txt");
open (DUPremoved, ">" . "datafiles/linked_file_dup.txt");
while (<DUPLICATE>) {
    chomp $_;
    #print $_;
    print DUPremoved if not $lines{$_}++;
    print DUPremoved "\n";
}
close (DUPLICATE);
thanks

Re: duplicate lines
by ivancho (Hermit) on Jun 02, 2005 at 09:28 UTC
    Don't chomp, and remove the print DUPremoved "\n"; line.
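    In other words, the loop body collapses to a single conditional print. A minimal sketch of that loop, assuming the same DUPLICATE/DUPremoved filehandles and %lines hash as the original post:

    while (<DUPLICATE>) {
        # with no chomp, $_ keeps its trailing newline, so no separate "\n" print is needed;
        # print with just a filehandle prints $_
        print DUPremoved if not $lines{$_}++;
    }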

Re: duplicate lines
by Tomtom (Scribe) on Jun 02, 2005 at 11:39 UTC
    open (DUPLICATE, "datafiles/linked_file.txt");
    open (DUPremoved, ">" . "datafiles/linked_file_dup.txt");
    print DUPremoved grep { !$saw{$_}++ } <DUPLICATE>;
    close DUPLICATE;
    close DUPremoved;
Re: duplicate lines
by japhy (Canon) on Jun 02, 2005 at 13:20 UTC
    Your method is fine; you're just doing something extra that you shouldn't be doing. You're ONLY printing the line to the file if it doesn't exist in your %lines hash yet. But then you're printing a newline regardless of that! Either do:
    while (<DUPLICATE>) { print DUPremoved if not $lines{$_}++; }
    or, if you're really adamant about chomp()ing the line first, do:
    while (<DUPLICATE>) { chomp; print DUPremoved "$_\n" if not $lines{$_}++; }

    Jeff japhy Pinyan, P.L., P.M., P.O.D, X.S.: Perl, regex, and perl hacker
    How can we ever be the sold short or the cheated, we who for every service have long ago been overpaid? ~~ Meister Eckhart
Re: duplicate lines
by terce (Friar) on Jun 02, 2005 at 09:31 UTC
    and for the sake of clarity and completeness, don't forget to close (DUPremoved);
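    A minimal sketch of the whole open/filter/close cycle with that close added, plus or-die checks on open() and close() (the error messages are only illustrative, not from the thread):

    # same bareword-filehandle style as the original post, with error checking added
    open (DUPLICATE, "datafiles/linked_file.txt")
        or die "can't read datafiles/linked_file.txt: $!";
    open (DUPremoved, ">" . "datafiles/linked_file_dup.txt")
        or die "can't write datafiles/linked_file_dup.txt: $!";
    while (<DUPLICATE>) {
        print DUPremoved if not $lines{$_}++;
    }
    close (DUPLICATE);
    close (DUPremoved) or die "error closing output: $!";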
Re: duplicate lines
by mikeraz (Friar) on Jun 02, 2005 at 11:17 UTC

    Grrrrr the "solution" listed below is wrong. It only removes adjacent duplicate lines, which is not what the requestor was asking for. See the responses from the other monks for a useful solution.

    If you're on a Unix system you can just `uniq` the file (though note that uniq, too, only collapses adjacent duplicates). But if you must do it in Perl . . .

    my $lastline;
    while (<>) {
        # compares each line only to the one immediately before it
        print if $lastline ne $_;
        $lastline = $_;
    }
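    For comparison, a hash-based sketch along the lines the other monks describe, which catches duplicates anywhere in the input rather than only adjacent repeats (%seen is just an arbitrary name):

    my %seen;
    while (<>) {
        # print each distinct line only the first time it is seen
        print unless $seen{$_}++;
    }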
    Be Appropriate && Follow Your Curiosity
      I'm not sure your solution works if the duplicate entries don't immediately follow each other:

      line 1: "toto"
      line 2: "titi"
      line 3: "toto"

        You're right, I misread the problem. I was thinking of consecutively repeated lines, not duplicates scattered through the file.

        Be Appropriate && Follow Your Curiosity