Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hi, The following script removes all duplicate lines but leaves blank lines where the duplicates were. Any way of improving the script to eliminate these blank lines without opening the file again?
open (DUPLICATE, "datafiles/linked_file.txt");
open (DUPremoved, ">" . "datafiles/linked_file_dup.txt");
while (<DUPLICATE>) {
    chomp $_;
    #print $_;
    print DUPremoved if not $lines{$_}++;
    print DUPremoved "\n";
}
close (DUPLICATE);
thanks

Re: duplicate lines
by ivancho (Hermit) on Jun 02, 2005 at 09:28 UTC
    Don't chomp, and remove the print DUPremoved "\n"; line.
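    In other words, the loop body collapses to a single conditional print. A minimal sketch of that loop, assuming the same DUPLICATE/DUPremoved filehandles and %lines hash as the original post:

    while (<DUPLICATE>) {
        # with no chomp, $_ keeps its trailing newline, so no separate "\n" print is needed;
        # print with just a filehandle prints $_
        print DUPremoved if not $lines{$_}++;
    }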

Re: duplicate lines
by Tomtom (Scribe) on Jun 02, 2005 at 11:39 UTC
    open (DUPLICATE, "datafiles/linked_file.txt");
    open (DUPremoved, ">" . "datafiles/linked_file_dup.txt");
    print DUPremoved grep { !$saw{$_}++ } <DUPLICATE>;
    close DUPLICATE;
    close DUPremoved;
Re: duplicate lines
by japhy (Canon) on Jun 02, 2005 at 13:20 UTC
    Your method is fine; you're just doing something extra that you shouldn't be doing. You're ONLY printing the line to the file if it doesn't exist in your %lines hash yet. But then you're printing a newline regardless of that! Either do:
    while (<DUPLICATE>) { print DUPremoved if not $lines{$_}++; }
    or, if you're really adamant about chomp()ing the line first, do:
    while (<DUPLICATE>) { chomp; print DUPremoved "$_\n" if not $lines{$_}++; }

    Jeff japhy Pinyan, P.L., P.M., P.O.D, X.S.: Perl, regex, and perl hacker
    How can we ever be the sold short or the cheated, we who for every service have long ago been overpaid? ~~ Meister Eckhart
Re: duplicate lines
by terce (Friar) on Jun 02, 2005 at 09:31 UTC
    and for the sake of clarity and completeness, don't forget to close (DUPremoved);
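    A minimal sketch of the whole open/filter/close cycle with that close added, plus or-die checks on open() and close() (the error messages are only illustrative, not from the thread):

    # same bareword-filehandle style as the original post, with error checking added
    open (DUPLICATE, "datafiles/linked_file.txt")
        or die "can't read datafiles/linked_file.txt: $!";
    open (DUPremoved, ">" . "datafiles/linked_file_dup.txt")
        or die "can't write datafiles/linked_file_dup.txt: $!";
    while (<DUPLICATE>) {
        print DUPremoved if not $lines{$_}++;
    }
    close (DUPLICATE);
    close (DUPremoved) or die "error closing output: $!";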
Re: duplicate lines
by mikeraz (Friar) on Jun 02, 2005 at 11:17 UTC

    Grrrrr the "solution" listed below is wrong. It only removes adjacent duplicate lines, which is not what the requestor was asking for. See the responses from the other monks for a useful solution.

    If you're on a Unix system you can just `uniq` the file (though note that uniq, too, only collapses adjacent duplicates). But if you must do it in Perl . . .

    my $lastline;
    while (<>) {
        # compares each line only to the one immediately before it
        print if $lastline ne $_;
        $lastline = $_;
    }
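    For comparison, a hash-based sketch along the lines the other monks describe, which catches duplicates anywhere in the input rather than only adjacent repeats (%seen is just an arbitrary name):

    my %seen;
    while (<>) {
        # print each distinct line only the first time it is seen
        print unless $seen{$_}++;
    }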
    Be Appropriate && Follow Your Curiosity
      I'm not sure your solution works if the duplicate entries don't immediately follow each other:

      line 1: "toto"
      line 2: "titi"
      line 3: "toto"

        You're right, I misread the problem. I was thinking of consecutively repeated lines, not duplicates scattered through the file.

        Be Appropriate && Follow Your Curiosity