in reply to output unique lines only

This is easy to do with a hash. Open the file, read it one line at a time, and use the split function to put the first element of each line (i.e. the filename) into a variable such as $fn. If your hash is called %uniquefiles, you then set the value for $fn to some arbitrary value, like

$uniquefiles{$fn} = 1;

If your loop comes across the same filename again, it will simply set the same value for the same key, in effect eliminating the dupes. When you're all done, %uniquefiles will contain only the unique filenames, which you can print like so:

foreach my $k (keys %uniquefiles) { print OUT "$k\n"; }
If you're just learning Perl, make sure you learn about hashes. They're a very powerful feature.
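
Putting those pieces together, here's a rough sketch of the whole thing (the file names and the tab-delimited layout are just guesses at your format, so adjust to taste):

    use strict;
    use warnings;

    my %uniquefiles;
    open(my $in,  '<', 'acctlist.txt') or die "Cannot open acctlist.txt: $!";
    open(my $out, '>', 'output.txt')   or die "Cannot open output.txt: $!";

    while (my $line = <$in>) {
        chomp $line;
        my ($fn) = split /\t/, $line;   # first field is the filename
        $uniquefiles{$fn} = 1;          # dupes just overwrite the same key
    }

    print $out "$_\n" for keys %uniquefiles;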

Steve

Re^2: output unique lines only
by sbp (Initiate) on Dec 07, 2005 at 03:21 UTC
    Thanks, everyone, for the tips and suggestions. I've decided to approach this using a hash.
    I came up with the following script, but it doesn't seem to be working correctly.
    #!/usr/bin/perl -w
    $filelist = "/home/exp/acctlist.txt";
    open(FILEDUPS, $filelist) || die ("Cannot open $filelist");
    open($output, '>', '/home/exp/output.txt') || die ("Cannot open file");
    while ($line = <FILEDUPS>) {
        chomp $line;
        ($filename, undef, undef, undef, undef) = split /\t/, $line;
    }
    $uniquefiles{$filename} = 1;
    foreach $k (keys %uniquefiles) {
        print $output "$k\n";
    }
    It currently only outputs one line. For example, if my file contains
    filename1
    filename2
    filename1
    filename4

    Then it outputs the first line only:
    filename1
    Whereas it should output:
    filename1
    filename2
    filename4

    I've spent a long time trying to debug this, but I'm not sure where I'm going wrong.
    Thanks.
      hi,
      I guess you should put the  $uniquefiles{$filename} = 1; inside the while loop.
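      Something like this (untested; it's just your loop with that line moved inside):

          while ($line = <FILEDUPS>) {
              chomp $line;
              ($filename, undef, undef, undef, undef) = split /\t/, $line;
              $uniquefiles{$filename} = 1;   # record each name while still inside the loop
          }

      That way every filename gets recorded as it's read, instead of just the last one.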
      -kulls