in reply to output unique lines only

This is easy to do with a hash. Open the file, read it one line at a time, and use the split function to put the first element of each line (i.e. the filename) into a variable such as $fn. If your hash is called %uniquefiles, you then set the value for $fn to some arbitrary value, like

$uniquefiles{$fn} = 1;

If your loop comes across the same filename again, it will simply set the same value for the same key, in effect eliminating the dupes. When you're all done, %uniquefiles will contain only the unique filenames, which you can print like so:

foreach my $k (keys %uniquefiles) { print OUT "$k\n"; }
If you're just learning Perl, make sure you learn about hashes. They're a very powerful feature.
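
Putting those pieces together, here's a rough sketch of the whole thing (the file names and the tab-delimited layout are just guesses at your format, so adjust to taste):

    use strict;
    use warnings;

    my %uniquefiles;
    open(my $in,  '<', 'acctlist.txt') or die "Cannot open acctlist.txt: $!";
    open(my $out, '>', 'output.txt')   or die "Cannot open output.txt: $!";

    while (my $line = <$in>) {
        chomp $line;
        my ($fn) = split /\t/, $line;   # first field is the filename
        $uniquefiles{$fn} = 1;          # dupes just overwrite the same key
    }

    print $out "$_\n" for keys %uniquefiles;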

Steve

Re^2: output unique lines only
by sbp (Initiate) on Dec 07, 2005 at 03:21 UTC
    Thanks, everyone, for the tips and suggestions. I've decided to approach this using a hash.
    I came up with the following script, but it doesn't seem to be working correctly.
    #!/usr/bin/perl -w
    $filelist = "/home/exp/acctlist.txt";
    open(FILEDUPS, $filelist) || die ("Cannot open $filelist");
    open($output, '>', '/home/exp/output.txt') || die ("Cannot open file");
    while ($line = <FILEDUPS>) {
        chomp $line;
        ($filename, undef, undef, undef, undef) = split /\t/, $line;
    }
    $uniquefiles{$filename} = 1;
    foreach $k (keys %uniquefiles) {
        print $output "$k\n";
    }
    It currently only outputs one line. For example, if my file contains
    filename1
    filename2
    filename1
    filename4

    Then it outputs the first line only:
    filename1
    Whereas it should output:
    filename1
    filename2
    filename4

    I've spent a long time trying to debug this, but I'm not sure where I'm going wrong.
    Thanks.
      hi,
      I guess you should put the  $uniquefiles{$filename} = 1; inside the while loop.
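      Something like this (untested; it's just your loop with that line moved inside):

          while ($line = <FILEDUPS>) {
              chomp $line;
              ($filename, undef, undef, undef, undef) = split /\t/, $line;
              $uniquefiles{$filename} = 1;   # record each name while still inside the loop
          }

      That way every filename gets recorded as it's read, instead of just the last one.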
      -kulls