crochunter has asked for the wisdom of the Perl Monks concerning the following question:

Hi, I have duplicate lines like this in my file: I just want one line, i.e,
Andrew lives in NewYork United States
Andrew lives in NewYork
If I use the Unix uniq command
$ uniq filename
it will not remove this. How can I do it? Thanks

Re: Duplicate fields
by Corion (Patriarch) on Jul 28, 2009 at 07:34 UTC

    How do you determine that the two lines are duplicates?

    Most likely you will want to look at substr.

Re: Duplicate fields
by si_lence (Deacon) on Jul 28, 2009 at 07:36 UTC
    Your two example lines are not really duplicates.
    You need to define when you consider two lines equal. When the first x words are equal? Or the first y characters?

    cheers, si_lence
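
    Both definitions of "equal" suggested above can be sketched as a key function plus a hash of keys already seen. This is only an illustration: the helper names uniq_by_first_words and uniq_by_first_chars are made up for this example, and it assumes the first line seen for each key is the one to keep, which the question does not actually state.

```perl
use strict;
use warnings;

# Hypothetical helper: keep the first line seen for each key, where the key
# is the first $n whitespace-separated words of the line.
sub uniq_by_first_words {
    my ($n, @lines) = @_;
    my (%seen, @kept);
    for my $line (@lines) {
        my @words = split ' ', $line;
        # Lines with fewer than $n words use all of their words as the key.
        my $last = $#words < $n - 1 ? $#words : $n - 1;
        my $key  = join ' ', @words[0 .. $last];
        push @kept, $line unless $seen{$key}++;
    }
    return @kept;
}

# Hypothetical helper: same idea, keyed on the first $len characters (substr).
sub uniq_by_first_chars {
    my ($len, @lines) = @_;
    my %seen;
    return grep { !$seen{ substr($_, 0, $len) }++ } @lines;
}

my @lines = (
    'Andrew lives in NewYork United States',
    'Andrew lives in NewYork',
);
print "$_\n" for uniq_by_first_words(4, @lines);
```

    With the two lines from the question and a key of four words, only the first line survives. If the shorter line is the one wanted, the key itself could be printed instead of the stored line.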

Re: Duplicate fields
by Utilitarian (Vicar) on Jul 28, 2009 at 09:26 UTC
    ...
    my $past = "";
    for my $record (sort @records) {
        chomp $record;
        # a record that does not start with the previous one begins a new group
        if ($record !~ /^$past/) {
            push @uniq, $past;
        }
        $past = $record;
    }
    push @uniq, $past;
    ...
    Ugly, messes with the values of your data, but may be what you are looking for.

      Grin. Put in a single A on a line, and everything starting with that is gone. A good way to make the asker think again, but probably not the right solution.

        Not so much, it drops repeats until a change, though it sorts the contents of the file and strips the newlines, so it's almost certainly the wrong answer, definitely not scalable, and the $past isn't quoted correctly, but other than that it works perfectly ;)
        ~/$ perl tmp.pl data
        A different line
        A line similar to the previous one
        Apples are tasty
        this is a line which contains the previous line in it's entirety
        this is not the same
        ~/$ cat data
        A
        A line similar to the previous one
        A different line
        Apples are tasty
        this is a line
        this is a line which contains the previous line
        this is not the same
        this is a line which contains the previous line in it's entirety
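
        The objections raised above (sorted output, unquoted $past) could be addressed along these lines. This is a rough sketch, not a replacement anyone in the thread proposed: the name drop_prefix_lines is invented here, and it assumes the goal is to drop exact duplicates and any line that is the leading part of another line. It uses index() rather than a regex, so nothing in the data needs quoting, and it sorts only a copy so the survivors come back in file order.

```perl
use strict;
use warnings;

# Hypothetical helper: drop exact duplicates and every line that is the
# leading part of some other line; survivors keep their original order.
sub drop_prefix_lines {
    my @lines = @_;
    my %seen;
    my @unique = grep { !$seen{$_}++ } @lines;   # exact duplicates removed
    my @sorted = sort @unique;                   # sorted copy, used only for lookup

    # In sorted order, a line that is a prefix of any other line is also a
    # prefix of its immediate successor, so checking neighbours is enough.
    my %is_prefix;
    for my $i (0 .. $#sorted - 1) {
        $is_prefix{ $sorted[$i] } = 1
            if index($sorted[$i + 1], $sorted[$i]) == 0;   # literal comparison
    }
    return grep { !$is_prefix{$_} } @unique;
}

my @records = (
    'A',
    'A line similar to the previous one',
    'A different line',
    'Apples are tasty',
    'this is a line',
    'this is a line which contains the previous line',
    'this is not the same',
    "this is a line which contains the previous line in it's entirety",
);
print "$_\n" for drop_prefix_lines(@records);
```

        Note that an empty line counts as a prefix of everything, so it is dropped whenever any other line exists. The sort still makes this O(n log n) with the whole file in memory, so the scalability concern remains.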