Duplicate fields

crochunter has asked for the wisdom of the Perl Monks concerning the following question:

Hi, I have duplicate lines like this in my file, and I want to keep just one of them:

Andrew lives in NewYork United States
Andrew lives in NewYork

If I use the Unix uniq command

$ uniq filename

it does not remove either line. How can I do this? Thanks


Replies are listed 'Best First'.
Re: Duplicate fields by Corion (Patriarch) on Jul 28, 2009 at 07:34 UTC
How do you determine that the two lines are duplicates? Most likely you will want to look at substr.
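One possible reading of the substr hint, as a minimal sketch: treat a line as a "duplicate" when it is a leading substring of another line. The helper name `is_prefix` is made up for illustration, and whether a leading substring is the right definition of "duplicate" is exactly the open question here.

```perl
use strict;
use warnings;

# Hypothetical helper: true if $short is a leading substring of $long.
sub is_prefix {
    my ($short, $long) = @_;
    return 0 if length($short) > length($long);
    # Compare the first length($short) characters of $long literally.
    return substr($long, 0, length $short) eq $short ? 1 : 0;
}

print is_prefix('Andrew lives in NewYork',
                'Andrew lives in NewYork United States')
    ? "duplicate\n"
    : "distinct\n";
```

Note that this compares characters, not words, so a line containing only `A` would count as a prefix of any line starting with a capital A.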
Re: Duplicate fields by si_lence (Deacon) on Jul 28, 2009 at 07:36 UTC
Your two example lines are not really duplicates. You need to define when you consider two lines equal. When the first x words are equal? Or the first y characters? cheers, si_lence
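If "equal" were taken to mean "the first x words are equal", a rough sketch could key a hash on those words and keep the first line seen for each key. The helper name `uniq_by_first_words` and the choice of 4 words are assumptions for illustration only; the asker has not said which definition applies.

```perl
use strict;
use warnings;

# Hypothetical helper: keep the first line seen for each distinct
# leading run of $n whitespace-separated words.
sub uniq_by_first_words {
    my ($n, @lines) = @_;
    my (%seen, @kept);
    for my $line (@lines) {
        my @words = split ' ', $line;
        # Lines with fewer than $n words use all of their words as the key.
        my $last = $#words < $n - 1 ? $#words : $n - 1;
        my $key  = join ' ', @words[0 .. $last];
        push @kept, $line unless $seen{$key}++;
    }
    return @kept;
}

my @lines = (
    'Andrew lives in NewYork United States',
    'Andrew lives in NewYork',
);
print "$_\n" for uniq_by_first_words(4, @lines);
```

With the two example lines this prints only the first one, because both share the key "Andrew lives in NewYork". Keying on the first y characters would work the same way with `substr($line, 0, $y)` as the key.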
Re: Duplicate fields by Utilitarian (Vicar) on Jul 28, 2009 at 09:26 UTC
`... my $past=""; for my $record (sort @records){ chomp $record; if ($record !~ /^$past/{ push @uniq, $past; } $past=$record; } push @uniq, $past; ...` [download] Ugly, messes with the values of your data, but may be what you are looking for.	[reply] [d/l]
Re^2: Duplicate fields by mzedeler (Pilgrim) on Jul 28, 2009 at 10:47 UTC
Grin. Put in a single `A` on a line, and everything starting with that is gone. A good way to make the asker think again, but probably not the right solution.
Re^3: Duplicate fields by Utilitarian (Vicar) on Jul 28, 2009 at 11:03 UTC
Not so much: it drops repeats until a change. It does sort the contents of the file and strip the newlines, though, so it's almost certainly the wrong answer, definitely not scalable, and the $past isn't quoted correctly, but other than that it works perfectly ;)

    ~/$ perl tmp.pl data
    A different line
    A line similar to the previous one
    Apples are tasty
    this is a line which contains the previous line in it's entirety
    this is not the same
    ~/$ cat data
    A
    A line similar to the previous one
    A different line
    Apples are tasty
    this is a line
    this is a line which contains the previous line
    this is not the same
    this is a line which contains the previous line in it's entirety
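For what it's worth, a sketch that sidesteps some of those caveats: it compares with index() instead of a regex, so nothing in the data needs quoting, and it keeps the surviving lines in their input order rather than sorting. The helper name `drop_prefix_lines` is hypothetical, and so is the rule it applies (a line is dropped when another line extends it at a space, so the longer line wins, as in the snippet above). The thread never settles which line the asker wants to keep. It compares every line with every other line, so it is only meant for small files.

```perl
use strict;
use warnings;

# Hypothetical helper: drop exact repeats and any line that another
# line extends at a word boundary; survivors keep their input order.
sub drop_prefix_lines {
    my @lines = @_;
    my (%seen, @kept);
    LINE: for my $line (@lines) {
        next if $seen{$line}++;    # exact duplicate of an earlier line
        for my $other (@lines) {
            # index() matches literally, so regex metacharacters in
            # the data are harmless. A result of 0 means $other starts
            # with $line followed by a space.
            next LINE if index($other, "$line ") == 0;
        }
        push @kept, $line;
    }
    return @kept;
}

my @sample = (
    'Andrew lives in NewYork United States',
    'Andrew lives in NewYork',
);
print "$_\n" for drop_prefix_lines(@sample);
```

On the two lines from the question this prints only "Andrew lives in NewYork United States". Callers reading from a file would still need to chomp the lines first.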