Remove duplicate lines maintaining last-in order

Remove duplicate lines from a file while preserving order and leaving the last instance of the duplicate in place.



    perl -ne 'push @a, $_; $h{$_}++; END{print grep {not --$h{$_}} @a}' file

For the following file:

    cabbage
    apple
    banana
    grape
    pear
    banana
    carrot
    apple
    apple
    banana
    grape
    pear
    banana
    apple

The output would be:

    cabbage
    carrot
    grape
    pear
    banana
    apple

For example, the last instance of apple is printed rather than the first. For comparison, try the following, which keeps the first instance of each line instead:

    perl -ne 'print unless $h{$_}++' file

If preserving order isn't important, you could use standard system commands instead (where available):

    sort -u file
    sort file | uniq

Note: this snippet holds every input line in the array @a and every distinct line again as a key of %h, so it can use roughly twice the file's size in memory and is inefficient for large files.

Re: Remove duplicate lines maintaining last-in order (alt) by tye (Sage) on Sep 23, 2003 at 16:06 UTC
Cool. Here is a somewhat more memory-efficient alternative:

    perl -ne '$a[$h{$_}]= ""; push @a, $_; $h{$_}= $#a; END{print for @a}' file

Or, if you want to cut even that memory footprint (or does it?) roughly in half (at the cost of more CPU):

    perl -ne '$h{$_}= $.; END{print for sort {$h{$a} <=> $h{$b}} keys %h}' test.txt

- tye