
Remove duplicate lines from a file while preserving order and leaving the last instance of the duplicate in place.



    perl -ne 'push @a, $_; $h{$_}++; END{print grep {not --$h{$_}} @a}' file

For the following file:

    cabbage
    apple
    banana
    grape
    pear
    banana
    carrot
    apple
    apple
    banana
    grape
    pear
    banana
    apple

The output would be:

    cabbage
    carrot
    grape
    pear
    banana
    apple

So, for example, the last instance of apple is printed instead of the first. For comparison, try the following, which leaves the first instance in place:

    perl -ne 'print unless $h{$_}++' file

If preserving order isn't important, you could use other system commands (if supported):

    sort -u file
    sort file | uniq

Note: this snippet holds every input line in @a and every distinct line as a key in %h, so it can store up to 2 copies of each line in memory, which makes it inefficient for large files.

In reply to Remove duplicate lines maintaining last-in order by jmcnamara
