
I was searching around for a way to remove duplicate lines from a text file that weren't necessarily adjacent to one another, while maintaining the line order. I found this solution by BrowserUk:

#! perl -sw
use strict;
my %lines;
#open DATA, $ARGV[0] or die "Couldn't open $ARGV[0]: $!\n";
while (<DATA>) {
    print if not $lines{$_}++;
}

__DATA__
this is a line
this is another line
yet another
and yet another still
this is a line
more
and more
and even more
this is a line
and this
and that
but not the other cos its a family website:)

It worked for me, but I don't understand why; I still don't fully see the magic of hashes. Can someone explain how this works, specifically the if not $lines{$_}++ construct? I really don't see what the increment is doing there. Also, could if not be replaced with unless?

Thanks!
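
For anyone puzzling over the same idiom, here is a minimal sketch that spells out each step of if not $lines{$_}++. The dedupe helper and the $count_before variable are hypothetical names introduced only for illustration; they are not part of BrowserUk's script. It relies on the fact that Perl's post-increment returns the old value of the hash entry before adding one.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): returns the input
# lines with later duplicates dropped, keeping first-seen order.
sub dedupe {
    my @input = @_;
    my %seen;    # maps each line to the number of times it has been seen
    my @kept;
    for my $line (@input) {
        # Post-increment returns the OLD value, then adds one.
        # First sighting: the old value is undef (false), so the line is kept.
        # Later sightings: the old value is 1, 2, ... (true), so it is skipped.
        my $count_before = $seen{$line}++;
        push @kept, $line unless $count_before;    # same test as: if not $count_before
    }
    return @kept;
}

print "$_\n" for dedupe('this is a line', 'more', 'this is a line', 'and more');
```

Since unless EXPR means the same as if not EXPR, the loop body in the original script could equally be written as print unless $lines{$_}++; and should behave identically.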

In reply to Why does this hash remove duplicate lines by kangaroobin
