
Building a hash is a good way to go. At 60MB, the data should all fit in memory. An alternative would be to run the system sort on the file and then delete the lines that occur more than once; that would be appropriate for GB-size files. For this app, a hash should work great.
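To give a feel for the sort-based alternative, here is a rough sketch of the second step only. It assumes the input has already been sorted (by the system sort or otherwise), so identical lines sit next to each other. The sub name print_unique_sorted and the in-memory demo data are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: copy from $in to $out only the lines that occur
# exactly once. Assumes $in is already sorted, so duplicates are adjacent.
# Only the current run of identical lines is remembered.
sub print_unique_sorted {
    my ($in, $out) = @_;
    my ($prev, $count);
    while (my $line = <$in>) {
        if (defined $prev && $line eq $prev) {
            $count++;                 # same run as the previous line
            next;
        }
        print {$out} $prev if defined $prev && $count == 1;
        ($prev, $count) = ($line, 1); # start a new run
    }
    # flush the final run
    print {$out} $prev if defined $prev && $count == 1;
    return;
}

# Demo on an in-memory "file" of sorted lines.
my $sorted = "a\nb\nb\nc\n";
open my $in, '<', \$sorted or die "cannot open in-memory input: $!";
print_unique_sorted($in, \*STDOUT);
close $in;
```

Because nothing but the previous line and its count is kept, memory use stays flat no matter how big the file is. The cost is the sort itself.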

A hash key can be any kind of string. There is actually not even a need to remove the \n!

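my %hash;
# IN is assumed to be a filehandle already opened on the input file
while (<IN>)
{
  $hash{$_}++;    # count each line; the trailing \n stays in the key
}

while ( my($key,$value) = each %hash)
{
  print $key if ($value == 1);    # seen exactly once
}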

The above would print the non-duplicate lines (in no particular order, since a hash is unordered). Note that there is no need to check "if exists" or "if defined": if a key doesn't exist, Perl will create it before the ++ increment!
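A tiny sketch of that last point, with made-up keys. It runs cleanly under strict and warnings, because ++ on a not-yet-existing hash element does not warn:

```perl
use strict;
use warnings;

my %seen;
$seen{"foo\n"}++;    # key did not exist: it is created, then becomes 1
$seen{"foo\n"}++;    # now 2
$seen{"bar\n"}++;    # created, then becomes 1

# print only the keys that were seen exactly once
print for sort grep { $seen{$_} == 1 } keys %seen;
```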

Now let's say that there is some need to parse the line with split or a regex into 3 different things: $file, $line, $rule. There is no need to do a join to make the key. $hash{"$file$line$rule"}++; would be just fine.

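Update: If you need to get the 3 things back out of the key, you can put some token (multi-character, or a single ";", etc.) between the items, like "$file;$line;$rule", so that a simple split recovers them without needing a HoL (Hash of Lists) in the value field. A separator that cannot occur in the data also keeps keys distinct when the parts could otherwise run together ("ab" . "c" and "a" . "bc" are the same string). Think simple and make it more complex only if you need to.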
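Something along these lines, as a sketch only. The records are invented sample data, and it assumes ";" never appears inside $file, $line or $rule:

```perl
use strict;
use warnings;

# Invented sample data standing in for the parsed input lines.
my @records = (
    [ 'a.c', 10, 'R1' ],
    [ 'a.c', 10, 'R1' ],
    [ 'b.c', 7,  'R2' ],
);

my %count;
for my $rec (@records) {
    my ($file, $line, $rule) = @$rec;
    $count{"$file;$line;$rule"}++;    # composite key, ";" as separator
}

for my $key (sort keys %count) {
    # limit of 3 keeps a trailing empty field instead of dropping it
    my ($file, $line, $rule) = split /;/, $key, 3;
    print "$file line $line rule $rule seen $count{$key} time(s)\n";
}
```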

As far as "Perl Limitations" with complex data structures go...there aren't any! Any arbitrarily complex structure that you could build in C can also be built in Perl. Having said that, the basic Perl structures are super powerful, and I think they are enough for the app you have described. As far as execution time goes, I would think that we are talking seconds, not minutes, since you can do everything in one single linear pass through the input file.

In reply to Re: Self-Populating Tree Data Structure by Marshall
in thread Self-Populating Tree Data Structure by Anonymous Monk
