comment on

You're using a data structure which takes a constant time to access a single elements, and you're searching through all of its elements then searching a second data structure that takes a time linear to the number of elements in it in order to see if the two have an element in common. As GrandFather suggests, this is backwards.

You can, as GrandFather noted, turn this inside out:

foreach ( @array ) {
    if ( exists $hash{$_} ) {
        # do something
    } else {
        # do something else
    }
}
[download]

You could also, if you don't have to build both structures (which we cannot tell without more context), do something more like this:

my %hash = ( ... );
while ( $_ = get_item() ) {
    if ( exists $hash{ $_ } ) { ### update: fixed syntax error typo
        # do something
    } else {
        # do something else
    }
}
[download]

... which would keep you from ever building the huge array (where get_item() returns one value that would have been assigned to an element of @array in the other version). As I said, though, we can't tell if you need the array without more information.

The former solution assumes you needed to have the array anyway, while the second hopes to avoid it if possible. If there's nothing you can do about having such a large array, you'll at least want to do the lookups using the hash.

In your original solution, you end up with the running time being the product of a number linear to the number of keys in the hash times a number linear to the number of elements in the array. Think f(n*o). You could scan linearly through the array, and do a constant time lookup on the hash each time f(n), or more specifically f(n+x*n), but that's trivially f(n), which is what GrandFather's suggestion gets you. The necessary time is cut roughly by a factor of the number of keys in the hash, which is a big amount when you're talking about 1000 keys. 1/1000 of the time is always a good speedup.

The solution that would let you not use the array (if possible) would likely be a bit faster still because it wouldn't require the memory for the array. An array with 800k elements is likely to get into a page file somewhere if the elements are of any size at all. So I guess the question here is, where is the array from, and how is it built up?

In reply to Re: performance issues with grepping large sorted arrays by mr_mischief
in thread performance issues with grepping large sorted arrays by HarshaHegde

Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!

Titles consisting of a single word are discouraged, and in most cases are disallowed outright.

Read Where should I post X? if you're not absolutely sure you're posting in the right place.

Please read these before you post! —

Posts may use any of the Perl Monks Approved HTML tags:

a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr

You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)

	For:		Use:
	&		`&`
	<		`<`
	>		`>`
	[		`[`
	]		`]`

Link using PerlMonks shortcuts! What shortcuts can I use for linking?

See Writeup Formatting Tips and other pages linked from there for more info.