Re: Hash versus chain of elsifs
by dsheroh (Monsignor) on Nov 22, 2021 at 08:43 UTC
|
Between the two approaches you mentioned, I would be extremely surprised if the hash did not turn out to be more efficient, as it doesn't have to step through a list of thousands of elsifs. It also has the advantage of taking constant time regardless of whether the string is found or not, rather than being very quick if the string is at the start of the elsif chain and being slower for a non-matching string, which would have to individually check every item in the list before it can be known that the string isn't there.
More importantly, though, the hash approach is much more readable than an endless list of elsifs, and it also opens up the possibility of storing your list of items to match against in a config file or a database (since a hash's contents are just data, not code) which will make for easier long-term maintenance.
BTW, the syntax you probably want for checking whether a hash key exists is exists($hash{$string}), not defined (which will return false if the key is there but has no value). | [reply] [d/l] [select] |
|
|
| [reply] [d/l] |
|
|
if( exists _get_cache()->{ $candidate } ) {
say qq{IT DOES};
} else {
say qq{No such luck . . .};
}
{ ## Block to scope our cache to just these subs
my $lookup_cache = undef;
sub _reset_cache { $lookup_cache = undef; }
sub _get_cache { $lookup_cache //= _load_cache(); }
sub _load_cache {
## presuming you've declared file var somewhere above . . .
open( my $fh, q{<}, $CACHE_FILE_NAME )
or die qq{Can't open '$CACHE_FILE_NAME': $!\n};
$lookup_cache = {};
while( <$fh> ) {
chomp;
$lookup_cache->{ $_ } = 1;
}
close( $fh );
return $lookup_cache;
}
} ## End of limited scope block.
The cake is a lie.
The cake is a lie.
The cake is a lie.
| [reply] [d/l] |
|
|
| [reply] |
|
|
|
|
|
|
|
|
|
|
|
|
Re: Hash versus chain of elsifs
by eyepopslikeamosquito (Archbishop) on Nov 22, 2021 at 09:32 UTC
|
| [reply] |
|
|
The squirrel PDF of best practices you point to is quite interesting. Some I already practice, some are new, others I need more clarification on. Is there a page with more detail on some of the enumerated items, maybe with simple examples for some of them?
| [reply] |
|
|
You're meant to buy the book Perl Best Practices from O'Reilly
... though you could try clicking on "Start your free trial".
It's considered unethical to search the internet for free pirated PDFs. :)
TheDamian is a brilliant writer and it's a fantastic book IMHO - where it was described by Yanick as "worth it's weight in depleted uranium bars, and twice as entertaining".
| [reply] |
|
|
Re: Hash versus chain of elsifs
by kikuchiyo (Hermit) on Nov 22, 2021 at 09:28 UTC
|
Is the set of strings fixed and known in advance?
If yes, look up minimal perfect hashing. One has to be generated specifically for your set of strings, and the generation part is usually expensive, but once it's done it can tell whether an input string is in the set with just one hash lookup and one string comparison.
Compilers and similar programs use this technique when they want to match tokens from the input stream against a small, known set of keywords.
I don't know if it's worth the effort, though, if you're working in Perl - just use a regular hash and be done with it. You may want to look at the problem again when you're optimizing your program, but you'll likely find that hash access is not your main bottleneck.
| [reply] |
Re: Hash versus chain of elsifs
by LanX (Saint) on Nov 22, 2021 at 10:32 UTC
|
The only thing in Perl which might be able to rival a hash look-up is a or-regex using the Tree° Trie optimization, especially if you have repeated patterns.
$in =~ m/^($STR1|$STR2|$STR3|...)$/
You should provide more details, for a definitive answer.
But a trie will always beat if-else with eq comparisons.
updates
°) that's not the name... how was it called again... ah ... s/Tree/Trie/
> and does the "best" approach vary by scale?
yes. see ${^RE_TRIE_MAXBUF} | [reply] [d/l] [select] |
|
|
On the regex suggestion, once upon a time, I did a project which used Regexp::Assemble to take a long list of rexeges and combine them all into a single, monstrously unreadable, regex that would tell me in one shot whether any matched. I seem to recall hearing at some point that similar functionality had been added to the Perl core, but don't recall details.
In any case, if you're looking for fixed strings rather than regex patterns, substring matches, etc., I would still expect hashes to be faster, but something based on Regexp::Assemble or a similar technique could be close enough to warrant a benchmark. And you might still want to go that route even if it's a little slower in order to gain the extra flexibility in how you match.
| [reply] |
|
|
| [reply] |
Re: Hash versus chain of elsifs
by perlfan (Parson) on Nov 25, 2021 at 14:42 UTC
|
In terms of complexity, a hash key look up is always the most efficient being O(1). The drawback, as has been mentioned, is that this requires a key. If you're dealing with a known set of strings, this is as efficient as possible. So to be clear, if you're going to do a cascade of if ( $str eq 'somestring' )... then use a hash look up.
If dealing with a set of conditions (e.g., a range of values), then you need to use a conditional somewhere. The question becomes, how to compute a static hash key from a value (be it a range, regex, mathematical function, etc).
E.g.,
use strict;
use warnings;
my $value_buckets = {
'first half' => sub { print qq{first half!\n} },
'second half' => sub { print qq{second half!\n} },
'third half' => sub { print qq{third half!\n} },
'not found' => sub { print qq{not found!\n} },
};
my $key = compute_key(42);
$value_buckets->{$key}->();
# just an example, could be anything to derive a hash
# key from $value
sub compute_key {
my $value = shift;
if ( $value > 0 and $value < 33 ) {
return 'first half';
}
elsif ( $value >= 33 and $value <= 66 ) {
return 'second half';
}
elsif ( $value >= 67 and $value <= 100 ) {
return 'third half';
}
# last resort
return 'not found';
}
Output:
# perl test.pl
second half!
A related question is, can you write a sophisticated perl program without conditionals that also minimizes the computational complexity? The answer is, "YES"; and if that's your interest I think your exploration may be added by looking at the functional side of Perl. And there's no better place to look than Higher Order Perl for that.
Final note, even though an series of ifels.. statements will terminate when the condition is found; there is nothing to guarantee how far down the litany of conditionals you'll fall. So in terms of computational complexity, it's always going to add a constant factor equivalent to the worst case scenario of having to check all conditions. | [reply] [d/l] [select] |