Hash versus chain of elsifs

mldvx4 has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: Hash versus chain of elsifs by dsheroh (Monsignor) on Nov 22, 2021 at 08:43 UTC
Between the two approaches you mentioned, I would be extremely surprised if the hash did not turn out to be more efficient, as it doesn't have to step through a list of thousands of `elsif`s. It also has the advantage of taking constant time regardless of whether the string is found or not, rather than being very quick if the string is at the start of the `elsif` chain and being slower for a non-matching string, which would have to individually check every item in the list before it can be known that the string isn't there. More importantly, though, the hash approach is much more readable than an endless list of `elsif`s, and it also opens up the possibility of storing your list of items to match against in a config file or a database (since a hash's contents are just data, not code) which will make for easier long-term maintenance. BTW, the syntax you probably want for checking whether a hash key exists is `exists($hash{$string})`, not `defined` (which will return false if the key is there but has no value).	[reply] [d/l] [select]
Re^2: Hash versus chain of elsifs by mldvx4 (Hermit) on Nov 22, 2021 at 09:34 UTC
Thanks. I'll go with the hash method then and use `exists` to check it. Does the hash get built each time the function is called or does it persist in some way? I'm guessing the former, from some tests I've tried.	[reply] [d/l]
Re^3: Hash versus chain of elsifs by Fletch (Bishop) on Nov 22, 2021 at 10:15 UTC
Depends on how and where you're building it; as was mentioned give us a sample and someone can comment on specifics. That being said though here's a way to (for example) initialize your hash from a file with one item per list lazily and only one time (unless you explicitly clear it): if( exists _get_cache()->{ $candidate } ) { say qq{IT DOES}; } else { say qq{No such luck . . .}; } { ## Block to scope our cache to just these subs my $lookup_cache = undef; sub _reset_cache { $lookup_cache = undef; } sub _get_cache { $lookup_cache //= _load_cache(); } sub _load_cache { ## presuming you've declared file var somewhere above . . . open( my $fh, q{<}, $CACHE_FILE_NAME ) or die qq{Can't open '$CACHE_FILE_NAME': $!\n}; $lookup_cache = {}; while( <$fh> ) { chomp; $lookup_cache->{ $_ } = 1; } close( $fh ); return $lookup_cache; } } ## End of limited scope block. [download] The cake is a lie. The cake is a lie. The cake is a lie.	[reply] [d/l]
Re^3: Hash versus chain of elsifs by eyepopslikeamosquito (Archbishop) on Nov 22, 2021 at 09:50 UTC
You should make it persist. That will have a big impact on performance. If you need further help, show us some sample code and we'll be able to show various ways to make it persist.	[reply]
Re^4: Hash versus chain of elsifs by mldvx4 (Hermit) on Nov 22, 2021 at 10:24 UTC
Re^5: Hash versus chain of elsifs by eyepopslikeamosquito (Archbishop) on Nov 22, 2021 at 11:22 UTC
Some notes below your chosen depth have not been shown here
Re^5: Hash versus chain of elsifs by kcott (Archbishop) on Nov 22, 2021 at 11:45 UTC
Re^5: Hash versus chain of elsifs by eyepopslikeamosquito (Archbishop) on Nov 22, 2021 at 10:39 UTC
Some notes below your chosen depth have not been shown here
Re^5: Hash versus chain of elsifs by choroba (Cardinal) on Nov 22, 2021 at 12:33 UTC
Re: Hash versus chain of elsifs by eyepopslikeamosquito (Archbishop) on Nov 22, 2021 at 09:32 UTC
Just use a hash is my routine advice - both for clarity and performance. See also Perl Best Practices (Chapter 6) (O'Reilly) (or Perl Best Practices (squirrel pdf)): 6.16 Value Switches - Use table look-up in preference to cascaded equality tests (item 78) If performance is critical, you could always benchmark different approaches - see Fastest way to lookup a point in a set for a simple example of this. Oh, and don't use Switch! :)	[reply]
Re^2: Hash versus chain of elsifs by mldvx4 (Hermit) on Nov 22, 2021 at 11:37 UTC
The squirrel PDF of best practices you point to is quite interesting. Some I already practice, some are new, others I need more clarification on. Is there a page with more detail on some of the enumerated items, maybe with simple examples for some of them?	[reply]
Re^3: Hash versus chain of elsifs by eyepopslikeamosquito (Archbishop) on Nov 22, 2021 at 12:03 UTC
You're meant to buy the book Perl Best Practices from O'Reilly ... though you could try clicking on "Start your free trial". It's considered unethical to search the internet for free pirated PDFs. :) TheDamian is a brilliant writer and it's a fantastic book IMHO - where it was described by Yanick as "worth it's weight in depleted uranium bars, and twice as entertaining".	[reply]
Re^4: Hash versus chain of elsifs by mldvx4 (Hermit) on Nov 22, 2021 at 13:43 UTC
Re: Hash versus chain of elsifs by kikuchiyo (Hermit) on Nov 22, 2021 at 09:28 UTC
Is the set of strings fixed and known in advance? If yes, look up minimal perfect hashing. One has to be generated specifically for your set of strings, and the generation part is usually expensive, but once it's done it can tell whether an input string is in the set with just one hash lookup and one string comparison. Compilers and similar programs use this technique when they want to match tokens from the input stream against a small, known set of keywords. I don't know if it's worth the effort, though, if you're working in Perl - just use a regular hash and be done with it. You may want to look at the problem again when you're optimizing your program, but you'll likely find that hash access is not your main bottleneck.	[reply]
Re: Hash versus chain of elsifs by LanX (Saint) on Nov 22, 2021 at 10:32 UTC
The only thing in Perl which might be able to rival a hash look-up is a or-regex using the ~~Tree�~~ Trie optimization, especially if you have repeated patterns. `$in =~ m/^($STR1\|$STR2\|$STR3\|...)$/` You should provide more details, for a definitive answer. But a trie will always beat `if-else` with `eq` comparisons. Cheers Rolf _{(addicted to the Perl Programming Language :) Wikisyntax for the Monastery} updates �) that's not the name... how was it called again... ah ... s/Tree/Trie/ > and does the "best" approach vary by scale? yes. see `${^RE_TRIE_MAXBUF}`	[reply] [d/l] [select]
Re^2: Hash versus chain of elsifs by dsheroh (Monsignor) on Nov 23, 2021 at 08:32 UTC
On the regex suggestion, once upon a time, I did a project which used Regexp::Assemble to take a long list of rexeges and combine them all into a single, monstrously unreadable, regex that would tell me in one shot whether any matched. I seem to recall hearing at some point that similar functionality had been added to the Perl core, but don't recall details. In any case, if you're looking for fixed strings rather than regex patterns, substring matches, etc., I would still expect hashes to be faster, but something based on Regexp::Assemble or a similar technique could be close enough to warrant a benchmark. And you might still want to go that route even if it's a little slower in order to gain the extra flexibility in how you match.	[reply]
Re^3: Hash versus chain of elsifs by LanX (Saint) on Nov 23, 2021 at 11:42 UTC
> something based on Regexp::Assemble or a similar technique Trie optimization is supported since v5.10. See also Re^2: Looking for a cleaner regex ( trie since 5.10 ! ) for more. Cheers Rolf _{(addicted to the Perl Programming Language :) Wikisyntax for the Monastery}	[reply]
Re: Hash versus chain of elsifs by perlfan (Parson) on Nov 25, 2021 at 14:42 UTC
In terms of complexity, a hash key look up is always the most efficient being O(1). The drawback, as has been mentioned, is that this requires a key. If you're dealing with a known set of strings, this is as efficient as possible. So to be clear, if you're going to do a cascade of `if ( $str eq 'somestring' )...` then use a hash look up. If dealing with a set of conditions (e.g., a range of values), then you need to use a conditional somewhere. The question becomes, how to compute a static hash key from a value (be it a range, regex, mathematical function, etc). E.g., use strict; use warnings; my $value_buckets = { 'first half' => sub { print qq{first half!\n} }, 'second half' => sub { print qq{second half!\n} }, 'third half' => sub { print qq{third half!\n} }, 'not found' => sub { print qq{not found!\n} }, }; my $key = compute_key(42); $value_buckets->{$key}->(); # just an example, could be anything to derive a hash # key from $value sub compute_key { my $value = shift; if ( $value > 0 and $value < 33 ) { return 'first half'; } elsif ( $value >= 33 and $value <= 66 ) { return 'second half'; } elsif ( $value >= 67 and $value <= 100 ) { return 'third half'; } # last resort return 'not found'; } [download] Output: `# perl test.pl second half!` [download] A related question is, can you write a sophisticated perl program without conditionals that also minimizes the computational complexity? The answer is, "YES"; and if that's your interest I think your exploration may be added by looking at the functional side of Perl. And there's no better place to look than Higher Order Perl for that. Final note, even though an series of `ifels..` statements will terminate when the condition is found; there is nothing to guarantee how far down the litany of conditionals you'll fall. So in terms of computational complexity, it's always going to add a constant factor equivalent to the worst case scenario of having to check all conditions.	[reply] [d/l] [select]

updates