
Thanks. I'll go with the hash method then and use exists to check it. Does the hash get built each time the function is called or does it persist in some way? I'm guessing the former, from some tests I've tried.

Re^3: Hash versus chain of elsifs
by Fletch (Bishop) on Nov 22, 2021 at 10:15 UTC

    Depends on how and where you're building it; as was mentioned, give us a sample and someone can comment on specifics. That said, here's one way to (for example) initialize your hash from a file with one item per line, lazily and only once (unless you explicitly clear it):

    if( exists _get_cache()->{ $candidate } ) {
        say qq{IT DOES};
    }
    else {
        say qq{No such luck . . .};
    }

    { ## Block to scope our cache to just these subs
        my $lookup_cache = undef;

        sub _reset_cache { $lookup_cache = undef; }

        sub _get_cache { $lookup_cache //= _load_cache(); }

        sub _load_cache {
            ## presuming you've declared file var somewhere above . . .
            open( my $fh, q{<}, $CACHE_FILE_NAME )
              or die qq{Can't open '$CACHE_FILE_NAME': $!\n};
            $lookup_cache = {};
            while( <$fh> ) {
                chomp;
                $lookup_cache->{ $_ } = 1;
            }
            close( $fh );
            return $lookup_cache;
        }
    } ## End of limited scope block.
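    To try the lazy-loading idea above in isolation, here is a self-contained sketch. It assumes a temporary cache file created with the core File::Temp module, and it adds a hypothetical $loads counter (not part of the approach itself) so you can see that the file is read only once until _reset_cache is called.

```perl
use strict;
use warnings;
use feature qw(say);
use File::Temp qw(tempfile);

# Hypothetical setup: a temporary cache file with one site per line.
my ($tmp_fh, $CACHE_FILE_NAME) = tempfile(UNLINK => 1);
print {$tmp_fh} "$_\n" for qw(bollyinside.com www.bollyinside.com);
close($tmp_fh);

my $loads = 0;    # hypothetical counter: how many times the file was read

{   ## Block to scope our cache to just these subs
    my $lookup_cache;

    sub _reset_cache { $lookup_cache = undef; }

    # Load on first use only; later calls reuse the cached hashref.
    sub _get_cache { $lookup_cache //= _load_cache(); }

    sub _load_cache {
        $loads++;
        open( my $fh, q{<}, $CACHE_FILE_NAME )
          or die qq{Can't open '$CACHE_FILE_NAME': $!\n};
        $lookup_cache = {};
        while ( my $line = <$fh> ) {
            chomp $line;
            $lookup_cache->{$line} = 1;
        }
        close($fh);
        return $lookup_cache;
    }
}

for my $candidate (qw(bollyinside.com fred www.bollyinside.com)) {
    my $found = exists _get_cache()->{$candidate};
    say "$candidate: ", $found ? 'IT DOES' : 'No such luck . . .';
}
say "file loaded $loads time(s)";
```

    Three lookups should report that the file was loaded once.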

    The cake is a lie.

Re^3: Hash versus chain of elsifs
by eyepopslikeamosquito (Archbishop) on Nov 22, 2021 at 09:50 UTC

    You should make it persist: rebuilding the hash on every call is wasted work, and avoiding that can have a big impact on performance. If you need further help, show us some sample code and we'll be able to show various ways to make it persist.

      Here is some sample code:

      package JunnkSites 0.03;
      use parent qw(Exporter);
      our @EXPORT = qw(KnownJunkSite);

      sub KnownJunkSite {
          my ($a) = (@_);
          my %junksites = (
              "bollyinside.com",
              "www.bollyinside.com",
              ...
              "worldtrademarkreview.com",
              "www.worldtrademarkreview.com",
          );
          if(exists($junksites{$a})) {
              $a = 1;
          } else {
              $a = 0;
          }
          return($a);
      }

      Any other performance and style tips or pointers welcome.

        Some feedback on your posted code:

        • Always post an SSCCE.
        • Your hash is not correct (which would have been picked up with an SSCCE).
        • Always use strict and warnings.
        • Don't use $a or $b as variable names; they have special meanings in Perl (sort uses them).
        • DRY (you have repeated $a unnecessarily in your sample code).

        Anyway, here is a very simple example of how I would go about it.

        use strict;
        use warnings;

        # Using block lexical scope to data-hide %junksites
        {
            my %junksites = (
                'bollyinside.com'              => 1,
                'www.bollyinside.com'          => 1,
                'worldtrademarkreview.com'     => 1,
                'www.worldtrademarkreview.com' => 1,
            );

            sub KnownJunkSite {
                my $val = shift;
                return exists $junksites{$val};
            }
        }

        for my $v ('bollyinside.com', 'fred', 'www.worldtrademarkreview.com') {
            print "$v: ", KnownJunkSite($v) ? "found\n" : "not found\n";
        }

        Running this little test program produces:

        bollyinside.com: found
        fred: not found
        www.worldtrademarkreview.com: found

        Update: Should you write a Procedural Module or an OO Module or just use a Hash?

        In your case, if I wrote a module, I'd use OO. See also:

        ... though I'd also consider not writing a module at all, instead just using a hash/hashref directly, as analysed below in my reply to this reply.
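        For comparison, a minimal sketch of what an OO version might look like. The package name JunkSites and the new/is_junk methods are hypothetical, not an existing module; the point is only that the lookup hash is built once per object instead of on every call.

```perl
use strict;
use warnings;

package JunkSites;    # hypothetical OO wrapper; names are illustrative only

sub new {
    my ($class, @sites) = @_;
    my %is_junk = map { $_ => 1 } @sites;    # built once, when the object is created
    return bless { is_junk => \%is_junk }, $class;
}

sub is_junk {
    my ($self, $site) = @_;
    return exists $self->{is_junk}{$site} ? 1 : 0;
}

package main;

my $junk = JunkSites->new(qw(bollyinside.com www.bollyinside.com));

for my $site (qw(bollyinside.com fred)) {
    print "$site: ", $junk->is_junk($site), "\n";
}
```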

        G'day mldvx4,

        I agree with others that a hash is likely to be more efficient than a chain of elsifs. Having said that, as a general rule-of-thumb, you should Benchmark: Perl may have already optimised what you're trying to do (so you'd be both wasting your time and bloating your code); different algorithms may be more or less efficient depending on the data (e.g. number of strings, individual length of strings, total size of data); and so on. Don't guess; benchmark.
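        As a starting point for such a benchmark, here is a sketch using cmpthese from the core Benchmark module. The four domains and the extra probe strings are made-up sample data; with a list this short the elsif chain may well hold its own, so run it against your real list before drawing any conclusions.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Made-up sample data; substitute your real list of sites.
my @junk = qw(
    bollyinside.com          www.bollyinside.com
    worldtrademarkreview.com www.worldtrademarkreview.com
);
my %is_junk = map { $_ => 1 } @junk;

# Candidate 1: hash lookup with exists.
sub via_hash { return exists $is_junk{ $_[0] } ? 1 : 0 }

# Candidate 2: the equivalent chain of elsifs.
sub via_elsif {
    my ($site) = @_;
    if    ( $site eq 'bollyinside.com' )              { return 1 }
    elsif ( $site eq 'www.bollyinside.com' )          { return 1 }
    elsif ( $site eq 'worldtrademarkreview.com' )     { return 1 }
    elsif ( $site eq 'www.worldtrademarkreview.com' ) { return 1 }
    return 0;
}

# A mix of hits and misses to look up.
my @probes = ( @junk, qw(fred example.org www.example.com) );

# Run each candidate for at least 1 CPU second and print a comparison table.
cmpthese( -1, {
    hash  => sub { via_hash($_)  for @probes },
    elsif => sub { via_elsif($_) for @probes },
} );
```

        Before trusting the timings, it's worth checking that both candidates give the same answers for every probe.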

        "Any other performance and style tips or pointers welcome."
        • When asked for sample code, provide code that we can run and output that shows it runs correctly. If you can't get your code to produce the desired output, indicate what you expected and show what you actually got (including all error and warning messages verbatim between <code>...</code> tags). I suggest you read "SSCCE".
        • Your package name should probably only contain one 'n', i.e. JunkSites.
        • Always put use strict; and use warnings; at the top of your code.
        • Don't use $a or $b as general variables. They're special. See "$a".
        • Use state, instead of my, to declare persistent variables. Note that state was introduced in Perl v5.10 and has to be enabled, e.g. with use 5.010; as in the example code below.
        • The code you show for sub KnownJunkSite {...} looks very wrong. See my example code below for what I think is closer to what you're after.

        Example code:

        #!/usr/bin/env perl

        use 5.010;
        use strict;
        use warnings;

        my @test_sites = qw{x.com y.com www.z.com www.y.com};

        check_junk($_) for @test_sites;

        sub KnownJunkSite {
            my ($key) = @_;

            state $is_junksite = {
                map +($_, 1), qw{
                    x.com www.x.com
                    z.com www.z.com
                }
            };

            return exists $is_junksite->{$key} ? 1 : 0;
        }

        sub check_junk {
            my ($key) = @_;
            say "$key: ", KnownJunkSite($key);
        }

        Output:

        x.com: 1
        y.com: 0
        www.z.com: 1
        www.y.com: 0

        — Ken

        You can make your %junksites variable persist either by moving it outside the sub (in a block lexical scope) or by making it a state variable.
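        A small sketch contrasting the two approaches with the original rebuild-on-every-call version. The build_lookup helper and the %builds counter are hypothetical instrumentation, added only to count how often each hash gets built.

```perl
use strict;
use warnings;
use feature qw(say state);

my %builds;    # hypothetical counter: how often each approach builds its hash

sub build_lookup {
    my ($who) = @_;
    $builds{$who}++;
    my %lookup = map { $_ => 1 } qw(bollyinside.com www.bollyinside.com);
    return \%lookup;
}

# Rebuilt on every call (what the original sub does).
sub junk_rebuilt {
    my $lookup = build_lookup('rebuilt');
    return exists $lookup->{ $_[0] } ? 1 : 0;
}

# Block lexical scope: built once, when execution reaches the block.
{
    my $lookup = build_lookup('block');
    sub junk_block { return exists $lookup->{ $_[0] } ? 1 : 0 }
}

# state variable: built once, on the first call.
sub junk_state {
    state $lookup = build_lookup('state');
    return exists $lookup->{ $_[0] } ? 1 : 0;
}

for my $site (qw(bollyinside.com fred www.bollyinside.com)) {
    say join ' ', $site, junk_rebuilt($site), junk_block($site), junk_state($site);
}
say "$_ built $builds{$_} time(s)" for sort keys %builds;
```

        After three lookups, the counts come out as block 1, rebuilt 3, state 1.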

        For a simple example of these two approaches, see the %rtoa variable at:

        my %junksites = (
            "bollyinside.com",
            "www.bollyinside.com",
            ...
            "worldtrademarkreview.com",
            "www.worldtrademarkreview.com",
        );

        Note that a hash needs key/value pairs, so you're storing only the site names without www. as keys; the www.-prefixed ones are stored as values. That's probably not what you want.
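        To see the pairing problem directly, this sketch dumps the hash built from the OP's list (with the ... entries left out):

```perl
use strict;
use warnings;

# The OP's list assignment, minus the elided "..." entries.
my %junksites = (
    "bollyinside.com",
    "www.bollyinside.com",
    "worldtrademarkreview.com",
    "www.worldtrademarkreview.com",
);

# Elements pair up as key => value, so only every other name becomes a key.
for my $key (sort keys %junksites) {
    print "key: $key => value: $junksites{$key}\n";
}

print "www.bollyinside.com is a key? ",
    (exists $junksites{"www.bollyinside.com"} ? "yes" : "no"), "\n";
```

        It prints only two keys, and the final check answers "no".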

        A fast way to initialize the keys is a hash slice:

        my %junksites;
        @junksites{qw{
            bollyinside.com www.bollyinside.com
            ...
            worldtrademarkreview.com www.worldtrademarkreview.com
        }} = ();
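        One thing to watch with the slice: assigning an empty list leaves every value undef, so membership has to be tested with exists rather than with the value's truth. A minimal sketch with two sample domains only:

```perl
use strict;
use warnings;

my %junksites;
# Hash slice: creates all the keys in one assignment; every value is undef.
@junksites{qw{ bollyinside.com www.bollyinside.com }} = ();

for my $site (qw(bollyinside.com fred)) {
    my $by_exists = exists $junksites{$site} ? 'yes' : 'no';
    my $by_value  = $junksites{$site}        ? 'yes' : 'no';    # wrong test: value is undef
    print "$site: exists=$by_exists, truthy=$by_value\n";
}
```

        For bollyinside.com the exists test says yes while the truthiness test says no, which is why exists is the right check here.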

        map{substr$_->[0],$_->[1]||0,1}[\*||{},3],[[]],[ref qr-1,-,-1],[{}],[sub{}^*ARGV,3]