Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

I'm having trouble with some hashes in subroutines. It just doesn't seem "fast" using them like I've read in the books. Here's what I have:
my %gnarf = mysub($val); my $foo = $gnarf{'foo_count'}; my $bar = $gnarf{'bar_count'}; my @gak = @{$gnarf{gak_array}}; my @flub = @{$gnarf{flub_array}}; ### Take in the string $val, produce some ### scalars and two arrays sub mysub { my $val = shift; # do stuff with $val, a string, producing @foos, # @bars, @gaks, and @flubs, and some other simple # scalars here to count the elements in the list my $foos = @foos; # count elements of @foos my $bars = @bars; # ditto for @bars return (foo_count => $foos, bar_count => $bars, gak_array => [sort @gaks], flub_array => [sort @flubs]); }

Aside from renaming my structures, is there a better way to execute the sub, passing in $val, have it return scalars and hashes, and let me access those elements, in a fast, scalable way?

Is this the "Right Way™" to use hashes here?

Replies are listed 'Best First'.
Re: Optimizing the use of hashes in subroutines
by sauoq (Abbot) on May 25, 2003 at 18:30 UTC

    Your example is fine. I would probably return a reference to a hash, but that shouldn't make a great deal of difference in your case. You should, however, try to eliminate the pointless copying of data after you fill %gnarf.

    Why do @gak = @{$gnarf{gak_array}}; when the data is already available via $gnarf{gak_array}? It looks to me like you just need to get a little more comfortable with using references. If you are trying to cut down on some typing, copy the reference rather than the whole array; use my $gak = $gnarf{gak_array}; instead.

    -sauoq
    "My two cents aren't worth a dime.";
    
Re: Optimizing the use of hashes in subroutines
by Limbic~Region (Chancellor) on May 25, 2003 at 18:32 UTC
    Anonymous Monk,
    Unfortunately, you didn't provide the stuff that manipulates $val - part of the "slowness" could be there. When you want to evaluate how a tweak has impacted performance - look into Benchmark. The thing to remember here is to go through many iterations to remove "flukes", vary your data as code behaves differently based off input, and try to test on a system at rest so it won't be influenced by other running programs. There is also Devel::DProf.

    You also haven't indicated what $val is. In your comment, you say that it generates 4 arrays. If it is a very large scalar, it may be useful to pass the argument as a reference (see References Tutorial & References quick reference) instead of copying the whole thing which is slow.

    Additionally, instead of returning the entire hash, you may just want to return a reference to the hash. That would change the my %hash = build_hash($val); to something like my $hash_ref = build_hash($val); You could also not use a return value at all, but build the hash inside the subroutine:

    #!/usr/bin/perl -w use strict; my %hash; my $val = "foo bar"; build_hash(\%hash,$val); sub build_hash { my ($hash_ref, $val) = @_; my @vals = split /\s+/ , $val; foreach (@vals) { $hash_ref->{$_} = 1; } }

    I am sure there is more optimization that could be done, but this should be a start. The other answers should become obvious after reading the references tutorials I provided.

    Cheers - L~R

Re: Optimizing the use of hashes in subroutines
by BrowserUk (Patriarch) on May 25, 2003 at 19:55 UTC

    Personally, I can see very little point using a hash just to name the returns from a subroutine. What are you going to do with the names?

    In the example you gave, all your doing is using them to unpack the hash into seperately named entities anyway, so don't. Pass back references for the arrays (or hashes) and assign them directly to the named references in the calling code.

    The result is quicker, cleaner and more readable (IMO).

    ### Take in the string $val, produce some ### scalars and two arrays sub mysub { my $val = shift; my @foos = 0 .. rand(50); my @bars = 0 .. rand(50); return scalar @foos, # How many foos scalar @bars, # How many bars [sort @foos], # Assorted foos (no chilli) [sort @bars]; # Ditto bars (hic!) } my $val = 'Irrelevant'; my ($foo, $bar, $gak, $flub) = mysub($val); print "I have $foo foos: @{$gak}\n"; print "I have $bar bars: @{$flub}\n"; # That's "sheep" and "pub" when +I'm sober. print "The first foo is $gak->[0]\n"; print "The last bar is $flub->[-1]\n"; __DATA__ D:\Perl\test>junk I have 24 foos: 0 1 10 11 12 13 14 15 16 17 18 19 2 20 21 22 23 3 4 5 +6 7 8 9 I have 29 bars: 0 1 10 11 12 13 14 15 16 17 18 19 2 20 21 22 23 24 25 +26 27 28 3 4 5 6 7 8 9 The first foo is 0 The last bar is 9

    A numeric sort would be more sensible for numeric data, but its only an example!


    Examine what is said, not who speaks.
    "Efficiency is intelligent laziness." -David Dunham
    "When I'm working on a problem, I never think about beauty. I think only how to solve the problem. But when I have finished, if the solution is not beautiful, I know it is wrong." -Richard Buckminster Fuller
Re: Optimizing the use of hashes in subroutines
by hacker (Priest) on May 26, 2003 at 14:58 UTC
    I'm working on something very similar to what you described here. With the help of Limbic~Region, bart, castaway, and others, I've come up with something that seems to work for me. It looks like this (for fetching URLs):
    my %pi; my ($status, $content, $type, $base, $length) = get_page(\$site, \%pi); print $pi{'length'}, " bytes\n"; #################################################### sub get_page { my ($site_ref, $pi) = @_; $browser = LWP::UserAgent->new() unless $browser; my $response = $browser->get($$site_ref); my $status = $response->status_line; my $content = $response->content; my $type = $response->content_type; my $base = $response->base; my $length = commify(length($content)); if (!$response->is_success) { die "\nError: $status ($site)\n"; } @{$pi}{qw(status content type base length)} = ($status, $content, $type, $base, $length); }

    It may not be the most-efficient, because I'm still trying to wrap my head around references and anonymous references to subs, but it does work for me.

    Comments or improvements are welcome, as long as they provide the fishing pole and a lure, not a basket of fish..

      If you'll excuse me saying this (which you won't) but somebody is taking the p*** outta you, if this is what they showed you to do (which they probably didn't).

      You call a sub, passing in a hashref \%pi.

      In the sub, you assign a bunch of values to local disrete scalars.

      You then assign these scalars to a hash, using a hash slice.

      You then extract the values from the hash (via the hash slice), and return them as a list.

      The calling code then assign the list to another bunch of discrete scalars which you then ignore and access the values via the hash?

      This isn't just inefficient, it's {expletive}.

      And what do you achieve by passing \$site, just so you can do $$site_ref? Your passing one scalar instead of another and dereferencing it twice. Totally useless use of a references.

      Here are four different ways of doing what your doing, each of which is simpler, clearer and more efficient. Try and understand how they each work, and then pick one.

      No hash

      my ($status, $content, $type, $base, $length) = get_page($site); print $length, " bytes\n"; #################################################### sub get_page { my ($site, $pi) = @_; $browser = LWP::UserAgent->new() unless $browser; my $response = $browser->get($site); my $status = $response->status_line; my $content = $response->content; my $type = $response->content_type; my $base = $response->base; my $length = commify(length($content)); if (!$response->is_success) { die "\nError: $status ($site)\n"; } return $status, $content, $type, $base, $length; }

      Assign to a hash slice in the calling code from the returned the list.

      my %pi; @%pi{qw(status content type base length)) = get_page($site); print $length, " bytes\n"; #################################################### sub get_page { my ($site) = @_; $browser = LWP::UserAgent->new() unless $browser; my $response = $browser->get($site); if (!$response->is_success) { die "\nError: $status ($site)\n"; } return $response->status_line, $response->content, $response->content_type, $response->base, commify(length($content)); }

      Assign direct to the hash. No lists.

      my %pi; get_page($site, \%pi); print $pi{'length'}, " bytes\n"; #################################################### sub get_page { my ($site, $pi) = @_; $browser = LWP::UserAgent->new() unless $browser; my $response = $browser->get($site); $pi->{status} = $response->status_line; $pi->{content} = $response->content; $pi->{type} = $response->content_type; $pi->{base} = $response->base; $pi->{length} = commify(length($pi->{content})); if (!$response->is_success) { die "\nError: $status ($site)\n"; } }

      Assign to hash slice direct. Don't return a list

      my %pi; get_page($site, \%pi); print $pi{'length'}, " bytes\n"; #################################################### sub get_page { my ($site, $pi) = @_; $browser = LWP::UserAgent->new() unless $browser; my $response = $browser->get($site); if (!$response->is_success) { die "\nError: $status ($site)\n"; } my $content = $response->content; @{$pi}{qw(status content type base length)} = ( $response->status_line; $content; $response->content_type; $response->base; commify(length($content)); ); return; }