arturo has asked for the wisdom of the Perl Monks concerning the following question:
The problem: in a (perhaps misguided) attempt to take strain off my DBMS, I have a hash whose keys are labels and whose values are numerical IDs. I want to get from my textual data to the numerical IDs.
That way, Perl does the work instead of the DBMS. The alternative is to look up the label in the DB on each pass through the loop, which runs around 3 million times. I haven't benchmarked it, but I don't think caching will help much; besides, the machine running the Perl script has 4 GB of RAM and is *much* faster than the DB server.
This is a backend script that parses an Apache access log and inserts the data into the DB.
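For context, here is a minimal sketch of how the request path might be pulled out of each log line before it reaches the lookup sub. It assumes Apache's Common or Combined Log Format, where the request is the first double-quoted field (e.g. `"GET /foo/bar/ HTTP/1.0"`). `request_path` is a hypothetical helper, not part of the original script.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original script): extract the request
# path from one Common/Combined Log Format line. Assumes the request is the
# first double-quoted field, e.g. "GET /foo/bar/ HTTP/1.0".
sub request_path {
    my ($line) = @_;
    my ($path) = $line =~ m{"[A-Z]+ (\S+)(?: [^"]*)?"};
    return unless defined $path;   # no recognizable request field
    $path =~ s/\?.*//s;            # drop any query string
    return $path;
}

my $line = q{127.0.0.1 - - [10/Oct/2000:13:55:36 -0700] "GET /foo/bar/?x=1 HTTP/1.0" 200 512};
print request_path($line), "\n";   # prints /foo/bar/
```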
Anyhow, the labels are location names, complete with trailing slashes, followed by a hyphen and a small ID that tells me which virtual host the location lives on. The values, as mentioned above, are the database IDs corresponding to each location. Thus, the keys/values look something like this:

/foo/-17 => 56
/rastapopulous/woohoo/-19 => 67
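A sketch of how such a hash could be built once at startup, so that each of the roughly 3 million lookups is a plain in-memory hash access. The rows below only mirror the two example pairs above; in the real script they would presumably come from a single query against the locations table (an assumption, not shown in the post).

```perl
use strict;
use warnings;

# Sample rows mirroring the example pairs; assumed to come from one DB
# query at startup in the real script.
my @rows = (
    [ '/foo/',                  17, 56 ],
    [ '/rastapopulous/woohoo/', 19, 67 ],
);

# Build the composite key "location-vhost_id" => database ID.
my %locations;
for my $row (@rows) {
    my ($loc, $vhost_id, $id) = @$row;
    $locations{"$loc-$vhost_id"} = $id;
}

print $locations{'/foo/-17'}, "\n";   # prints 56
```

A nested hash such as `$locations{$vhost_id}{$loc}` would be an alternative layout; it avoids building the concatenated key string on every lookup, at the cost of one extra dereference.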
The data I'm trying to mangle (lines in an Apache log file) doesn't always cooperate: when someone requests the "/foo/" location, the trailing slash is not always present (e.g. the request string looks like "www.foo.bar.net/foo"). So the sub that takes the request string and returns the ID currently looks like this:
```perl
sub get_location_id {
    my ($loc_string, $vhost_id) = @_;

    # %locations is a global hash, the one that holds all the values
    return $locations{"$loc_string-$vhost_id"}
        if defined $locations{"$loc_string-$vhost_id"};

    # OK, maybe we didn't find the whole thing, but we still
    # want to know what the top level of the tree was,
    # i.e. /foo/bar/ may not be in the lookup hash, but
    # if /foo/ is we want to log it.
    # Capture the top-level directory (assumes $loc_string starts with '/').
    my ($chopped_loc) = ($loc_string =~ m#^(/[^/]+/?)#);

    if (defined $chopped_loc) {
        return $locations{"$chopped_loc-$vhost_id"}
            if defined $locations{"$chopped_loc-$vhost_id"};

        # OK, that handles it if the trailing slash is present
        # in the data ... otherwise retry with the slash added
        return $locations{"$chopped_loc/-$vhost_id"}
            if defined $locations{"$chopped_loc/-$vhost_id"};
    }

    # log it to the catch-all "other" location if
    # we've fallen through this far
    return $locations{"other-$vhost_id"};
}
```
I'm *sure* it's inelegant. I wouldn't be surprised to find out it's inefficient. How might I improve it?
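One possible tightening, offered as a sketch rather than a measured improvement: build the candidate keys in priority order (full path, full path with a slash appended, top-level directory with a slash), then return the first one present. Each candidate costs a single hash lookup instead of a `defined` check followed by a second fetch. The `other-17`/`other-19` IDs below are hypothetical values for illustration only.

```perl
use strict;
use warnings;

# Global lookup hash, as in the original; the "other" IDs are hypothetical.
our %locations = (
    '/foo/-17'                  => 56,
    '/rastapopulous/woohoo/-19' => 67,
    'other-17'                  => 1,
    'other-19'                  => 2,
);

sub get_location_id {
    my ($loc_string, $vhost_id) = @_;

    # Candidate locations, most specific first.
    my @candidates = ($loc_string);
    push @candidates, "$loc_string/" unless $loc_string =~ m{/$};
    if ($loc_string =~ m{^(/[^/]+)}) {
        push @candidates, "$1/";   # top-level dir, always with a trailing slash
    }

    # One hash lookup per candidate; the first defined ID wins.
    for my $loc (@candidates) {
        my $id = $locations{"$loc-$vhost_id"};
        return $id if defined $id;
    }

    # Fall back to the catch-all "other" location.
    return $locations{"other-$vhost_id"};
}

print get_location_id('/foo/bar/', 17), "\n";   # prints 56
```

Because every fallback goes through the same loop, adding another level (say, the second directory) would just be one more `push`.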
Philosophy can be made out of anything. Or less -- Jerry A. Fodor
Re: Optimizing hash lookups
by Fastolfe (Vicar) on Oct 31, 2000 at 01:18 UTC
Re: Optimizing hash lookups
by arturo (Vicar) on Oct 31, 2000 at 01:21 UTC