Too Many IDs

The_Dj has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: Too Many IDs by kcott (Archbishop) on Jan 09, 2020 at 08:21 UTC
G'day The_Dj, Although not stated, I'm assuming all `id`, `sn`, etc. values are unique. If that's not the case, neither your current solution nor my alternative suggestion will work properly. Instead of recreating the entire hash multiple times, consider just having a single hash with all the data and then simple mappings of `sn` to `id` (extending for future requirements). Here's a quick example: #!/usr/bin/env perl use strict; use warnings; my %dat_by_id = ( 1 => {id=> 1, sn => 'a', more => 'foo'}, 2 => {id=> 2, sn => 'b', more => 'bar'}, ); my %map_sn_to_id = map +($_->{sn} => $_->{id}), values %dat_by_id; print "SN for ID[1]: $dat_by_id{1}{sn}\n"; print "ID for SN[b]: $dat_by_id{$map_sn_to_id{b}}{id}\n"; print "MORE for SN[a]: $dat_by_id{$map_sn_to_id{a}}{more}\n"; # Subsequent requirements, e.g. my %map_more_to_id = map +($_->{more} => $_->{id}), values %dat_by_id; print "ID for MORE[foo]: $dat_by_id{$map_more_to_id{foo}}{id}\n"; print "SN for MORE[bar]: $dat_by_id{$map_more_to_id{bar}}{sn}\n"; [download] Output: `SN for ID[1]: a ID for SN[b]: 2 MORE for SN[a]: foo ID for MORE[foo]: 1 SN for MORE[bar]: b` [download] Having a single data source will reduce the chances of errors and should make maintenance and debugging (if necessary) easier. I see you've used "`map BLOCK LIST`" and I'm aware that's considered a Best Practice; however, "`map EXPR, LIST`" is faster and may make a difference, especially when you're dealing with millions of data elements. Use Benchmark to test. See map for more on these two forms as well as an explanation of the unary plus, "map +(...", I used (if you're unfamiliar with that syntax). I've only shown a barebones technique. For production usage, I'd suggest setting up a series of functions, e.g. `get_id_for_sn($sn)`, instead of having to continually hard-code an equivalent `$dat_by_id{$map_sn_to_id{$sn}}{id}`. — Ken	[reply] [d/l] [select]
Re^2: Too Many IDs by The_Dj (Beadle) on Jan 09, 2020 at 13:49 UTC
Hi, Ken. Yeah, I should have mentioned that both `sn` and `id` are^* unique keys. And thanks for the benchmark pointers. I'll keep this in mind when I get to optimizing. Sadly I tend to loose most of my runtime to DB calls and some 3rd party .exe's, but I will give `map` a close scrutiny. _{* Except one isn't really, but is treated as such for other reasons. My job can be 'fun'}	[reply] [d/l] [select]
Re^3: Too Many IDs by kcott (Archbishop) on Jan 10, 2020 at 01:58 UTC
If parts of your application (DB, EXEs, etc.) are taking seconds or milliseconds to run, spending time optimising `map` to save a few micro- or nanoseconds is unlikely to be worth the effort. Another consideration is whether this is a short-lived application that's run multiple times or a long-lived application that's run with multiple iterations. There are some JIT (just-in-time) possibilities that may be worth consideration. If you end up with a lot of `get_X_for_Y()` subs, and some are only called infrequently, you can create mappings just when they're needed; something like this using state (which requires `v5.10`): `{ my %dat_by_id = ...; sub get_X_for_Y { my ($Y) = @_; state $map_Y_to_id = { map +($_->{Y} => $_->{id}), values %dat_by_id }; return $dat_by_id{$map_Y_to_id->{$Y}}{X}; } sub get_X_for_Z { ... } sub get_V_for_W { ... } ... }` [download] Note that the anonymous block gives `%dat_by_id` lexical scope such that it is only visible to the `sub`s; while the `sub`s themselves are visible to, and accessible from, the entire script. This prevents inadvertent changes to `%dat_by_id` which could introduce bugs which are hard to track down. Also note that the placement of the above code within your script would be important. The assignment to `%dat_by_id` should occur before any calls to `get_?_for_?()` are made. An alternative to this would be to use some combination of `BEGIN`, `INIT`, etc. blocks; I don't know enough about your code to comment further on this; see perlmod: BEGIN, UNITCHECK, CHECK, INIT and END for more about these. There would, no doubt, be other ways to handle this but, as previously stated, I don't know enough about your code to offer further advice. — Ken	[reply] [d/l] [select]
Re^4: Too Many IDs by Anonymous Monk on Jan 13, 2020 at 01:06 UTC
Re: Too Many IDs by choroba (Cardinal) on Jan 09, 2020 at 08:08 UTC
Update: This answer isn't really helpful, sorry. Your solutions stores the same references under the sn's, so it should occupy memory comparable to my solution. If your concern is memory, you can store just the "sn to id" map instead of storing all the data twice. `my %sn_to_id = map { $dat_by_id{$_}{sn} => $_ } keys %dat_by_id;` [download] You then need to nest the references to get the value: `my $sn = 'b'; my $more = $dat_by_id{ $sn_to_id{$sn} }{more};` [download] `map{substr$_->[0],$_->[1]\|\|0,1}[\\|\|{},3],[[]],[ref qr-1,-,-1],[{}],[sub{}^ARGV,3]`	[reply] [d/l] [select]
Re^2: Too Many IDs by LanX (Saint) on Jan 09, 2020 at 11:30 UTC
Where is the benefit to the OP's code, where the values are just the identical hash refs and no flat copies? Cheers Rolf _{(addicted to the Perl Programming Language :) Wikisyntax for the Monastery FootballPerl is like chess, only without the dice}	[reply]
Re: Too Many IDs by LanX (Saint) on Jan 09, 2020 at 11:34 UTC
> Is there a better way? Depends, there are certainly cases where it was certainly better to keep the raw table in the DB and you query ID or SN individually via SQL. Cheers Rolf _{(addicted to the Perl Programming Language :) Wikisyntax for the Monastery FootballPerl is like chess, only without the dice}	[reply]
Re^2: Too Many IDs by ForgotPasswordAgain (Priest) on Jan 09, 2020 at 13:36 UTC
"Depends" is about as good of an answer as I could think of, too. (Whenever someone uses a comparative like "better", I usually insist on "better, for what?" if I'm going to spend time answering, and start thinking "is this an X-Y question?", etc.) I would say that I don't usually use `selectall_hashref` like that. Often I'd go with `selectall_arrayref($sql, {Slice => {}}, @bind)` so you iterate over the hrefs directly to build the aggregations (hashes). And end up, like you say, splitting things up and doing subqueries. Depends..	[reply] [d/l] [select]
Re^3: Too Many IDs by The_Dj (Beadle) on Jan 09, 2020 at 13:53 UTC
Yep. I will traverse all the data The data lives on another server Pulling it all at once is just faster^* I acutlly have both `selectall_hashref` and `selectall_arrayref($sql, {Slice => {}})` in my code, each where it is best (I believe) _{* I should probably benchmark that}	[reply] [d/l] [select]
Re^4: Too Many IDs by Tanktalus (Canon) on Jan 09, 2020 at 20:42 UTC
Re^5: Too Many IDs by talexb (Chancellor) on Jan 10, 2020 at 18:51 UTC
Re^5: Too Many IDs by Anonymous Monk on Jan 13, 2020 at 01:34 UTC
Some notes below your chosen depth have not been shown here
Re^4: Too Many IDs by LanX (Saint) on Jan 09, 2020 at 16:25 UTC


Just another Perl shrine
	PerlMonks