extracting data from a deep data structure

punkish has asked for the wisdom of the Perl Monks concerning the following question:

I have a DBM::Deep data structure that looks like so...

%db = (
  year => {
    month => {
      day => {
        foo => 1, bar => 1, baz => 1, and so on..., 
      }
    }
  }
);

if year, month, day
  I want 'n' keys from $db{year}{month}{day}
if year, month
  I want 'n' latest keys from the days from $db{year}{month}
if year
  I want 'n' latest keys from the days from $db{year}
else no date defined
  I want 'n' latest keys from %db
[download]

when I say 'latest' I mean, get all the keys from the latest day from the defined date parts. If they are less than 'n', go to the next latest day from that defined date part, and so on. Of course, if they (all the keys from the latest day from the defined date part) are more than 'n', then pick random 'n'.

So, if 2001-11-2, then get all the keys from 2001-11-2. If they are more than 'n', pick 'n' random.

If 2001-11, then get all the keys from 2001-11-(latest day). If they are more than 'n', pick 'n' random. If they are less than 'n', get more keys from 2001-11-(next latest day) till there are 'n'.

If 2001, then get all the keys from 2001-(latest month)-(latest day). And so on.

Now, I have got some code working. Besides it being just brute force (looping over each sub-data-structure), of course it takes longer and longer time, the fewer date parts I provide it. For example, if I don't provide any date, I have to unwind the entire data structure before I extract my keys. I am wondering if data-structure guru monks have any wizard method for tackling such a problem.

Update: Solved the problem. See code at Re^2: extracting data from a deep data structure and please comment.

--

when small people start casting long shadows, it is time to go to bed

Comment on extracting data from a deep data structure Download Code

Replies are listed 'Best First'.
Re: extracting data from a deep data structure by shmem (Chancellor) on Jul 10, 2006 at 04:22 UTC
Now, I have got some code working. Do tell, do tell! You first... --shmem _($_=" "x(1<<5)."?\n".q·/)Oo. G°\ / /\_¯/(q / ---------------------------- \__(m.====·.(_("always off the crowd"))."· ");sub _{s./.($e="'Itrs `mnsgdq Gdbj O`qkdq")=~y/"-y/#-z/;$e.e && print}	[reply]
Re^2: extracting data from a deep data structure by punkish (Priest) on Jul 10, 2006 at 04:35 UTC
>>Now, I have got some code working. >Do tell, do tell! You first... ;-). Actually, I solved my own problem. So, here I present my code. The `last if (scalar(@arr) > $num);` line solves my speed problem greatly. The speed is down from 20 wallclock seconds to 4 wallclock seconds when none of the date parts are provided. I still present the code here for monks to tell me better ways to do the same. my $num = 5; sub get_latest { my ($year, $mon, $day) = @_; my @arr; if ($year && $mon && $day) { my $h = $d_dbd->{$year}->{$mon}->{$day}; @arr = keys(%$h); } elsif ($year && $mon) { my $h = $d_dbd->{$year}->{$mon}; for (sort {$a <=> $b} (values %$h)) { push(@arr, keys(%$_)); last if (scalar(@arr) > $num); } } elsif ($year) { my $h = $d_dbd->{$year}; for my $m (sort {$a <=> $b} (values %$h)) { for (sort {$a <=> $b} (values %$m)) { push(@arr, keys(%$_)); last if (scalar(@arr) > $num); } } } else { for my $y (keys %$d_dbd) { my $h = $d_dbd->{$y}; for my $m (sort {$a <=> $b} (values %$h)) { for (sort {$a <=> $b} (values %$m)) { push(@arr, keys(%$_)); last if (scalar(@arr) > $num); } } } } return extract(\@arr); } sub extract { my ($arr) = @_; my @p; my @arr = @$arr; if (scalar(@arr) > $num) { for (1..$num) { my $rand = int(rand(scalar(@arr))); push(@p, splice(@arr, $rand, 1)); } } else { @p = @arr; } return \@p; } [download] Update: I modified the above code by adding a label 'FOO' to exit when test passes in order to gain more speed. New code is so (another update -- new code presented completely here)... sub get_latest { my ($year, $mon, $day, $num) = @_; my @arr; if ($year && $mon && $day) { my $h = $d_dbd->{$year}->{$mon}->{$day}; @arr = keys(%$h); } elsif ($year && $mon) { my $h = $d_dbd->{$year}->{$mon}; FOO: for my $day (sort {$b <=> $a} (keys %$h)) { my $h = $d_dbd->{$year}->{$mon}->{$day}; push(@arr, keys(%$h)); last FOO if (scalar(@arr) > $num); } } elsif ($year) { my $h = $d_dbd->{$year}; FOO: for my $mon (sort {$b <=> $a} (keys %$h)) { my $h = $d_dbd->{$year}->{$mon}; for my $day (sort {$b <=> $a} (keys %$h)) { my $h = $d_dbd->{$year}->{$mon}->{$day}; push(@arr, keys(%$h)); last FOO if (scalar(@arr) > $num); } } } else { FOO: for my $year (sort {$b <=> $a} (keys %$d_dbd)) { my $h = $d_dbd->{$year}; for my $mon (sort {$b <=> $a} (keys %$h)) { my $h = $d_dbd->{$year}->{$mon}; for my $day (sort {$b <=> $a} (keys %$h)) { my $h = $d_dbd->{$year}->{$mon}->{$day}; push(@arr, keys(%$h)); last FOO if (scalar(@arr) > $num); } } } } return extract(\@arr, $num); } [download] I get the error `Exiting subroutine via last at script.pl line 57. Label not found for "last FOO" at script.pl line 57.` [download] What gives? Update2: Never mind. Moving the labels to the outermost `for .. (sort {$a <=> $b} (keys %$d_dbd)) {` loop in each `if..elsif` block solves the problem. Code now takes `0 wallclock secs ( 0.04 usr + 0.00 sys = 0.04 CPU)` in all cases. Me happy. -- when small people start casting long shadows, it is time to go to bed	[reply] [d/l] [select]