Beefy Boxes and Bandwidth Generously Provided by pair Networks
Problems? Is your data what you think it is?
 
PerlMonks  

Re: Chronologically sort a DBM DB into an array?

by Zed_Lopez (Chaplain)
on Oct 06, 2004 at 06:02 UTC ( [id://396886]=note: print w/replies, xml ) Need Help??


in reply to Chronologically sort a DBM DB into an array?

I talked to Spidy in the Chatterbox to find out what he wanted here.

Here's the situation: he has a dbm file whose keys are the strings returned by localtime and whose values are posts. He wants to find and store the two most recent posts.

Spidy, people were confused because we didn't have the context for "sort the hash so that it can be put into an array". "Sorting a hash", alone, doesn't mean anything -- we didn't know whether you meant sorting the hash's keys, or its values, or by what criteria the sort would occur. When you referred to "putting it into an array", we didn't know whether that meant putting the hash's keys in the array, or its values, or a string joining them together, or an arrayref with both the key and value (all of which are things one might want to do.)

Now that I think I understand, here's one way to do it:

use Date::Calc qw(Mktime Decode_Month); use constant { TIME => 0, POST => 1 }; use constant num_posts => 2; my @recent; dbmopen(my %NEWS, "news", 0644) #opening the db or die("Error opening news database: $!"); while (my ($localtime, $post) = each %NEWS) { my ($weekday, $mon, $day, $hour_min_sec, $year) = split /\s+/, $loca +ltime; my $time = Mktime($year, Decode_Month($mon), $day, split /:/, $hour_ +min_sec); if (@recent < num_posts) { @recent = sort { $a->[TIME] <=> $b->[TIME] } @recent, [$time, $pos +t]; } elsif ($time > $recent[0]->[TIME]) { shift @recent; @recent = sort { $a->[TIME] <=> $b->[TIME] } @recent, [$time, $pos +t]; } }

Now the most recent post is in $recent[1]->[POST] (or $recent[-1]->[POST]) and the next most recent post is in $recent[0]->[POST]. You can make an array of just the posts with:

my @posts = map $_->[POST], @recent;

Here was my thinking:

I want to use each to iterate through the dbm one item at a time, storing only the items I'm interested in, because the file might be very large.

I'll have to convert the localtime output to a format I can readily compare. Converting it to Perl's time format, seconds since the epoch, would be one way. Date::Calc probably has something that can do the conversion.

I'll need to save both the time and the post as I work through the file -- the time so I can continue to compare it to other values from the file as I work, and the post because that's the info I want in the end. I can use an arrayref like [time, post] to store them together, and put these arrayrefs in an array.

I'll keep that array sorted by the time values, so I can easily tell that a given new [time, post] pair has a time < the time in the first array entry, so I know I'm not interested in it and can move on.

I want to be able to easily change how many posts I get, in case I change my mind later about only wanting two. I'll call this value num_posts.

As I first read the hash I'll just be loading the array with every entry I see, keeping it sorted as I go, until I have num_posts entries,

After the number of entries I've stored == num_posts, I know I can ignore any new [time, post] pairs whose time <= the time in the first array entry. For new [time, post] pairs where time > the time in the first array entry, I need to discard the first entry, add the new [time, post] pair, and sort the array.

There's a more efficient way to insert a new entry into a sorted list than to just add it and sort the array with Perl's sort, but I expect num_posts to remain small, so I don't think it's worth it to bother pursuing that more efficient way.

(I didn't really verbalize all of it to myself in exactly that way, but it's as close as I can come to reconstructing my thoughts in a manner I can share.)

BTW, stringified localtime isn't very convenient to work with. If you have the option of using something else, consider it. You could just use seconds since the epoch -- fits in one integer (not much advantage here where you'll be using it as a string), easy to compare (numerically), requires conversion to be human readable. Or you could use SQL's date format, YYYY-MM-DD HH:mm:SS -- a string of 19 characters, easy to compare (as a string,) human readable.

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: note [id://396886]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others having a coffee break in the Monastery: (2)
As of 2024-04-19 19:48 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found