Re: Don't-Repeat-Myself code for improvement
by merlyn (Sage) on Feb 27, 2007 at 22:09 UTC
Re: Don't-Repeat-Myself code for improvement
by rhesa (Vicar) on Feb 27, 2007 at 22:11 UTC
Re: Don't-Repeat-Myself code for improvement
by dragonchild (Archbishop) on Feb 28, 2007 at 01:53 UTC
You have two orthogonal problems.
- How do I pick something randomly from a list if-and-only-if I haven't used it before?
- How do I preserve state between two runs of the same program?
The answer to the second is easy - use a DBM. The best DBM for this kind of task would be DBM::Deep because it allows you to have a Perl data structure backed by disk instead of RAM. All the standard Perl data structure manipulators for hashes and arrays exist and are used just like normal.
The answer to the first is to maintain a list of unchosen items. Once an item is chosen, then you remove it from the list of choosable items and move on. Assuming that there aren't any duplicates, this is best implemented as a set which is usually implemented as a hash in Perl. So, something like:
my %set = (
# Populate this somehow.
);
my @keys = keys %set;
my $chosen_key = $keys[ rand @keys ];
my $chosen_value = delete $set{ $chosen_key };
Now, if your set of items isn't discrete, then maintain a list of previously chosen items and retry each choice until you draw one that isn't on that list. This is also implemented with a hash, but instead of calling it %set, one generally calls it %seen. Implementation left as an exercise for the reader.
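That exercise might look something like this minimal sketch (the item list and the give-up guard are my own additions for illustration, not from the post):

```perl
use strict;
use warnings;

my @items = qw(apple banana cherry damson);    # stand-in data
my %seen;                                      # items already handed out

sub pick_unseen {
    # Guard: once everything has been seen, give up rather than spin forever.
    return undef if keys %seen == @items;
    my $choice;
    do {
        $choice = $items[ rand @items ];
    } until !exists $seen{$choice};
    $seen{$choice} = 1;
    return $choice;
}

my $first = pick_unseen();
print "picked: $first\n";
```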
Now, the hard part - doing the DBM bit. Let's say we have a discrete set as per the prior example. Go ahead and seed the DBM file as so:
my %set;
tie %set, 'DBM::Deep' => { file => 'my_file.db' };
%set = (
# Populate as desired, nesting arrays and hashes as appropriate.
);
Now, let's modify our prior example. With a DBM, it looks like:
my %set;
tie %set, 'DBM::Deep' => { file => 'my_file.db' };
my @keys = keys %set;
my $chosen_key = $keys[ rand @keys ];
my $chosen_value = delete $set{ $chosen_key };
Alternatively, you can choose to use the OO interface, which would look like:
my $set = DBM::Deep->new({
file => 'my_file.db',
});
my @keys = keys %{ $set };
my $chosen_key = $keys[ rand @keys ];
my $chosen_value = delete $set->{ $chosen_key };
Note the difference between hash notation and hashref notation.
Oh - and if you need them, DBM::Deep also supports transactions and synchronous access.
My criteria for good software:
- Does it work?
- Can someone else come in, make a change, and be reasonably certain no bugs were introduced?
Re: Don't-Repeat-Myself code for improvement
by BrowserUk (Patriarch) on Feb 27, 2007 at 22:45 UTC
Since you seem to be working with a fixed and finite number of ids, it would be easier to shuffle them once and write them to a file. Then each time you need one, take the last one in the file and truncate it so it cannot be used again. Once you get to an empty file, shuffle them again and re-write the file.
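A rough sketch of that scheme, using List::Util's shuffle (the file name and id list here are invented for illustration; a real version would also want locking):

```perl
use strict;
use warnings;
use List::Util qw(shuffle);

my $file    = 'ids.txt';            # hypothetical state file
my @all_ids = qw(a b c d e f g h);

sub next_id {
    my @pool;
    if ( open my $in, '<', $file ) {
        chomp( @pool = <$in> );
        close $in;
    }
    @pool = shuffle @all_ids unless @pool;    # empty (or missing) file: reshuffle
    my $id = pop @pool;                       # take the last one ...
    open my $out, '>', $file or die "Can't write $file: $!";
    print $out map { "$_\n" } @pool;          # ... and rewrite the file without it
    close $out;
    return $id;
}

print next_id(), "\n";
```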
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
If item "foo" gets written as the first item in the file at iteration $N and then as the last item in the file at iteration $N+1, then "foo" will be returned twice in a row, which violates 'If I chose "foo" recently, I shouldn't use it again.'
My method of doing this is to permute the list and store it (probably in a database). When I need a new item, I pick a random number ($R) between 0 and $N where $N is a fraction of the size of the list and pick the $R'th least recently used item and mark it as the most recently used item.
One way to record which was most recently used is to put a timestamp in the records in the DB. Then picking the $R'th least recently used item can be done with a simple SQL query using "ORDER BY lastused LIMIT $R,1" (or the equivalent supported by your particular DB) and marking that item "most recently used" is a simple update to the timestamp.
The initial permutation of the list can be done by assigning each timestamp to a random time of day "yesterday".
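The same idea can be sketched without a database, with a hash of id => last-used epoch seconds standing in for the table (the names and the $F=3 fraction are illustrative):

```perl
use strict;
use warnings;

# id => epoch seconds when last used, seeded with random times "yesterday"
# to get the initial permutation.
my %lastused = map { $_ => time() - 86_400 - int rand 86_400 } qw(a b c d e f g h);

sub pick_lru {
    my ($F) = @_;    # $F controls what fraction of the list is eligible
    my @by_age = sort { $lastused{$a} <=> $lastused{$b} } keys %lastused;
    my $R  = int rand( @by_age / $F );    # random rank among the least recently used
    my $id = $by_age[$R];
    $lastused{$id} = time();              # mark it most recently used
    return $id;
}

my $id = pick_lru(3);
print "$id\n";
```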
That's a good point, and an interesting solution provided that the number of IDs is small, but horribly expensive if the number of IDs is much more than the 8 shown.
40 thousand entries for the 8 IDs shown is manageable, but if the number rises to 15 then wouldn't you need something like 70 terabytes to store the permutations?
I'm not actually working with a finite, fixed number at all. I should have said.
I'm using ids which come from an external source over which I have no control.
Also, there's no requirement to use them all up before repeating. Just that I shouldn't re-use the same one twice or more in a row.
($_='kkvvttuu bbooppuuiiffss qqffssmm iibbddllffss')
=~y~b-v~a-z~s; print
Simplify your life.
"there's no requirement to use them all up before repeating. Just that I shouldn't re-use the same one twice or more in a row"
So you really don't need to keep track of anything more than the last one used.
I would use a simple text file with a single line containing the last value used. Read the last value from the file when you start, randomly select a value from your list until you get a 'new' value that doesn't match the previous one (hopefully on the first try), then write the new value back to the file, replacing the previous one.
To do anything more would simply be a case of swatting a mosquito with a laser-guided cruise missile (IMHO). Of course, this solution doesn't allow for much in the way of future enhancement.
use strict;
use warnings;

my @ids = qw(a b c d e f g h);
my $chosen_id = '';
my $last_id   = '';

if ( open my $fh, '<', 'recent' ) {
    $last_id = <$fh>;
    close $fh;
}
$last_id = '' unless defined $last_id;

while (1) {
    $chosen_id = $ids[ int rand @ids ];    # an element, not a one-item slice
    print "chosen: $chosen_id\n";
    last if $chosen_id ne $last_id;
    print "problem: $chosen_id is on the recent list, choose again\n";
}

# replace the file
open my $fh, '>', 'recent'
    or die "failed to open recent for write: $!\n";
print $fh $chosen_id;
close $fh;
Re: Don't-Repeat-Myself code for improvement (low)
by tye (Sage) on Feb 28, 2007 at 08:19 UTC
Now that you've replied and shown that most of us were solving the wrong problem...
You can avoid having to use Perl's sort if you use a DBM flavor that allows you some control over how things are indexed. The Berkeley flavor allows that. But I leave the details of that for someone more familiar with Berkeley DBM (or willing to RTFM) since I don't recall the specifics.
Since I don't think DBM offers concurrent write access, I'd probably just go "old school" and implement a simple circular buffer in a file and flock it, but I doubt you'd find this an improvement. (:
use Fcntl ':flock';

sub wasRecentlyUsed
{
    my( $file, $id, $idlen, $count )= @_;
    $id= pack "a$idlen", $id;
    flock( $file, LOCK_EX() )  or die "Can't flock: $!";
    seek( $file, 0, 0 )  or die "Can't seek: $!";
    my $nextPos= _nextPos( $file, $id, $count );
    if( $nextPos ) {
        seek( $file, 0, 0 )  or die "Can't seek: $!";
        print $file pack "S", $nextPos  or die "Can't print: $!";
    }
    flock( $file, LOCK_UN() )  or die "Can't unflock: $!";
    return ! $nextPos;
}

sub _nextPos
{
    my( $file, $id, $count )= @_;
    local( $/ )= \4;    # the 4-byte header holds the next-slot offset and capacity
    my $head= <$file>;
    if( ! $head ) {
        seek( $file, 0, 0 )  or die "Can't seek: $!";
        $head= pack "SS", 0, $count;
        print $file $head, $id  or die "Can't print: $!";
        return $count < 2 ? 4 : tell $file;
    }
    my $first= tell $file;
    my( $next )= unpack "S", $head;
    $/= \ length $id;    # the ids are fixed-size records
    while( <$file> ) {
        return 0  if $id eq $_;
    }
    seek( $file, $next, 0 )  or die "Can't seek: $!";
    print $file $id  or die "Can't print: $!";
    # Wrap back to the first slot once this write fills the last one:
    my $end= $first + $count * length $id;
    return $end <= tell( $file ) ? $first : tell $file;
}
Re: Don't-Repeat-Myself code for improvement
by kyle (Abbot) on Feb 27, 2007 at 22:21 UTC
my @ids = qw(a b c d e f g h);
my $chosen_id = '';

# Flat file!
# each line is a time and an id separated by a space
open my $recent_fh, '<', 'recent'
    or die "Can't read recent: $!";
my @recent_lines = <$recent_fh>;
close $recent_fh or die "Can't close recent: $!";
chomp @recent_lines;

# id => time (reversed, since the file puts the time first)
my %recent_ids = map { reverse split } @recent_lines;

# remove the recent ids from the id list
my @possible_ids = grep { ! $recent_ids{$_} } @ids;
$chosen_id = $possible_ids[ rand @possible_ids ];

# process it, and record when we used it
$recent_ids{$chosen_id} = time;

# remove the oldest, as before

open $recent_fh, '>', 'recent'
    or die "Can't write recent: $!";
print $recent_fh
    map { "$recent_ids{$_} $_\n" }
    keys %recent_ids;
close $recent_fh or die "Can't close recent: $!";
Note there's the possibility of clobbering the recent file if the program gets killed while writing it. Also, there's no provision for multiple instances writing to the same file (that is, there's no file locking).
Re: Don't-Repeat-Myself code for improvement
by grinder (Bishop) on Feb 27, 2007 at 22:54 UTC
|
If the IDs are fixed and known in advance, you could take the set of IDs, shuffle them, and print them to a file. Then in the cron job, you open the file, take the first one, rewrite the file (excluding the first one), and then deal with the ID. This gives you a guarantee that all IDs will be visited before any are repeated.
Eventually you'll clean the file out completely. At that point, you generate a new shuffled list of IDs, print them to a file and begin all over again. This has the advantage that you can edit the seed file as needed in order to test out certain IDs explicitly, without having to wait for randomness to deal you the right cards.
• another intruder with the mooring in the heart of the Perl
If it is important to cycle through all IDs, then this meets that requirement. But it can easily result in the same ID being used twice in a row (as I noted in my other reply to BrowserUk's proposal of nearly the same method).
If this "use all IDs before repeating" is an important criterion, then I'd modify this proposal to include the offset to the next item in said file. Then you rewrite just the offset when you use an item.
Then the trick is shuffling the items after they have all been used such that you don't get any of the recently used ones being near the top of the list.
One way to do that would be to assign the items values of 1..$N ($N is the number of items) and then add¹ rand($N/$F) to each value (where $F controls how random things are vs the minimum distance between repeats) and then sort the items based on the assigned values. For example:
my( @items )= <FILE>;
pop @items; # Remove the "offset"
chomp @items;
my %value;
@value{ @items }= 1 .. @items;
$_ += rand(@items/3) for @value{ keys %value };
@items= sort { $value{$a} <=> $value{$b} } @items;
print NEWFILE join "\n", @items, 0, '';
# move NEWFILE to replace FILE
¹ Updated to add missing word.
Re: Don't-Repeat-Myself code for improvement
by andye (Curate) on Feb 28, 2007 at 12:46 UTC
Hi Cody,
People have come up with a bunch of interesting suggestions which probably solve your problem. However, in the spirit of TMTOWTDI, here's a lazy-programmer's solution:
- Store the most recent ones you've done in a hash, and save using Storable.
- Clear the hash every n runs, store the count of runs-since-last-clear in a scalar, and save it using Storable.
- In each run: do {pick a number} until (! exists $hash{this number})
Of course, you can only have one of these running at one time, or you'll shoot yourself in the foot.
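A sketch of that recipe (the file name, the clear-every-n count, and the 0..99 pick range are all invented for illustration):

```perl
use strict;
use warnings;
use Storable qw(store retrieve);

my $state_file  = 'recent.sto';    # hypothetical state file
my $clear_every = 10;              # clear the hash every n runs

my $state = -e $state_file
    ? retrieve($state_file)
    : { seen => {}, runs => 0 };

# Clear the hash every $clear_every runs.
if ( ++$state->{runs} >= $clear_every ) {
    $state->{seen} = {};
    $state->{runs} = 0;
}

# do {pick a number} until (! exists $hash{this number})
my $pick;
do {
    $pick = int rand 100;
} until !exists $state->{seen}{$pick};
$state->{seen}{$pick} = 1;

store( $state, $state_file );
print "$pick\n";
```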
hth!
andye
Re: Don't-Repeat-Myself code for improvement
by Moron (Curate) on Feb 28, 2007 at 20:34 UTC
How about just filtering out the recent ids with grep before doing the rand()? E.g.
# ...
my @not_recent = grep !defined( $recent_ids{ $_ } ), @ids;
# (don't do this under "use integer" - that pragma truncates rand() to 0)
my $idno = int( @not_recent * rand() );
# ...
or alternatively, sort the items into the array by age (something like $somehash{$_} ||= -M $_) and just shift off the three most recent before doing the rand().
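One reading of that alternative, sketched with a marker file per id whose modification time records recency (the directory layout and the hard-coded three are illustrative):

```perl
use strict;
use warnings;

my @ids = qw(a b c d e f g h);
my $dir = 'recency';    # hypothetical directory of per-id marker files
mkdir $dir unless -d $dir;
for my $id (@ids) {     # create any missing markers
    next if -e "$dir/$id";
    open my $fh, '>', "$dir/$id" or die "Can't create $dir/$id: $!";
    close $fh;
}

# Sort most recently used first (-M is age in days, so smallest = newest) ...
my @by_recency = sort { -M "$dir/$a" <=> -M "$dir/$b" } @ids;
# ... shift off the three most recent, then pick at random from the rest.
splice @by_recency, 0, 3;
my $pick = $by_recency[ rand @by_recency ];

# Touch the winner's marker so it becomes the most recent.
utime undef, undef, "$dir/$pick" or die "Can't touch $dir/$pick: $!";
print "$pick\n";
```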