Any optimization that needs to be done will have to look at all the combinations, which quickly increase in number. Monks more versed in mathematics will correct me if I'm wrong, but I guess it will be a factorial function.
Borrowing on the technique of "Random improvement" and the "inversion" operator (as discussed in Travelling Salesman) I tried this in the following program:
It improves the result, but rarely finds the optimal solution. Perhaps the size of the data is too small and/or the "unloop factor" is not well chosen. The unloop sub is not optimal either, as it will not swap the first record. Comments and improvements are invited!use strict; my $debug = 0; my @data = <DATA>; my $count = @data; print "Original: $count\n"; # first run which cannot be worse than the whole file my @result = try_me(@data); my $result = @result; print "Run 1: $result\n"; print @result; #we will unloop $unloop_factor records each time and see if we get a b +etter result my $unloop_factor = 3; #we will try this $number_of_runs times my $number_of_runs = 5000; foreach my $number ( 2 .. $number_of_runs + 1 ) { my @new_data = unloop( $unloop_factor, @data ); print "New data:\n @new_data" if $debug; my @new_result = try_me(@new_data); my $new_result = @new_result; print "Run $number: $new_result\n" if $debug; if ( $new_result <= $result ) { print "New result:\n @new_result" if $debug; @data = @new_data; @result = @new_result; $result = $new_result; } ## end if ( $new_result <= $result) } ## end foreach my $number ( 2 .. $number_of_runs... print "\nFinal result is: $result\n @result\n"; sub unloop { my ( $unloop_factor, @data ) = @_; my $length = @data; my $start = int( rand( $length - $unloop_factor ) ); print "\nUnloop after $start\n" if $debug; my @begin = @data[ 0 .. $start ]; my @middle = @data[ $start + 1 .. $start + $unloop_factor ]; my @end = ( @data, @data )[ $start + $unloop_factor + 1 .. $length + - 1 ]; return ( @begin, reverse(@middle), @end ); } ## end sub unloop sub try_me { my @input = @_; my @result; my ( %first, %second, %third ); foreach (@input) { my ( $first, $second, $third, $fourth ) = split ','; push @result, $_ unless exists $first{$first} and exists $second{$second} and exists $third{$third}; $first{$first}++; $second{$second}++; $third{$third}++; } ## end foreach (@input) return @result; } ## end sub try_me __DATA__ A1, B1, C1, first record* A2, B1, C3, second record A3, B1, C2, third record A1, B2, C3, fourth record A2, B2, C2, fifth record* A3, B2, C1, sixth record A1, B3, C2, seventh record A2, B3, C1, eight record A3, B3, C3, nineth record* A1, B2, C3, tenth record
CountZero
A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little or too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James
In reply to Re^3: Most efficient record selection method?
by CountZero
in thread Most efficient record selection method?
by Kraythorne
| For: | Use: | ||
| & | & | ||
| < | < | ||
| > | > | ||
| [ | [ | ||
| ] | ] |