I suppose I have more of a challenge than a question. Feel free to tell me this isn't the appropriate place or that no one has the time, but if you do, here is my problem:
I have a tab-delimited table with 11 columns and approximately 110,000 rows. It has column headings, and the first column is merely a count of the rows (1, 2, 3, 4, 5, etc.). Every other entry in the table is either a 1 or a 0. I need to randomly select one entry from each of columns 2-11, sum their values, and record the sum (a number between 0 and 10). I need to repeat this until every value in the table has been used, with no value used more than once per table. Aaaand here's the kicker: I need to do this 1,000,000 times (i.e., repeat for 1,000,000 tables).
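To make the task concrete, here is a minimal sketch of a single pass, assuming the table is already held in memory as one array per data column. Drawing one unused entry from every column until the columns are empty amounts to shuffling each column independently and summing across positions. The helper name `one_pass`, the choice to record a histogram of sums rather than the individual sums, and the tiny three-row data set are all assumptions made for illustration.

```perl
use strict;
use warnings;
use List::Util qw(shuffle);

# Hypothetical helper: one pass over a table stored as column arrays.
# $cols is a reference to an array of array refs (columns 2-11, with the
# headers and the row-count column already stripped). Returns a histogram:
# $hist->[$s] is the number of draws whose sum was $s.
sub one_pass {
    my ($cols) = @_;
    my $n_rows = @{ $cols->[0] };
    my @sums   = (0) x $n_rows;
    for my $col (@$cols) {
        my @shuffled = shuffle @$col;    # independent random order per column
        $sums[$_] += $shuffled[$_] for 0 .. $n_rows - 1;
    }
    my @hist = (0) x (@$cols + 1);       # possible sums: 0 .. number of columns
    $hist[$_]++ for @sums;
    return \@hist;
}

# Tiny stand-in for the real table: 3 rows, 10 data columns.
my @cols = (
    [0, 0, 1], [1, 0, 0], [1, 0, 0], [0, 0, 0], [0, 0, 1],
    [0, 0, 0], [0, 0, 0], [0, 1, 0], [0, 0, 1], [1, 0, 0],
);
my $hist = one_pass(\@cols);
print join(' ', @$hist), "\n";           # counts for sums 0 through 10
```

Keeping only a histogram means each pass produces 11 numbers instead of roughly 110,000, which may matter when the pass is repeated a million times. Whether a histogram is an acceptable record of the sums is an assumption about the downstream analysis.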
Example table:

    Head1  Head2  Head3  Head4  Head5  Head6  Head7  Head8  Head9  Head10  Head11
    1      0      1      1      0      0      0      0      0      0       1
    2      0      0      0      0      0      0      0      1      0       0
    3      1      0      0      0      1      0      0      0      1       0

I have code that will successfully complete this task once in about 1 second, but that would mean I would have to run the code for about 11 days to complete 1,000,000 iterations. Not going to happen. 24 hours is doable but multiple days is not. I have two working programs that take about the same amount of time. Both start by saving all values in each column to separate arrays (excluding headers). One program then shuffles each array and sums up all the nth-numbered values across the arrays. The other one uses the following subroutine to select values from each array until the arrays are empty:

    sub find_and_remove {
        # pulls the referenced array into the subroutine and saves it in variable $reference
        my $reference = $_[0];
        # selects a random index within the referenced array
        my $random_number = int(rand(@$reference));
        # saves that-numbered value from the array as $nowvalue
        my $nowvalue = $$reference[$random_number];
        # removes the value from the array so it can't be selected again
        splice(@$reference, $random_number, 1);
        # returns the selected value
        return $nowvalue;
    }

Do you have any suggestions for how to make this subroutine faster? Do you have any suggestions on how to tackle this problem in general? I'm willing to start over! I have benchmarked both programs to try to find the time culprit, but each step individually takes 0 time, while iterating through the whole table takes a second. (Also, feel free to tell me that this is simply a ridiculous number of values and it is just going to take a long time to complete without a super-computing cluster. This will make me feel better, if not solve my problem.)
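One possible direction for the subroutine itself, offered as a sketch rather than a measured fix: `splice` on the middle of an array has to shift every element after the removed position, so each removal costs time proportional to the array length. Because the order of the remaining values does not matter here, the selected slot can instead be overwritten with the last element and the array shortened with `pop`, which is constant time. The name `take_random` is hypothetical, and whether `splice` is actually the bottleneck in the original programs is an assumption that would need benchmarking.

```perl
use strict;
use warnings;

# Hypothetical replacement for find_and_remove: removes a random element in
# constant time by overwriting it with the last element and popping.
sub take_random {
    my ($aref) = @_;
    my $i     = int rand @$aref;    # random index into *this* array
    my $value = $aref->[$i];
    $aref->[$i] = $aref->[-1];      # move the last element into the hole
    pop @$aref;                     # shrink by one; nothing else is shifted
    return $value;
}

my @column = (1, 0, 0, 1, 1);
my $sum = 0;
$sum += take_random(\@column) while @column;
print "$sum\n";    # every value is used exactly once, so this prints 3
```

The draw is still uniform over the remaining values, since the swap only changes their order, not their membership.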
UPDATE: Thank you so much for your replies! I'm relatively new to perl and still have a busy afternoon ahead of me, so it may take me some time to try these new approaches/see if they will work. Again, thank you for your speedy responses!
In reply to Table shuffling challenge by glow_gene