I suppose I have more of a challenge than a question. Feel free to tell me this isn't the appropriate place and that no one has the time, but if you do, here is my problem:

I have a tab-delimited table with 11 columns and approximately 110,000 rows. It has column headings, and the first column is merely a row count (1, 2, 3, 4, 5, etc.). Each entry in columns 2-11 is either a 1 or a 0. I need to randomly select one entry from each of columns 2-11, sum their values, and record the sum (a number between 0 and 10). I need to repeat this until every value in the table has been used, with no value used more than once per table. Aaaand here's the kicker: I need to do this 1,000,000 times (i.e., repeat it for 1,000,000 tables).

Example table:

Head1  Head2  Head3  Head4  Head5  Head6  Head7  Head8  Head9  Head10 Head11
1      0      1      1      0      0      0      0      0      0      1
2      0      0      0      0      0      0      0      1      0      0
3      1      0      0      0      1      0      0      0      1      0
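A minimal sketch of reading a file laid out like this into one array per data column, assuming a single header line, tab separators, and the row counter in the first column. The sub name `load_columns` is hypothetical. The point of the design is that the file is parsed once, so a million-iteration loop can reuse the in-memory columns instead of re-reading 110,000 lines per table.

```perl
use strict;
use warnings;

# Hypothetical helper: read a tab-delimited file with one header line and
# return an array ref holding one array ref per data column (columns 2-11).
# The first column (the row counter) is dropped.
sub load_columns {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    my $header = <$fh>;                    # discard the header line
    my @columns;
    while (my $line = <$fh>) {
        $line =~ s/\r?\n\z//;              # strip Unix or Windows line ending
        next unless length $line;          # skip blank lines
        my (undef, @values) = split /\t/, $line;   # drop the row counter
        push @{ $columns[$_] }, $values[$_] for 0 .. $#values;
    }
    close $fh;
    return \@columns;
}
```

For example, `my $cols = load_columns('table.txt');` (file name hypothetical) gives `$cols->[0]` as column 2 through `$cols->[9]` as column 11.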
I have code that completes this task once in about 1 second, but at that rate 1,000,000 iterations would take more than 11 days. Not going to happen. Twenty-four hours is doable, but multiple days is not. I have two working programs that take about the same amount of time. Both start by saving the values of each column into separate arrays (excluding the headers). One program then shuffles each array and sums the values at each position n across all the arrays. The other uses the following subroutine to select values from each array until the arrays are empty:
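Drawing one value from each column without replacement until the columns are empty produces, for each column, a uniformly random ordering of that column, independent of the other columns. So it is equivalent to shuffling each column independently and summing position by position, which is what the first program does. A minimal sketch of that idea using the core `List::Util` `shuffle` function; the sub name `shuffled_row_sums` is hypothetical. For scale, fitting 1,000,000 tables into 24 hours (86,400 seconds) leaves about 86 ms per table.

```perl
use strict;
use warnings;
use List::Util qw(shuffle);

# Hypothetical helper: given an array ref of column array refs (all the
# same length), shuffle each column independently and return the
# position-wise sums, one per row (each between 0 and the number of columns).
sub shuffled_row_sums {
    my ($columns) = @_;
    my $n    = scalar @{ $columns->[0] };
    my @sums = (0) x $n;
    for my $col (@$columns) {
        my @shuffled = shuffle(@$col);           # independent random order
        $sums[$_] += $shuffled[$_] for 0 .. $n - 1;
    }
    return @sums;
}
```

Accumulating into `@sums` one column at a time avoids building any per-row temporary lists.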
sub find_and_remove {
    # the first argument is a reference to the array to draw from
    my ($reference) = @_;
    # pick a random index within the bounds of this array
    my $random_number = int(rand(@$reference));
    # splice removes the element at that index and returns it,
    # so it cannot be selected again
    my ($nowvalue) = splice(@$reference, $random_number, 1);
    # return the selected value
    return $nowvalue;
}
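One likely cost in the subroutine above is `splice`: removing an element from the middle of a 110,000-element array has to shift every later element down, so emptying a column this way does work roughly proportional to the square of its length. A common alternative is to overwrite the chosen element with the last one and `pop`, which is constant time per draw. A minimal sketch; the name `find_and_remove_fast` is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical constant-time variant of find_and_remove. Instead of
# splicing the chosen element out of the middle of the array, it moves the
# last element into the chosen slot and pops. The order of the remaining
# elements changes, which does not matter because every draw is uniformly
# random. The caller must not call this on an empty array.
sub find_and_remove_fast {
    my ($reference) = @_;
    my $i     = int(rand(@$reference));   # random index into this array
    my $value = $reference->[$i];
    $reference->[$i] = $reference->[-1];  # fill the hole with the last element
    pop @$reference;                      # drop the now-duplicated last slot
    return $value;
}
```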
Do you have any suggestions for making this subroutine faster? More generally, do you have suggestions on how to tackle this problem? I'm willing to start over! I have benchmarked both programs to find the bottleneck, but each step individually takes essentially no time; it is iterating through the whole table that takes a second. (Also, feel free to tell me that this is simply a ridiculous number of values and that it is going to take a long time without a supercomputing cluster. That would make me feel better, even if it doesn't solve my problem.)
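Because each table is independent of the others, one general option is to split the 1,000,000 tables across several processes. With 12 cores and about 1 second per table, for example, that would be roughly 1,000,000 / 12 ≈ 83,000 seconds, or about 23 hours, assuming the workers scale linearly (they may not, if they compete for memory). A minimal sketch using only core Perl (`fork`, `waitpid`, `POSIX::_exit`); the helper name `run_in_parallel` and its block-splitting scheme are hypothetical.

```perl
use strict;
use warnings;
use POSIX ();

# Hypothetical helper: split table numbers 1..$n_tables into contiguous
# blocks and run $work->($worker_id, $first, $last) for each block in its
# own child process. Returns the number of workers started.
sub run_in_parallel {
    my ($n_workers, $n_tables, $work) = @_;
    my $per_worker = int(($n_tables + $n_workers - 1) / $n_workers);  # ceiling division
    my @pids;
    for my $w (0 .. $n_workers - 1) {
        my $first = $w * $per_worker + 1;
        last if $first > $n_tables;
        my $last = $first + $per_worker - 1;
        $last = $n_tables if $last > $n_tables;
        my $pid = fork();
        die "fork failed: $!" unless defined $pid;
        if ($pid == 0) {
            # Child: reseed so workers do not repeat each other's shuffles.
            srand(time() ^ ($$ << 8) ^ $w);
            my $ok = eval { $work->($w, $first, $last); 1 };
            # _exit skips END blocks and stdio flushing, so the callback
            # must close any files it writes before returning.
            POSIX::_exit($ok ? 0 : 1);
        }
        push @pids, $pid;                 # parent keeps track of children
    }
    for my $pid (@pids) {
        waitpid($pid, 0);
        die "worker $pid exited with status $?\n" if $? != 0;
    }
    return scalar @pids;
}
```

Data loaded before calling `run_in_parallel` is inherited by every child, so the table only needs to be read once. Each callback would run tables `$first..$last` and write to its own output file (for example `results.$w.txt`, a hypothetical name), and the files can be concatenated afterwards.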

UPDATE: Thank you so much for your replies! I'm relatively new to Perl and still have a busy afternoon ahead of me, so it may take me some time to try these approaches and see whether they work. Again, thank you for your speedy responses!


In reply to Table shuffling challenge by glow_gene
