in reply to Re^2: regular expessions question: (replacing words)
in thread regular expessions question: (replacing words)

Any comments on it?
Yeah, use an indentation style that makes sense. Indentation is there for *humans* only. It isn't some sort of magical lube that makes your program run faster, where all that matters is to have some of it.

Re^4: regular expessions question: (replacing words)
by $new_guy (Acolyte) on Sep 27, 2010 at 14:04 UTC
    Dear Perl monks,

    I have a follow-up question. How do I select two columns at random and count ONLY the z's common to both columns?

    I would like to repeat this, say, 10 times and finally get the mean of all counts (i.e. the mean over 10 random selections).

    It gets more complicated. In the next round of random selection, I want to pick 3 columns and count the z's common to all of them, and repeat this ten times. I want to keep going like this until, say, n = 18 columns, getting the mean at the end of each round. At the moment I have no idea how to go about it! A hint would be really appreciated.

    Thanks

      Does your data fit into memory? If not, it gets more complicated (or you just have to wait a long time while the data file gets read dozens of times). You would either have to store it in a database or compress it (i.e. 'z' is 1 and non-z is 0, so that every element uses just one bit).
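
      A minimal sketch of the one-bit-per-element idea, assuming one bit string per column built with vec(). The sample rows and the names @rows and @col_bits are made up for illustration. ANDing two column strings and counting the set bits gives the number of rows that have a 'z' in both columns:

```perl
use strict;
use warnings;

# Hypothetical sample rows: 'z' marks a hit, anything else is a miss.
my @rows = (
    [qw(z a z)],
    [qw(z z b)],
    [qw(c z z)],
    [qw(z z z)],
);

# One bit string per column; bit $r is 1 when row $r holds a 'z'.
my @col_bits;
for my $r (0 .. $#rows) {
    for my $c (0 .. $#{ $rows[$r] }) {
        $col_bits[$c] = '' unless defined $col_bits[$c];
        vec($col_bits[$c], $r, 1) = $rows[$r][$c] eq 'z' ? 1 : 0;
    }
}

# Rows with a 'z' in both column 0 and column 1:
# string-wise AND of the two bit strings, then count the set bits.
my $both  = $col_bits[0] & $col_bits[1];
my $count = unpack '%32b*', $both;
print "rows with z in columns 0 and 1: $count\n";
```

      This only pays off when the plain Array of Arrays really is too big; otherwise the simpler layout is easier to work with.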

      If yes, read the file into an Array of Arrays:

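      my @data;
      while (my $organized = <DATA2>) {
          chomp $organized;                  # chomp the line just read, not $_
          $organized =~ s/(\s)\w+/$1z/g;     # replace each word after whitespace with 'z'
          push @data, [ split /\s+/, $organized ];   # one array ref per line
      }

      Now accessing column 5 of line 2 (both counted from 0) is simply $data[2][5].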

      To make this easier, split your problem into smaller parts. Create a subroutine that takes an arbitrary number of column indices as parameters. This subroutine just counts all rows that have a 'z' in every one of these columns. You can do that with a loop (over the selected columns) inside a loop (over all rows).
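
      Such a subroutine could look roughly like this. It is only a sketch: the name count_common_z and the small test data are made up, and it assumes the rows are stored in the Array of Arrays layout described above.

```perl
use strict;
use warnings;

# Hypothetical test data in the same Array of Arrays layout as @data.
my @data = (
    [qw(z a z)],
    [qw(z z b)],
    [qw(c z z)],
    [qw(z z z)],
);

# Count the rows that have a 'z' in every one of the given columns.
sub count_common_z {
    my ($rows, @cols) = @_;
    my $count = 0;
    ROW: for my $row (@$rows) {
        for my $col (@cols) {
            # One miss disqualifies the row; a missing cell counts as a miss.
            next ROW unless defined $row->[$col] && $row->[$col] eq 'z';
        }
        $count++;
    }
    return $count;
}

print "columns 0,1:   ", count_common_z(\@data, 0, 1), "\n";
print "columns 0,1,2: ", count_common_z(\@data, 0, 1, 2), "\n";
```

      The data is passed as a reference so the same subroutine works for any number of columns without copying the rows.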

      Once you have that working (test it with some simple data), create another array and add a random column number to it. Then repeatedly add another random column number (one that is not already in the array) and call the subroutine with the array each time. Do that until the array holds 18 columns.
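
      One possible way to put the pieces together, as a sketch only. Instead of growing a single array step by step, it draws a fresh set of distinct columns for every trial using shuffle from the core module List::Util, which seems closer to the "repeat 10 times and take the mean" requirement. The names count_common_z, random_columns and mean_common_z and the sample data are invented for this example.

```perl
use strict;
use warnings;
use List::Util qw(shuffle sum);

# Hypothetical test data; real data would come from the file.
my @data = (
    [qw(z a z)],
    [qw(z z b)],
    [qw(c z z)],
    [qw(z z z)],
);

# Count the rows that have a 'z' in every one of the given columns.
sub count_common_z {
    my ($rows, @cols) = @_;
    my $count = 0;
    ROW: for my $row (@$rows) {
        for my $col (@cols) {
            next ROW unless defined $row->[$col] && $row->[$col] eq 'z';
        }
        $count++;
    }
    return $count;
}

# Pick $k distinct column indices at random from 0 .. $ncols - 1.
sub random_columns {
    my ($ncols, $k) = @_;
    return (shuffle 0 .. $ncols - 1)[0 .. $k - 1];
}

# Mean count over $trials independent random selections of $k columns.
sub mean_common_z {
    my ($rows, $k, $trials) = @_;
    my $ncols  = @{ $rows->[0] };    # assumes every row has the same width
    my @counts = map { count_common_z($rows, random_columns($ncols, $k)) }
                 1 .. $trials;
    return sum(@counts) / $trials;
}

# The sample data has only 3 columns; with real data the loop would run to 18.
for my $k (2 .. 3) {
    printf "%d columns: mean %.2f\n", $k, mean_common_z(\@data, $k, 10);
}
```

      Shuffling the whole list of column indices and taking the first $k guarantees no duplicates without having to check the array by hand.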

        Hi Jethro,

        Thanks for the explanation!

        Yes, the data fits in memory! And yes, it would be appropriate to say every z is 1 and every non-z is 0.

        I still don't understand! How do I select two columns at random and then count only the z's that occur in the same row of both columns? By count I mean that if a z occurs in column 1 at row 6 and in column 2 at row 6, then my count of z's would be 1. Note that the count increases as I go down the rows comparing them.

        Thanks