maybeD has asked for the wisdom of the Perl Monks concerning the following question:

I am using a Perl module from CPAN, Statistics::R, to pipe some 2x2 contingency table data to R and get back Fisher's Exact Test results, which I then store in an array to condense and output for a biologist user.

In my test file for this, there are about 80 sets of contingency table data. These are run through using a foreach loop.

The script does what it is supposed to - about one sixth of the times it is run. The other five sixths of the time, it stops at an arbitrary point in the loop (always between two sets of contingency tables, never in the middle of one).

Below is the loop (some variable names changed). Outside of the loop are just the Statistics::R loading code and stopR command, the declarations of the variables you see in the loop (strict is on), the commands to open the files, and the command to print the contents of the array to a debugging file.

foreach my $testdata (@R_input) {
    $testdata =~ /(Fisher's Exact Test for Cluster \d+ \([^\)]*\))/;
    $R_line0 = $1;
    $testdata =~ /(Test\d+ <-)/;
    $R_line1 = ($1.$add_whitespace);
    $testdata =~ /(matrix\(c\(\d+, \d+, \d+, \d+\),)/;
    $R_line2 = ($1.$add_whitespace);
    $testdata =~ /(nr = 2,)/;
    $R_line3 = ($1.$add_whitespace);
    $testdata =~ /(dimnames = list\(Row1 = c\("Col1", "Col2"\),)/;
    $R_line4 = ($1.$add_whitespace);
    $testdata =~ /(Row2 = c\("Col1", "Col2"\)\)\))/;
    $R_line5 = $1;
    $testdata =~ /(fisher.test\(Test\d+\))/;
    $R_line6 = $1;

    $R_send_0 = "$R_comment_out$R_line0";
    $R_send_1 = "$R_line1$R_line2$R_line3$R_line4$R_line5";
    $R_send_2 = "$R_dblquotation_start$R_line6$R_send_3";

    my $r_send_comment = $R_send_0;
    my $r_send_string = $R_send_1;
    my $r_send_string2 = $R_send_2;

    $R->send (qq'$r_send_string');
    $R->send (qq'$r_send_string2');
    my $ret = $R->read;

    push (@R_output_store, $r_send_comment);
    push (@R_output_store, $ret);
    print $ret;
    print "\n\n";
}
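One thing the loop above does not guard against: in Perl, a failed match leaves $1 holding whatever the last successful match captured, so a malformed record could silently reuse the previous table's text. The following is only a sketch of one way to check each match. The helper names parse_fisher_record and r_commands_for, and the sample record, are hypothetical; the record layout is inferred from the regexes in the question, not from the real input file.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Statistics::R): extract the pieces of one
# record, or return undef as soon as any piece fails to match.
sub parse_fisher_record {
    my ($testdata) = @_;

    my ($label) = $testdata =~ /(Fisher's Exact Test for Cluster \d+ \([^)]*\))/
        or return;
    my ($name, @cells) = $testdata =~
        /(Test\d+)\s*<-\s*matrix\(c\((\d+),\s*(\d+),\s*(\d+),\s*(\d+)\)/
        or return;
    # Note the escaped dot, so that only a literal "fisher.test" matches.
    my ($call) = $testdata =~ /(fisher\.test\(\Q$name\E\))/
        or return;

    return { label => $label, name => $name, cells => \@cells, call => $call };
}

# Rebuild the two R statements as a single newline-separated string.
sub r_commands_for {
    my ($rec) = @_;
    my $matrix = sprintf
        '%s <- matrix(c(%s), nr = 2, dimnames = list(Row1 = c("Col1", "Col2"), Row2 = c("Col1", "Col2")))',
        $rec->{name}, join(', ', @{ $rec->{cells} });
    return "$matrix\n$rec->{call}";
}

# Assumed sample record, shaped to satisfy the regexes in the question.
my $sample = join "\n",
    q{Fisher's Exact Test for Cluster 7 (example)},
    q{Test7 <- matrix(c(3, 1, 1, 3), nr = 2)},
    q{fisher.test(Test7)};

my $rec = parse_fisher_record($sample);
if ($rec) {
    print "# $rec->{label}\n", r_commands_for($rec), "\n";
}
else {
    warn "skipping malformed record\n";
}
```

Skipping (or logging) a record that fails to parse makes it easier to tell a data problem apart from a communication problem with R.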

Re: Peculiar Perl/R Problem - Memory?
by Corion (Patriarch) on Aug 08, 2005 at 12:39 UTC

    Most likely you are running into IPC buffering. Probably, somewhere between your program and R, all your commands pile up in a buffer that is waiting for more input, while your read call is waiting for input as well.

    I would check whether the R module has some flush command to flush all buffers after your send sequence. If there is no such facility, you will have to either create it yourself, by reaching into the guts of Statistics::R and setting the file handle used for communication to autoflush, or tear down the communication after each batch of commands. Maybe you should simply batch-send all your commands one after another, and only then start reading the results in.
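To illustrate what autoflush on the communication handle means, here is a minimal sketch that does not touch Statistics::R at all. It assumes a Unix-like system and uses a second perl process (a stand-in that upper-cases each line) in place of R; the sub name round_trip is made up for the example. IPC::Open2 already turns autoflush on for the write handle, so the explicit call is only there to show where it would go.

```perl
use strict;
use warnings;
use IPC::Open2;
use IO::Handle;

# Stand-in child: another perl that upper-cases every line it reads.
# It sets $| so that its own replies are not held back in a buffer.
my @child = ($^X, '-e', '$| = 1; while (<STDIN>) { print uc }');

sub round_trip {
    my @lines = @_;

    my ($from_child, $to_child);
    my $pid = open2($from_child, $to_child, @child);
    $to_child->autoflush(1);    # open2 does this already; shown for emphasis

    my @replies;
    for my $line (@lines) {
        print {$to_child} "$line\n";    # leaves our process immediately
        my $reply = <$from_child>;      # safe to wait: the child has the line
        chomp $reply;
        push @replies, $reply;
    }

    close $to_child;      # EOF lets the child's loop finish
    waitpid $pid, 0;
    return @replies;
}

print join(", ", round_trip("fisher", "test")), "\n";
```

If either side kept its output in a buffer, the write-then-read pattern in the loop could wait forever, which is the kind of stall described above.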

      OK, I've tried a few more things to see what effect they have on the problem, now that I know what the likely cause is.

      ->Eliminating the read commands did not affect the problem. The script still halts at arbitrary points in the loop even without the 'read from R' element.
      ->Restarting R at the end of the foreach loop (using the command $R -> restartR();) prevented the random halting. However, as you might expect the script is very slow with this workaround in place, since it has to load R anew with each iteration of the loop.

      I am somewhat new to using modules in my scripts, and completely new to IPC buffering, so any more guidance you could provide on how to implement the autoflush would be very much appreciated.

      Thanks for your help, M

        Update: The Perl module this script relies on is very complicated, so I wasn't able to try many of the suggestions. However, a simple modification has reduced the incidence of the problem to a much more acceptable level (it now occurs about 1 in 8 times the script is run). The modification is as follows:

        $R_send_0 = "$R_comment_out$R_line0";
        $R_send_1 = "$R_line1$R_line2$R_line3$R_line4$R_line5";
        $R_send_2 = "$R_dblquotation_start$R_line6$R_send_3";
        my $r_send_comment = $R_send_0;
        my $r_send_string = $R_send_1;
        my $r_send_string2 = $R_send_2;
        $R->send (qq'$r_send_string');
        $R->send (qq'$r_send_string2');
        my $ret = $R->read;
        push (@R_output_store, $r_send_comment);
        push (@R_output_store, $ret);
        print $ret;
        print "\n\n";

        has been changed to:

        $R_send_0 = "$R_comment_out$R_line0";
        $R_send_1 = "$R_line1$R_line2$R_line3$R_line4$R_line5";
        $R_send_2 = "$R_dblquotation_start$R_line6$R_send_3";
        $R_send = "$R_send_1\n$R_send_2";
        my $r_send_comment = $R_send_0;
        my $r_send_string = $R_send;
        $R->send (qq'$r_send_string');
        push (@R_output_store, $r_send_comment);
        print $r_send_comment;
        print "\n";
        my $ret = $R->read;
        push (@R_output_store, $ret);
        print $ret;
        print "\n\n";

        As I say, the program is still not completely free of problems. The freezing still occurs at a low frequency, and a warning of the form "Warning Message: cannot open file 'input.4.r'" is sometimes displayed at the end of the screen-printed output.
      If a CPAN module is missing an essential method you want, I suggest it is easier to write a new little package that merely:

      - inherits from the CPAN module, e.g. using @ISA

      - defines the required new method
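The two steps above can be sketched as follows. To keep the example runnable without the CPAN module installed, it inherits from a stand-in class, My::FakeR, which only mimics send and read; in real use the parent would be the CPAN module instead. The package name My::R::Batched and the method send_and_read are hypothetical names chosen for illustration.

```perl
use strict;
use warnings;

# Stand-in for the CPAN module so the sketch runs on its own.
# It just queues what it is sent and hands it back on read.
package My::FakeR;
sub new  { my ($class) = @_; return bless { queue => [] }, $class }
sub send { my ($self, $cmd) = @_; push @{ $self->{queue} }, $cmd; return 1 }
sub read { my ($self) = @_; return join "\n", splice @{ $self->{queue} } }

# The new little package: inherit everything, add one method.
package My::R::Batched;
our @ISA = ('My::FakeR');

# Hypothetical extra method: send several commands as one string,
# then read the result back in a single call.
sub send_and_read {
    my ($self, @commands) = @_;
    $self->send(join "\n", @commands);
    return $self->read;
}

package main;

my $R = My::R::Batched->new;
print $R->send_and_read('x <- 1', 'x + 1'), "\n";
```

Because the wrapper only adds a method, every existing call on the object keeps working unchanged, and the CPAN module's own source is left alone.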

      One world, one people

Re: Peculiar Perl/R Problem - Memory?
by zentara (Cardinal) on Aug 09, 2005 at 12:03 UTC
    Continuing Corion's train of thought about a buffering problem, you may be interested in "IPC3 buffer limit problem", where you can test for the condition.
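As a rough illustration of what a buffer limit looks like (this is not the code from the linked discussion), the sketch below fills a pipe that nobody reads and counts how many bytes the system accepts before a write would block. It assumes a Unix-like system where pipes can be set non-blocking; the sub name pipe_capacity is made up, and the number printed depends on the operating system.

```perl
use strict;
use warnings;
use IO::Handle;
use Errno qw(EAGAIN EWOULDBLOCK);

# Write into an unread pipe until the kernel refuses more data.
# A blocking writer would simply hang at that point.
sub pipe_capacity {
    pipe(my $reader, my $writer) or die "pipe failed: $!";
    $writer->blocking(0);    # make a full pipe report an error instead

    my $chunk = 'x' x 1024;
    my $total = 0;
    while (1) {
        my $written = syswrite($writer, $chunk);
        if (!defined $written) {
            last if $! == EAGAIN || $! == EWOULDBLOCK;    # pipe is full
            die "syswrite failed: $!";
        }
        $total += $written;
    }

    close $writer;
    close $reader;
    return $total;
}

printf "pipe accepted %d bytes before it would block\n", pipe_capacity();
```

If a program writes more than this amount to a child without anything draining the other end, the writer stalls, which looks like a freeze at an arbitrary point.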

    I'm not really a human, but I play one on earth. flash japh