PerlMonks  

Buggy CPAN Module (Statistics::R)

by maybeD (Sexton)
on Mar 01, 2006 at 13:35 UTC ( [id://533645]=perlquestion )

maybeD has asked for the wisdom of the Perl Monks concerning the following question:

I wish to use a Perl module from CPAN, 'Statistics::R', to pipe some parsed data to R (a free statistics package) and read the results of a statistical test "on-the-fly" back into Perl so I can put them in a database.

However, the module has a bug that makes it unsuitable for large-scale use. The bug is described in the CPAN RT ticket (http://rt.cpan.org/Public/Bug/Display.html?id=11918) and causes the process to freeze up.

My own experience is not close to sufficient to debug this complex module, and attempts to contact the author and get him/her to take a look at resolving the bug have been unsuccessful.

I would be really grateful if someone could get this working, as my only (rather inefficient) alternative seems to be to use the system command to run my tests as a script, then parse the output.
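For reference, the "run it as a script and parse the output" fallback could look roughly like the sketch below. It is only an illustration: `run_and_parse_pvalues` is a made-up helper name, the command that launches R on your system is left to you, and perl itself stands in for R here so the example runs anywhere. The list form of a piped `open` may not be available on older Win32 perls.

```perl
use strict;
use warnings;

# Run an external command and collect every p-value found in its output.
# @cmd is whatever launches R on your system (an assumption, not shown here).
sub run_and_parse_pvalues {
    my @cmd = @_;
    open(my $fh, '-|', @cmd) or die "Cannot run '@cmd': $!";
    my @pvalues;
    while (my $line = <$fh>) {
        # fisher.test() prints e.g. "p-value = 0.03" or "p-value < 2.2e-16"
        push @pvalues, $1 if $line =~ /p-value\s*[<=]\s*([0-9.eE+-]+)/;
    }
    close($fh);
    return @pvalues;
}

# Stand-in for R so the sketch runs anywhere: perl itself prints three lines.
my @found = run_and_parse_pvalues($^X, '-e',
    'print "p-value = 0.0312\n", "odds ratio 1.2\n", "p-value < 2.2e-16\n"');
print join(", ", @found), "\n";
```

Starting a fresh process per test is the slow part of this approach, which is why batching many tests into one run is usually preferable.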

Replies are listed 'Best First'.
Re: Buggy CPAN Module (Statistics::R)
by PodMaster (Abbot) on Mar 01, 2006 at 14:03 UTC
    Which version of R-interp are you using? It might be important.
    Which perl are you using? ( update: perl versions are best described using 'perl -V')
    Did you try a different perl (newer)?

    I'm not surprised gmpassos won't debug something he can't reproduce :) (I'm just guessing that's the reason).

    The bug report mentions trouble using alarm, and in perl 5.8.0, "safe signals" were introduced, and they have consequences (see "Deferred Signals (Safe Signals)" in perlipc).
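A minimal sketch of the usual alarm-based timeout idiom, in case it helps when experimenting. `with_timeout` is a hypothetical helper, not something from Statistics::R. With safe signals the handler only runs between opcodes, so a single long-running opcode will not be cut short, and on Win32 `alarm` is emulated and may behave differently again.

```perl
use strict;
use warnings;

# Run $code, giving up after $seconds. Returns (1, result) or (0, undef).
sub with_timeout {
    my ($seconds, $code) = @_;
    my $result;
    my $ok = eval {
        local $SIG{ALRM} = sub { die "timeout\n" };
        alarm($seconds);
        $result = $code->();
        alarm(0);
        1;
    };
    alarm(0);    # make sure no alarm is left pending
    return (1, $result) if $ok;
    die $@ unless $@ eq "timeout\n";    # re-throw unrelated errors
    return (0, undef);
}

my ($fast_ok) = with_timeout(2, sub { return 42 });
my ($slow_ok) = with_timeout(1, sub { select(undef, undef, undef, 5); return 1 });
print "fast: $fast_ok, slow: $slow_ok\n";
```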

    update: My memory is vague, but it has been a while (maybe a year) since I've seen gmpassos

    MJD says "you can't just make shit up and expect the computer to know what you mean, retardo!"
    I run a Win32 PPM repository for perl 5.6.x and 5.8.x -- I take requests (README).
    ** The third rule of perl club is a statement of fact: pod is sexy.

      Summary of my perl5 (revision 5 version 8 subversion 6) configuration:
        Platform:
          osname=MSWin32, osvers=4.0, archname=MSWin32-x86-multi-thread
          uname=''
          config_args='undef'
          hint=recommended, useposix=true, d_sigaction=undef
          usethreads=define use5005threads=undef useithreads=define usemultiplicity=define
          useperlio=define d_sfio=undef uselargefiles=define usesocks=undef
          use64bitint=undef use64bitall=undef uselongdouble=undef
          usemymalloc=n, bincompat5005=undef
        Compiler:
          cc='cl', ccflags ='-nologo -Gf -W3 -MD -Zi -DNDEBUG -O1 -DWIN32 -D_CONSOLE -DNO_STRICT -DHAVE_DES_FCRYPT -DNO_HASH_SEED -DPERL_IMPLICIT_CONTEXT -DPERL_IMPLICIT_SYS -DUSE_PERLIO -DPERL_MSVCRT_READFIX',
          optimize='-MD -Zi -DNDEBUG -O1',
          cppflags='-DWIN32'
          ccversion='', gccversion='', gccosandvers=''
          intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=1234
          d_longlong=undef, longlongsize=8, d_longdbl=define, longdblsize=10
          ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='__int64', lseeksize=8
          alignbytes=8, prototype=define
        Linker and Libraries:
          ld='link', ldflags ='-nologo -nodefaultlib -debug -opt:ref,icf -libpath:"C:\Perl\lib\CORE" -machine:x86'
          libpth=\lib
          libs= oldnames.lib kernel32.lib user32.lib gdi32.lib winspool.lib comdlg32.lib advapi32.lib shell32.lib ole32.lib oleaut32.lib netapi32.lib uuid.lib ws2_32.lib mpr.lib winmm.lib version.lib odbc32.lib odbccp32.lib msvcrt.lib
          perllibs= oldnames.lib kernel32.lib user32.lib gdi32.lib winspool.lib comdlg32.lib advapi32.lib shell32.lib ole32.lib oleaut32.lib netapi32.lib uuid.lib ws2_32.lib mpr.lib winmm.lib version.lib odbc32.lib odbccp32.lib msvcrt.lib
          libc=msvcrt.lib, so=dll, useshrplib=yes, libperl=perl58.lib
          gnulibc_version='undef'
        Dynamic Linking:
          dlsrc=dl_win32.xs, dlext=dll, d_dlsymun=undef, ccdlflags=' '
          cccdlflags=' ', lddlflags='-dll -nologo -nodefaultlib -debug -opt:ref,icf -libpath:"C:\Perl\lib\CORE" -machine:x86'

      Characteristics of this binary (from libperl):
        Compile-time options: MULTIPLICITY USE_ITHREADS USE_LARGE_FILES PERL_IMPLICIT_CONTEXT PERL_IMPLICIT_SYS
        Locally applied patches:
          ActivePerl Build 811
          21540 Fix backward-compatibility issues in if.pm
          23565 Wrong MANIFEST.SKIP
        Built under MSWin32
        Compiled at Dec 13 2004 09:52:01
        @INC:
          C:/Perl/lib
          C:/Perl/site/lib

      R is version 2.1.1 :)

      P.S. Do you think it might help to try an older Perl version, if something significant changed in 5.8 to do with this? The module hasn't been updated for 12 months, so maybe it's deprecated?

        osname=MSWin32
        I'm confused. In the original ticket in RT, it says "perl 5.8.0, Linux kernel 2.4.27, glibc 2.3.2". Also, do you have a small program that can reproduce the problem?

        thor

        The only easy day was yesterday

Re: Buggy CPAN Module (Statistics::R)
by moklevat (Priest) on Mar 01, 2006 at 21:47 UTC
    I have been looking through the code for this module to see if anything jumps out at me. Based on what you describe and the information in the linked trouble ticket, I wondered if there might be a race condition between the send and read subroutines.

    So far I have identified only one unrelated bug in the Linux.pm code (I realize you are working with WinXP) but this prevented building the module on my linux box. Line 98 of Linux.pm should be:

    $this->{START_CMD} = "$this->{R_BIN} --slave --vanilla" ;

    The original included a "--gui=none" option that has been deprecated in recent versions of R (>= 2.1).

    In any case, I will continue to plod through the send and read subroutines to see if I can find something useful. I will also post the code here with the hope that someone with a better endogenous perl parser can suggest a test or fix faster.

    Here is the send subroutine from pipe.pm:

    and here is the read subroutine from pipe.pm.

Re: Buggy CPAN Module (Statistics::R)
by PodMaster (Abbot) on Mar 03, 2006 at 13:20 UTC
    I have done some debugging, and for me it's the send method that hangs (Statistics::R::Bridge::pipe), in particular this loop:

        my ($x,$xx) ;
        while( (!$has_quit || $this->{STOPING} == 1) && -e $file && $this->is_started( !$this->{STOPING} ) ) {
          ++$x ;
          ##print "sleep $file\n" ;
          select(undef,undef,undef,$delay) ;
          if ( $x == 20 ) {
            my (undef , $data) = $this->read_processR ;
            if ( $data =~ /\s$n\s+\.\.\.\s+\// ) { last ;}
            $x = 0 ;
            ++$xx ;
            $delay = 0.5 ;
          }
          if ( $xx > 5 ) { $status = undef ;} ## xx > 5 = x > 50
        }

    It hangs because $n and $data don't match ($n ends up being 1, but $data contains a high number, the last number written to process.log).

    $n/$data are populated by read_processR, which reads process.log, which is written to by rterm.exe, as per instructions in start.r (PERLOUTPUTFILE), which is written from sub Statistics::R::Bridge::pipe::save_file_startR.

    What happens is that at the beginning of send, where $n is set, read_processR reads data ending with "/", so $n defaults to 1.
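To make the fallback concrete, here is a small standalone sketch. `last_counter` is my own stand-in that mirrors the counter extraction in read_processR, and the two log snapshots are invented for illustration, so the real process.log layout may well differ.

```perl
use strict;
use warnings;

# Take the digits at the very end of the log text, or fall back to 1
# when there are none (mirrors the "$n = 1 if $n eq ''" default).
sub last_counter {
    my ($data) = @_;
    my ($n) = $data =~ /(\d+)\s*$/;
    return defined $n ? $n : 1;
}

# Hypothetical log snapshots; the real process.log layout may differ.
my $ends_with_digits = "55 ... /\n56 ... /\n57\n";
my $ends_with_slash  = "55 ... /\n56 ... /\n57 ... /";

my $n_digits = last_counter($ends_with_digits);
my $n_slash  = last_counter($ends_with_slash);

# The wait loop only exits when this pattern matches the log text.
my $loop_exits = $ends_with_slash =~ /\s$n_slash\s+\.\.\.\s+\// ? 1 : 0;

print "counter from digits: $n_digits\n";
print "counter from slash:  $n_slash\n";
print "loop would exit:     $loop_exits\n";
```

With the second snapshot the counter falls back to 1 while the log only mentions 55 to 57, so the exit pattern never matches, which is consistent with the hang described above.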

    moklevat was right, it is some kind of race condition. Whether it's rterm that misbehaves, or start.r or Statistics::R that makes an assumption ... I don't know. That's as far as I'm willing to go, but it should be enough information for someone familiar with rterm/Statistics::R to fix it.

    BTW, my environment is perl v5.8.4 ActivePerl Build 810, R 2.2.1, WinXP Home.


      OK, it's a real clumsy fudge but it seems to stop the hanging. With this modification in place, my Perl/R script misses some tests out (probably the ones it would have hung on)--but with a properly designed Perl script you can catch which ones it missed and have another go at the end.
      (The script I am currently working on with this is a DBI script that uses a MySQL database ENUM column to record whether all of the required tests have been carried out.)

      In pipe.pm, I changed
        if ( $x == 20 ) {
          my (undef , $data) = $this->read_processR ;
          if ( $data =~ /\s$n\s+\.\.\.\s+\// ) { last ;}
          $x = 0 ;
      to
        if ( $x == 20 ) {
          my (undef , $data) = $this->read_processR ;
          last;
          $x = 0 ;
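The "catch what was missed and have another go" part could be sketched along these lines. Everything here is hypothetical: `run_with_retries` and the callback interface are invented for the example, and a stub takes the place of the real Statistics::R calls so the sketch runs on its own.

```perl
use strict;
use warnings;

# Run every job once, then retry the ones that failed, up to $max_rounds
# passes in total. $run is a callback returning a defined result on success
# and undef on failure (a hypothetical interface, not part of Statistics::R).
sub run_with_retries {
    my ($jobs, $run, $max_rounds) = @_;
    my %results;
    my @pending = @$jobs;
    for my $round (1 .. $max_rounds) {
        last unless @pending;
        my @failed;
        for my $job (@pending) {
            my $out = $run->($job);
            if (defined $out) { $results{$job} = $out }
            else              { push @failed, $job }
        }
        @pending = @failed;
    }
    return (\%results, \@pending);
}

# Stub that fails the first time it sees jobs 2 and 4.
my %seen;
my $flaky = sub {
    my ($job) = @_;
    return undef if ($job == 2 || $job == 4) && !$seen{$job}++;
    return "result for $job";
};

my ($done, $missed) = run_with_retries([1 .. 5], $flaky, 3);
printf "%d done, %d still missing\n", scalar(keys %$done), scalar(@$missed);
```

In a DBI script the `%results` bookkeeping would presumably be replaced by whatever column records that a test has been completed.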
Re: Buggy CPAN Module (Statistics::R)
by maybeD (Sexton) on Mar 02, 2006 at 11:43 UTC
    I have written a short test script that can be used to experiment with this. This reproduces the problem inconsistently but is easy to use and does not require any data input.
    #!/usr/local/bin/perl
    # MaybeD
    # Test Script for Statistics::R.
    # 02 March 2006
    # This is a short script that reproduces the hanging bug in the
    # Statistics::R CPAN module, described at
    # http://rt.cpan.org/Public/Bug/Display.html?id=11918.
    # 500 random 2x2 contingency tables are generated and a Fisher's Test
    # performed on these tables in R using Statistics::R. The output of the
    # tests is captured using $R->read and is printed to an output file in
    # the same directory as the perl script is located.
    # The script has been tested in Windows XP (using ActiveState Active
    # Perl ver. 5.8.6, R version 2.1.1, and Statistics::R 0.0.02).

    use strict;
    use Statistics::R;

    my $statsr_test_out = "statsr_test_out.txt";
    open (OUT, ">$statsr_test_out") || die "MaybeD Test Script for Statistics::R could not open $statsr_test_out: $!\n";
    print "Current Loop Iteration\n";

    my $R = Statistics::R->new() ;
    $R->startR;

    # How many Fisher's tests are performed in total (default is 500).
    for (my $test_count=0; $test_count<500; $test_count++)
    {
        # Four random cell counts for one 2x2 contingency table.
        my @fisher_matrix;
        for (my $var_count=0; $var_count<4; $var_count++)
        {
            my $random = int rand(10000);
            push (@fisher_matrix, $random);
        }
        my $fisher = join(",", @fisher_matrix);
        my $actual_test_count = $test_count + 1;
        # Counter printed to the screen so you can keep an eye on the
        # iterations of the loop (each iteration, one Fisher's test is carried out).
        print STDERR "$actual_test_count\r";
        $R->send("Fisherout".$test_count."<- fisher.test(matrix(c(".$fisher."),nrow=2))");
        $R->send("print(Fisherout".$test_count.")");
        my $ret = $R->read;
        print OUT "$ret\n\n";
    }
    $R->stopR;
    close(OUT);
Re: Buggy CPAN Module (Statistics::R)
by moklevat (Priest) on Mar 05, 2006 at 22:22 UTC
    This is a possible workaround solution. At least on my system, if I change the delay in line 136 of pipe.pm from:

    my $delay = 0.02 ;

    to

    my $delay = 0.1 ;

    the test script runs for 500 iterations without hanging. The down side is that this slows down the iterations by a factor of 5 and there is no guarantee that the module will not hang under different system configurations.
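Whatever delay is used, a polling loop is safer with a hard deadline so it cannot spin forever. The following is only a sketch of that idea, not the module's code: `wait_until_gone` is a made-up helper and a temporary file stands in for the input file that the send loop watches.

```perl
use strict;
use warnings;
use Time::HiRes qw(time sleep);
use File::Temp qw(tempfile);

# Poll until $file disappears, sleeping $delay seconds between checks.
# Returns 1 if it vanished, 0 if $max_wait seconds elapsed first.
sub wait_until_gone {
    my ($file, $delay, $max_wait) = @_;
    my $deadline = time() + $max_wait;
    while (-e $file) {
        return 0 if time() >= $deadline;
        sleep($delay);
    }
    return 1;
}

# A temporary file plays the role of the watched input file.
my ($fh, $path) = tempfile('pollXXXXXX', TMPDIR => 1);
close($fh);
my $timed_out = wait_until_gone($path, 0.02, 0.2);   # file still there
unlink($path);
my $gone = wait_until_gone($path, 0.02, 0.2);        # file removed
print "while present: $timed_out, after unlink: $gone\n";
```

A caller that gets 0 back can log the failed test and move on instead of hanging.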

    If you need to do a lot of these, depending on your specific application, I would suggest passing all the data to R at once and running the loop in R.
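As a rough illustration of that suggestion, Perl can generate a single R script that holds all the tables and loops over them. `build_r_script` is a hypothetical helper, the table values are made up, and how the script is then handed to R is left open.

```perl
use strict;
use warnings;

# Build one R script that runs fisher.test() on every 2x2 table and prints
# "index p-value" per line, so R is started once instead of once per test.
sub build_r_script {
    my ($tables) = @_;    # array ref of [a, b, c, d] array refs
    my @rows = map { 'c(' . join(',', @$_) . ')' } @$tables;
    return join("\n",
        'tables <- list(' . join(', ', @rows) . ')',
        'for (i in seq_along(tables)) {',
        '  res <- fisher.test(matrix(tables[[i]], nrow=2))',
        '  cat(i, res$p.value, "\n")',
        '}',
        '');
}

my $script = build_r_script([ [12, 5, 7, 9], [3, 8, 10, 2] ]);
print $script;
```

The output of the R run is then one line per table, which is easy to split back into index and p-value on the Perl side.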

Re: Buggy CPAN Module (Statistics::R)
by moklevat (Priest) on Mar 08, 2006 at 15:47 UTC
    Having made no additional headway in debugging the apparent race condition in Statistics::R other than the delay workaround I proposed above, I would suggest you look into Omegahat's RSPerl interface. Installation is probably more painful than it needs to be, but it seems to be a solid performer.
      I've done a bit of work with the perl debugger on Statistics::R since I last posted in this thread. However, I haven't been able to get to the bottom of the problem.

      This is largely because I am not able to reproduce the bug in the debugger.

      Working on a UNIX machine with a test script only slightly adjusted from the one posted in this thread, I ran the script normally five times, and on none of those attempts did it get past iteration 25 without hanging.

      Running it through the debugger, on the other hand, I was unable to 'achieve' the hang, reaching 330 iterations of the Fisher's test loop without a failure.

      Although I could not make any headway on the bug itself, I was able to trace the path of the data through the module. There is too much to post here, but if it is of interest I will send it, as it might save some time for anyone else looking at this problem.

        I am able to 'repair' the bug by adding sleep statements in three places in pipe.pm, as follows.

        In Statistics::R::Bridge::pipe::send:

          my $file = "$this->{LOG_DIR}/input.$n.r" ;
          sleep(2);
          while( -e $file || -e "$file._" ) {

        Again in Statistics::R::Bridge::pipe::send:

          my ($x,$xx) ;
          sleep(2);
          while( (!$has_quit || $this->{STOPING} == 1) && -e $file && $this->is_started( !$this->{STOPING} ) ) {
            ++$x ;

        In Statistics::R::Bridge::pipe::read_processR:

          my ($n) = ( $data =~ /(\d+)\s*$/gi );
          $n = 1 if $n eq '' ;
          sleep(1);
          return( $n , $data ) if wantarray ;

        By doing this, I was able to get through 500 Fisher's test iterations on a UNIX server with no failed tests.
        However, it is very slow because 5 seconds of artificial delay have been incorporated into each $R->send()!

        My hypothesised reason for this is that the adjustment prevents process.log and the input.x.r files in the log_dir from getting out of sync with $n and $data.

        The total time taken for 500 (successful) Fisher's tests with this alteration was: 358.38u 41.23s 1:23:39.62 7.9%

      Since I posted this thread I have tried RSPerl, but a) it only works on UNIX systems and b) having installed it and attempted to run it on a UNIX system, I have met with nothing but problems trying to do any useful statistics with it.
      There have been two previous posts on the relevant mailing list asking how to carry out a linear regression using the R from Perl facility of RSPerl, and no useful information has been forthcoming.

      Having tried myself and drawn a blank I have reluctantly gone back to Statistics::R.
