in reply to Breaks on Mac but not Windows or Linux - huge IO

How much RAM do the different machines have? Is the file that causes problems when it gets large the taxonomy file used here?

my @taxR = <tax_file>;

If so, your problem may be that you are reading the entire file into memory at once (each line becomes an element of the array), and your machine is probably running out of memory. When you say the script fails, what exactly do you mean? Is there an error message? Does the process get killed by the OOM killer?
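
To illustrate the point about memory, here is a minimal sketch of reading line by line instead of slurping the file into an array. It is not your code: the sub name first_fields and the sample data are made up, and I am assuming a tab-separated file where only the first field matters.

```perl
use strict;
use warnings;

# Hypothetical helper: collect the first tab-separated field of each line
# without ever holding the whole file in memory.
sub first_fields {
    my ($fh) = @_;
    my %seen;
    while (my $line = <$fh>) {    # reads one line per iteration
        chomp $line;
        my ($key) = split /\t/, $line;
        $seen{$key} = 1 if defined $key && length $key;
    }
    return \%seen;
}

# Demo on an in-memory filehandle so the sketch runs without any real file.
my $sample = "CYAPA\tCyanophora paradoxa\nHUMAN\tHomo sapiens\n";
open(my $fh, '<', \$sample) or die "could not open in-memory file: $!";
my $keys = first_fields($fh);
close $fh;
print join(",", sort keys %$keys), "\n";    # prints CYAPA,HUMAN
```

With the while loop, memory use depends on the number of distinct keys kept in the hash, not on the size of the file.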

EDIT: I re-read your question and realized that $tempFile is the file that causes problems once it gets too large. Is that correct? What does that file look like? Also, is a non-zero value or string always assigned by $filter = $taxR{$curLine[2]};? If so, I'm not sure I understand the if-conditional around checkSeq(@curLine);. What is checkSeq doing?

What happens if you run it on the Windows machine, but include use 5.010;? Just out of curiosity, you have use strict; and use warnings; and there are no errors, right?

Re^2: Breaks on Mac but not Windows - huge IO
by vivomancer (Initiate) on Jun 27, 2012 at 03:59 UTC

    I'm about to go to sleep so I won't be able to run some of your tests until the morning

    taxonomy.tab is 800 KB, the smaller $tempFile is 250,000 KB, and the larger is 10 GB. When I use a 23,000 KB $tempFile, the program completes successfully on both a Mac and a PC.

    The problem does occur when $tempFile gets too large. Each line of $tempFile has the format "$annotativeInformation\t$aminoacidSequence\t$taxonomyCode\n". The following is an example of one line of $tempFile:

    >sp|P48255|ABCX_CYAPA Probable ATP-dependent transporter ycf16 OS=Cyanophora paradoxa GN=ycf16 PE=3 SV=1 MSTEKTKILEVKNLKAQVDGTEILKGVNLTINSGEIHAIMGPNGSGKSTFSKILAGHPAYQVTGGEILFKNKNLLELEPEERARAGVFLAFQYPIEIAGVSNIDFLRLAYNNRRKEEGLTELDPLTFYSIVKEKLNVVKMDPHFLNRNVNEGFSGGEKKRNEILQMALLNPSLAILDETDSGLDIDALRIVAEGVNQLSNKENSIILITHYQRLLDYIVPDYIHVMQNGRILKTGGAELAKELEIKGYDWLNELEMVKK CYAPA

    What I mean by the script failing is that the hash %taxR is messed up, as shown in the 3 examples of output (2 good, 1 bad), which causes the program to not forward any lines of $tempFile to the rest of the program. There is no error. For the program to progress, some of the hash values must equal 1.

    $filter is true if there is a value for $taxR{$curLine[2]}, which is built near the top.
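
The lookup described above can be sketched roughly as follows. This is only an illustration under assumptions, not the actual program: passes_filter is a made-up name, the sample lines are shortened, and the line-ending stripping is an addition of this sketch. A trailing "\n" or "\r" left on the third field would be enough to make a hash lookup miss, so it may be worth ruling out.

```perl
use strict;
use warnings;

# Hypothetical lookup table; in the thread it is built from taxonomy.tab.
my %taxR = (CYAPA => 1);

# Hypothetical helper: return 1 if the line's third field is a key in %$lookup.
sub passes_filter {
    my ($line, $lookup) = @_;
    $line =~ s/\r?\n\z//;    # drop an LF or CRLF line ending (addition of this sketch)
    my @curLine = split /\t/, $line;
    return 0 unless defined $curLine[2];
    return exists $lookup->{ $curLine[2] } ? 1 : 0;
}

print passes_filter(">sp|P48255|ABCX_CYAPA\tMSTEK\tCYAPA\r\n", \%taxR), "\n";    # prints 1
print passes_filter(">sp|X|Y_HUMAN\tMKV\tHUMAN\n", \%taxR), "\n";                # prints 0
```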

    checkSeq(@curLine) is the rest of my program, which works no matter where I test it if I force $filter to be equal to 1. The program doesn't work because $filter is never set to 1, because the hash seems to break when $tempFile is too large.

    I use strict but I haven't used warnings; I'll have to check that.

Re^2: Breaks on Mac but not Windows - huge IO
by vivomancer (Initiate) on Jun 27, 2012 at 16:49 UTC
    I changed the way it reads in tax_file to this:

        my $taxon = $ARGV[3];
        unless($taxon){
            $taxon = ""; #default is blank
        }
        $annotation .= "\t$taxon";
        my @taxList = split(/\|/, $taxon);
        open(tax_file, "..".$slash."dataset".$slash."taxonomy.tab") or die "couldn't open taxonomy.tab";
        #my @taxR = <tax_file>;
        my %taxR;
        if($taxon){
            while(<tax_file>){
                foreach my $tempTax (@taxList){
                    if($_ =~ m/$tempTax/i){
                        my @tempTax = split(/\t/, $_);
                        $taxR{$tempTax[1]} = 1;
                    }
                }
            }
        }
        close tax_file;

    I get the same results as last time. I also had the opportunity to test it on a unix machine and the program works fine on that machine.

    As far as use warnings goes, I need to do a lot of editing or parsing because my program relies heavily on uninitialized values counting as false, so I'm going to work on that now.
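
For code that relies on undefined values counting as false, two common options are sketched below. The names lookup_flag and label_for are made up for illustration. Note that a plain truth test such as if ($x) does not warn on undef; the "uninitialized" warning comes from using undef in string or numeric operations.

```perl
use strict;
use warnings;

# Option 1: the defined-or operator (Perl 5.10+) supplies an explicit false default.
sub lookup_flag {
    my ($hash, $key) = @_;
    return $hash->{$key} // 0;
}

# Option 2: silence only the "uninitialized" category inside a small scope.
sub label_for {
    my ($hash, $key) = @_;
    no warnings 'uninitialized';
    return "code=" . $hash->{$key};    # no warning inside this sub
}

my %taxR = (CYAPA => 1);
print lookup_flag(\%taxR, "HUMAN"), "\n";    # prints 0
print label_for(\%taxR, "HUMAN"), "\n";      # prints code=
```

Option 1 keeps warnings fully enabled, which tends to be the safer choice when hunting for a bug like this one.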

      BTW, in Perl, you can just use the forward-slash in paths, and it'll work just fine in Windows. No need for that silly $slash variable.
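
A small sketch of both ways to write that path without a separator variable. The relative path is the one from the code posted earlier; the variable names are made up. File::Spec is a core module and is only needed if you want the path spelled in the local OS convention.

```perl
use strict;
use warnings;
use File::Spec;

# Forward slashes are accepted by Perl's open() on Windows as well as Unix.
my $plain = "../dataset/taxonomy.tab";

# File::Spec builds the same relative path using the local OS conventions.
my $portable = File::Spec->catfile(File::Spec->updir, "dataset", "taxonomy.tab");

print "$plain\n$portable\n";
```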