
I just generated a 5 million line file that approximates your format:

    #! perl -slw
    use strict;

    for( 1 .. 5e6 ) {
        my $n1 = int( rand 100 );
        my $n2 = int( rand 1e6 );
        printf "w\t%d\t%d\t75\t75M\t0\n", $n1, $n2;
        printf "c\t%d\t%d\t75\t75M\t0\n", $n1, $n2 - int( rand 1000 ) + 50 +0;
    }
    __END__
    C:\test>head 1166792.dat
    w 70 437286 75 75M 0
    c 70 437579 75 75M 0
    w 50 852386 75 75M 0
    c 50 852473 75 75M 0
    w 45 45196 75 75M 0
    c 45 45695 75 75M 0
    w 83 1739 75 75M 0
    c 83 1590 75 75M 0
    w 31 838500 75 75M 0
    c 31 838902 75 75M 0

And wrapped your posted snippet up to allow it to run:

    #! perl -slw
    use strict;

    open my $IN2, '<', '1166792.dat' or die $!;

    my( %store, %Tally );
    while (<$IN2>) {
        chomp $_;
        next if eof;
        my @F2 = split( "\t", $_ );          #Split each tab-delimited field
        my $partner = <$IN2>;
        my @F3 = split( "\t", $partner );    #Split each tab-delimited field
        $store{ ( abs( $F2[2] - $F3[2] ) + 1 ) }++;
        ( $F2[2], $F3[2] ) = ( $F3[2], $F2[2] ) if $F2[2] > $F3[2];
        $Tally{ $F2[1] }{ $F2[2] }{ $F3[2] }++;
    }

    foreach my $key ( sort { $a <=> $b } keys %store ) {
        print "$key\t$store{$key}\n";
    }

    foreach my $chr ( sort { $a <=> $b } keys %Tally ) {
        foreach my $value1 ( sort { $a <=> $b } keys %{ $Tally{$chr} } ) {
            foreach my $value2 ( sort { $a <=> $b } keys %{ $Tally{$chr}{$value1} } ) {
                print "$chr\t$value1\t$value2\t$Tally{$chr}{$value1}{$value2}\n";
            }
        }
    }

And it runs to completion under 5.22 using just under 1.2GB.

Unless you're on a very memory-constrained system, memory isn't the problem. Do you have somewhere you can post your failing datafile (zipped)?


With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority". I knew I was on the right track :)
In the absence of evidence, opinion is indistinguishable from prejudice. Not understood.

Re^4: Error when running on larger files
by K_Edw (Beadle) on Jun 28, 2016 at 09:40 UTC

    I wonder if it is indeed a subtle error with the input file then - although that would be quite odd as every line in it was printed by Perl in the exact same manner. I've uploaded my failing file here:

    https://www.dropbox.com/s/mofwaf7iiuif0ur/Example.txt.zip?dl=0

    Let me know if that is not suitable/inaccessible. Should be able to download without logging in.

      Hm. Other than having to add a line to discard the header, your file processed to completion without error.
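For reference, discarding the header can be as small as one extra read before the loop. This is only a minimal sketch, assuming the file has exactly one header line. `read_pairs` is a hypothetical helper name and the header text in the sample data is made up; neither comes from the original script.

```perl
use strict;
use warnings;

# Hypothetical helper: read w/c record pairs from an open filehandle,
# assuming exactly one header line precedes the data.
sub read_pairs {
    my( $fh ) = @_;
    my $header = <$fh>;                  # read and discard the header
    my @pairs;
    while( defined( my $first = <$fh> ) ) {
        my $second = <$fh>;
        last unless defined $second;     # unpaired trailing line: stop
        chomp( $first, $second );
        push @pairs, [ [ split /\t/, $first ], [ split /\t/, $second ] ];
    }
    return \@pairs;
}

# Usage with an in-memory filehandle so the sketch is self-contained.
# The header line here is invented for illustration.
my $data = "strand\tchr\tpos\nw\t70\t437286\nc\t70\t437579\n";
open my $fh, '<', \$data or die $!;
my $pairs = read_pairs( $fh );
print scalar( @$pairs ), " pair(s); first position: $pairs->[0][0][2]\n";
```

Testing `defined` on the second read, rather than relying on `eof`, also means an unpaired last line ends the loop instead of producing an undefined partner.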

      Two possibilities:

      1. (Unlikely) The data file is corrupted on disk with a bad block or similar.

        Copy the file to another name (or unzip the zip you posted) and try that.

      2. (Even less likely) Your perl installation is broken.

        Install another version and try again.

      Beyond those we're into a world of even more speculative guessing.



        I also tried running the code on the file. It ran perfectly on my system after throwing away the header.

        I still suspect you have unexpected characters somewhere in your input; the errors point to that. These could be invisible characters that Perl cannot automatically convert to a number. Try validating your input with something like this:

        unless ($F2[2] =~ /^[+-]?\d+\.?\d*$/) {print "not a floating number\n";} #assumes "." as decimal separator, no thousands separator
        unless ($F3[2] =~ /^[+-]?\d+\.?\d*$/) {print "not a floating number\n";}

        See also http://www.perlmonks.org/?node_id=622704 for a more detailed explanation. It is possible that later versions of Perl are more robust in converting a string to a number.
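To see which line trips the check and what the hidden characters actually are, something like the following might help. It is only a sketch: `check_numeric` is a hypothetical helper, and it accepts plain integers only, on the assumption that the third field is always an integer position as in the sample data. The sample input is made up, with a stray carriage return planted on the second line.

```perl
use strict;
use warnings;

# Hypothetical helper: returns undef if $value is a plain integer,
# otherwise a printable rendering of it in which every byte outside
# printable ASCII is shown as a \x{..} escape.
sub check_numeric {
    my( $value ) = @_;
    return '(undef)' unless defined $value;
    return if $value =~ /\A[+-]?\d+\z/;    # \z: no trailing newline allowed
    ( my $shown = $value ) =~ s/([^\x20-\x7e])/sprintf '\\x{%02x}', ord $1/ge;
    return $shown;
}

# Made-up sample: the second line ends in "\r\n", as a DOS-format file would.
my $data = "w\t70\t437286\nc\t70\t437579\r\n";
open my $fh, '<', \$data or die $!;
while( my $line = <$fh> ) {
    chomp $line;
    my @F = split /\t/, $line;
    my $bad = check_numeric( $F[2] );
    print "line $.: field 3 is not an integer: '$bad'\n" if defined $bad;
}
```

Anchoring with `\A` and `\z` rather than `^` and `$` matters here, because `$` still matches before a trailing newline and would let one slip through. Printing `$.` gives the input line number, which should make it easier to tell whether the bad value really sits near the end of the file.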

        Martell

        How bizarre. I've tried running it on 5.25.2 and 5.24.0 installed via Perlbrew. Same error each time.

        It seems to be caused by something earlier in the script (this chunk of code forms a subsection of it) - running my file using your wrapped code allows it to run to completion.

        Why exactly it fails near the end of a file when it's part of a larger script but not when it's separate is beyond me.