in reply to Error when running on larger files

Am I hitting some sort of memory or hash limit?

If you were hitting a memory limit, I would not expect the errors you are seeing.

How long are the lines in the file? (A small representative sample would be good.)
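One way to answer that question without posting the whole file is to summarise the line lengths. The following is only a minimal sketch: the helper name `line_length_stats` and the in-memory sample are assumptions made for illustration, not anything from the original script.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: summarise the lengths of the lines read from a filehandle.
sub line_length_stats {
    my ($fh) = @_;
    my ( $count, $min, $max, $total ) = ( 0, undef, 0, 0 );
    while ( my $line = <$fh> ) {
        chomp $line;
        my $len = length $line;
        $count++;
        $total += $len;
        $min = $len if !defined $min || $len < $min;
        $max = $len if $len > $max;
    }
    return { count => $count, min => $min // 0, max => $max, total => $total };
}

# Demo on an in-memory sample that imitates the tab-separated format in this thread.
my $sample = "w\t70\t437286\t75\t75M\t0\n" . "c\t70\t437579\t75\t75M\t0\n";
open my $fh, '<', \$sample or die "Cannot open in-memory sample: $!";
my $stats = line_length_stats($fh);
close $fh;

printf "lines=%d min=%d max=%d\n", @{$stats}{qw(count min max)};
```

To run it against a real file, open that file instead of the in-memory string; an unexpectedly large maximum or a minimum of 0 would point at a malformed line.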


With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority". I knew I was on the right track :)
In the absence of evidence, opinion is indistinguishable from prejudice. Not understood.

Re^2: Error when running on larger files
by K_Edw (Beadle) on Jun 28, 2016 at 08:57 UTC
    Updated OP with an example of the format. I too would not expect such errors; however, I cannot think of an alternative explanation, as the problem does not appear to be caused by any specific line or error in the content. After sorting or randomly shuffling the order of the lines within the input file, the error still occurs somewhere toward the end of the file.

      I just generated a 5 million line file that approximates your format:

      #! perl -slw
      use strict;

      for( 1 .. 5e6 ) {
          my $n1 = int( rand 100 );
          my $n2 = int( rand 1e6 );
          printf "w\t%d\t%d\t75\t75M\t0\n", $n1, $n2;
          printf "c\t%d\t%d\t75\t75M\t0\n", $n1, $n2 - int( rand 1000 ) + 50 +0;
      }
      __END__
      C:\test>head 1166792.dat
      w 70 437286 75 75M 0
      c 70 437579 75 75M 0
      w 50 852386 75 75M 0
      c 50 852473 75 75M 0
      w 45 45196 75 75M 0
      c 45 45695 75 75M 0
      w 83 1739 75 75M 0
      c 83 1590 75 75M 0
      w 31 838500 75 75M 0
      c 31 838902 75 75M 0

      And wrapped your posted snippet up to allow it to run:

      #! perl -slw
      use strict;

      open my $IN2, '<', '1166792.dat' or die $!;

      my( %store, %Tally );
      while (<$IN2>) {
          chomp $_;
          next if eof;
          my @F2 = split( "\t", $_ );          #Split each tab-delimited field
          my $partner = <$IN2>;
          my @F3 = split( "\t", $partner );    #Split each tab-delimited field
          $store{ ( abs( $F2[2] - $F3[2] ) + 1 ) }++;
          ( $F2[2], $F3[2] ) = ( $F3[2], $F2[2] ) if $F2[2] > $F3[2];
          $Tally{ $F2[1] }{ $F2[2] }{ $F3[2] }++;
      }

      foreach my $key ( sort { $a <=> $b } keys %store ) {
          print "$key\t$store{$key}\n";
      }

      foreach my $chr ( sort { $a <=> $b } keys %Tally ) {
          foreach my $value1 ( sort { $a <=> $b } keys %{ $Tally{$chr} } ) {
              foreach my $value2 ( sort { $a <=> $b } keys %{ $Tally{$chr}{$value1} } ) {
                  print "$chr\t$value1\t$value2\t$Tally{$chr}{$value1}{$value2}\n";
              }
          }
      }

      And it runs to completion under 5.22 using just under 1.2GB.

      Unless you're on a very memory-constrained system, memory isn't the problem. Do you have somewhere you can post your failing data file (zipped)?



        I wonder if it is indeed a subtle error with the input file then - although that would be quite odd as every line in it was printed by Perl in the exact same manner. I've uploaded my failing file here:

        https://www.dropbox.com/s/mofwaf7iiuif0ur/Example.txt.zip?dl=0

        Let me know if that is not suitable/inaccessible. Should be able to download without logging in.

      Ignore this: That was harangzsolt33.

      You're not the guy I saw mention sometime last week that he was using tinyperl?



      Use of uninitialized value in subtraction (-) at line 117, <$IN2> line 4148567.
      Use of uninitialized value in numeric gt (>) at line 118, <$IN2> line 4148567.
      Use of uninitialized value $F2[2] in hash element at line 119, <$IN2> line 4148567.
      Argument "" isn't numeric in sort at line 127, <$IN2> line 4148567.

      Instead of randomly reshuffling the file, could you not extract that one line (head -4148567 <file> | tail -1) and then run your script against just that line? If I had to guess, seeing that $F2[2] is the culprit, I'd say you have a double tab on that particular line.
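Whether or not a double tab is the cause, a scan for malformed lines is cheap to try. The sketch below assumes six tab-separated fields per line, as in the sample format posted in this thread; the helper name `find_bad_lines` and the in-memory test data are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return [line_number, reason] for every suspicious line.
# $expected is the assumed number of tab-separated fields (6 in the sample format).
sub find_bad_lines {
    my ( $fh, $expected ) = @_;
    my @bad;
    while ( my $line = <$fh> ) {
        chomp $line;
        # A limit of -1 keeps trailing empty fields, so a trailing tab is not hidden.
        my @fields = split /\t/, $line, -1;
        if ( @fields != $expected ) {
            push @bad, [ $., "expected $expected fields, got " . scalar(@fields) ];
        }
        elsif ( grep { $_ eq '' } @fields ) {
            push @bad, [ $., 'empty field' ];
        }
    }
    return @bad;
}

# Demo: line 2 has a double tab (empty third field) and line 3 is blank.
my $sample = "w\t70\t437286\t75\t75M\t0\n"
           . "c\t70\t\t75\t75M\t0\n"
           . "\n"
           . "w\t50\t852386\t75\t75M\t0\n";
open my $fh, '<', \$sample or die "Cannot open in-memory sample: $!";
my @bad = find_bad_lines( $fh, 6 );
close $fh;

print "line $_->[0]: $_->[1]\n" for @bad;
```

Pointing the same loop at the real data file would report the line number of any blank line, short line, or empty field, which is easier to act on than the warning text alone.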

      Update: Never mind ... I shouldn't answer questions without first reading the whole thing, and without having finished my morning cup of coffee.

      -derby