I have a small snippet of code which processes a tab-delimited .txt file 2 lines at a time:

    while (<$IN2>) {
        chomp $_;
        next if eof;
        my @F2 = split( "\t", $_ );    #Split each tab-delimited field
        my $partner = <$IN2>;
        my @F3 = split( "\t", $partner );    #Split each tab-delimited field
        $store{ ( abs( $F2[2] - $F3[2] ) + 1 ) }++;
        ( $F2[2], $F3[2] ) = ( $F3[2], $F2[2] ) if $F2[2] > $F3[2];
        $Tally{ $F2[1] }{ $F2[2] }{ $F3[2] }++;
    }
    foreach my $key ( sort { $a <=> $b } keys %store ) {
        print $OUT4 "$key\t$store{$key}\n";
    }
    foreach my $chr ( sort { $a <=> $b } keys %Tally ) {
        foreach my $value1 ( sort { $a <=> $b } keys %{ $Tally{$chr} } ) {
            foreach my $value2 ( sort { $a <=> $b } keys %{ $Tally{$chr}{$value1} } ) {
                print $OUT5 "$chr\t$value1\t$value2\t$Tally{$chr}{$value1}{$value2}\n";
            }
        }
    }

When attempting to run this on larger .txt files (>4,000,000 lines), I receive the following errors (often near the end of the file, but more than 50 lines from it):

    Use of uninitialized value in subtraction (-) at line 117, <$IN2> line 4148567.
    Use of uninitialized value in numeric gt (>) at line 118, <$IN2> line 4148567.
    Use of uninitialized value $F2[2] in hash element at line 119, <$IN2> line 4148567.
    Argument "" isn't numeric in sort at line 127, <$IN2> line 4148567.

    Line 117 - $store{(abs($F2[2]-$F3[2])+1)}++;
    Line 118 - ($F2[2], $F3[2]) = ($F3[2], $F2[2]) if $F2[2] > $F3[2];
    Line 119 - $Tally{$F2[1]}{$F2[2]}{$F3[2]}++;
    Line 127 - foreach my $value1 (sort {$a <=> $b} keys %{$Tally{$chr}}) {

Printing $. confirms that the script simply terminates at this input file line and no further lines are read in. If the input file is sorted, the error occurs approximately in the same place but on a different line of content. There is nothing obviously wrong with the content of the file and all lines match expectations.

However, if I split the input file into two halves, the script runs to completion without error on each half.

Am I hitting some sort of memory or hash limit? Is there a way to fix this without having to split the input file before processing? This was run on Perl 5.25.2 but also occurs on 5.24.0.

The format of the input file is as such:

    w	11	99658	75	75M	0
    c	11	99999	75	75M	74
    w	2	702424	75	75M	0
    c	2	702556	75	75M	74
    c	13	82486	75	75M	74
    w	13	82171	75	75M	0
    c	2	702585	75	75M	74
    w	2	702390	75	75M	0
    c	18	2529	75	75M	74
    w	18	2232	75	75M	0
    c	12	264648	75	75M	74
    w	12	264366	74	74M	0
    c	10	177758	75	75M	74
    w	10	177438	74	74M	0
    w	7	185488	74	74M	0
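
The "uninitialized value" warnings quoted earlier appear whenever the third field of either line in a pair is undefined, so a diagnostic variant of the pairwise read may help locate the offending record. What follows is only a sketch under stated assumptions, not a confirmed fix: `tally_pairs` is a hypothetical helper name, field index 2 is assumed to hold the position as in the snippet, and the in-memory sample reuses three rows from the input shown here.

```perl
use strict;
use warnings;

# Hypothetical helper: reads two lines per iteration from an open
# filehandle and returns (\%store, \@problems) instead of warning.
sub tally_pairs {
    my ($in) = @_;
    my ( %store, @problems );

    while ( defined( my $first = <$in> ) ) {
        my $line_no = $.;        # line number of the first line of the pair
        my $second  = <$in>;

        # Odd number of lines: the final record has no partner.
        if ( !defined $second ) {
            push @problems, "line $line_no: no partner line";
            last;
        }

        chomp( $first, $second );
        my @F2 = split /\t/, $first;
        my @F3 = split /\t/, $second;

        # A short or empty line leaves field 2 undefined; record it
        # and skip the pair rather than doing arithmetic on undef.
        if ( !defined $F2[2] or !defined $F3[2] ) {
            push @problems,
                "lines $line_no-" . ( $line_no + 1 ) . ": fewer than 3 fields";
            next;
        }

        $store{ abs( $F2[2] - $F3[2] ) + 1 }++;
    }
    return ( \%store, \@problems );
}

# Usage with an in-memory file: one complete pair plus one unpaired line.
my $data = "w\t11\t99658\t75\t75M\t0\n"
         . "c\t11\t99999\t75\t75M\t74\n"
         . "w\t7\t185488\t74\t74M\t0\n";
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";
my ( $store, $problems ) = tally_pairs($fh);
close $fh;

print "$_\t$store->{$_}\n" for sort { $a <=> $b } keys %$store;
print "$_\n" for @problems;
```

For this three-line sample the sketch counts one pair with a span of 342 and reports line 3 as unpaired. Run against the real input, the collected messages would show whether the record at the failing line is short, empty, or missing its partner.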

In reply to Error when running on larger files by K_Edw
