K_Edw has asked for the wisdom of the Perl Monks concerning the following question:
I have a small snippet of code which processes a tab-delimited .txt file 2 lines at a time:
```perl
while (<$IN2>) {
    chomp $_;
    next if eof;
    my @F2 = split( "\t", $_ );          #Split each tab-delimited field
    my $partner = <$IN2>;
    my @F3 = split( "\t", $partner );    #Split each tab-delimited field
    $store{ ( abs( $F2[2] - $F3[2] ) + 1 ) }++;
    ( $F2[2], $F3[2] ) = ( $F3[2], $F2[2] ) if $F2[2] > $F3[2];
    $Tally{ $F2[1] }{ $F2[2] }{ $F3[2] }++;
}
foreach my $key ( sort { $a <=> $b } keys %store ) {
    print $OUT4 "$key\t$store{$key}\n";
}
foreach my $chr ( sort { $a <=> $b } keys %Tally ) {
    foreach my $value1 ( sort { $a <=> $b } keys %{ $Tally{$chr} } ) {
        foreach my $value2 ( sort { $a <=> $b } keys %{ $Tally{$chr}{$value1} } ) {
            print $OUT5 "$chr\t$value1\t$value2\t$Tally{$chr}{$value1}{$value2}\n";
        }
    }
}
```
When I run this on larger .txt files (more than 4,000,000 lines), I get the following warnings, usually near the end of the file but more than 50 lines before it:
```
Use of uninitialized value in subtraction (-) at line 117, <$IN2> line 4148567.
Use of uninitialized value in numeric gt (>) at line 118, <$IN2> line 4148567.
Use of uninitialized value $F2[2] in hash element at line 119, <$IN2> line 4148567.
Argument "" isn't numeric in sort at line 127, <$IN2> line 4148567.
```
The line numbers in those warnings refer to:

```
Line 117: $store{(abs($F2[2]-$F3[2])+1)}++;
Line 118: ($F2[2], $F3[2]) = ($F3[2], $F2[2]) if $F2[2] > $F3[2];
Line 119: $Tally{$F2[1]}{$F2[2]}{$F3[2]}++;
Line 127: foreach my $value1 (sort {$a <=> $b} keys %{$Tally{$chr}}) {
```
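The warning naming `$F2[2]` means that element is undefined, i.e. the line that was split had fewer than three tab-separated fields (blank, cut short, or not tab-delimited). A minimal sketch for seeing exactly what that pair contains: check the field count before using the pair and print the raw lines with control characters made visible. The `describe` helper, the `@bad` list and the in-memory sample (with a deliberately truncated last line) are hypothetical; the threshold of 3 fields only reflects that the code indexes `[2]`.

```perl
use strict;
use warnings;

# Make tabs, carriage returns, newlines and other control bytes visible.
sub describe {
    my ($line) = @_;
    return '<undef>' unless defined $line;
    ( my $shown = $line ) =~ s/([\x00-\x1f\x7f])/sprintf('\\x%02x', ord $1)/ge;
    return $shown;
}

# Hypothetical sample: the partner line of the second pair is cut short.
my $data = "w\t11\t99658\t75\t75M\t0\nc\t11\t99999\t75\t75M\t74\n"
         . "w\t2\t702424\t75\t75M\t0\nc\t2\n";
open my $IN2, '<', \$data or die "Cannot open in-memory data: $!";

my @bad;    # values of $. at which a malformed pair ended
while ( my $line = <$IN2> ) {
    chomp $line;
    next if eof($IN2);    # unpaired final line, skipped as in the original
    my $partner = <$IN2>;
    chomp $partner if defined $partner;
    my @F2 = split /\t/, $line;
    my @F3 = defined $partner ? split( /\t/, $partner ) : ();
    if ( @F2 < 3 || @F3 < 3 ) {
        warn sprintf "Malformed pair ending at input line %d:\n  [%s]\n  [%s]\n",
            $., describe($line), describe($partner);
        push @bad, $.;
        next;
    }
    # ... the %store / %Tally updates from the question would go here ...
}
close $IN2;
print "Malformed pairs ending at lines: @bad\n";
```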
Printing $. confirms that the script simply terminates at this input file line and no further lines are read in. If the input file is sorted, the error occurs approximately in the same place but on a different line of content. There is nothing obviously wrong with the content of the file and all lines match expectations.
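Because `while (<$IN2>)` only stops when `readline` returns end-of-file, it may be worth checking whether the file on disk really ends at that point: how many bytes Perl consumed compared with the file size, and whether the last line has a terminating newline. A file that is still being written, or whose writer never closed or flushed its handle, would behave like this; that is an assumption to rule out, not a diagnosis. (A Perl memory failure usually ends the program with an "Out of memory!" message rather than a clean end-of-file.) The sketch below builds a hypothetical truncated demo file; pass the real input file as the first argument to check that instead.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Read a file to the end and report how far Perl actually got.
sub check_file {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    binmode $fh;    # raw bytes, so tell() and -s are directly comparable
    my ( $lines, $last ) = ( 0, '' );
    while ( my $line = <$fh> ) {
        $lines++;
        $last = $line;
    }
    my %report = (
        lines             => $lines,
        bytes_read        => tell($fh),
        size_on_disk      => -s $path,
        ends_with_newline => ( $last =~ /\n\z/ ? 1 : 0 ),
    );
    close $fh;
    return \%report;
}

# Hypothetical demo file whose last record is cut off mid-line.
my ( $tmp, $tmpname ) = tempfile( UNLINK => 1 );
binmode $tmp;
print {$tmp} "w\t11\t99658\t75\t75M\t0\n", "c\t11\t99999\t75\t75M\t74\n", "w\t2\t7024";
close $tmp;

# Use the first command-line argument, if given, instead of the demo file.
my $path = @ARGV ? $ARGV[0] : $tmpname;
my $r    = check_file($path);
printf "%s: %d lines, read %d of %d bytes, trailing newline: %s\n",
    $path, $r->{lines}, $r->{bytes_read}, $r->{size_on_disk},
    $r->{ends_with_newline} ? 'yes' : 'no';
```

If `bytes_read` equals `size_on_disk` and the line count matches where `$.` stopped (especially with no trailing newline), the file itself ends there, which points at how the file was produced rather than at a limit inside Perl.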
However, if I split the input file into two halves - the script runs to completion without error.
Am I hitting some sort of memory or hash limit? Is there a way to fix this without having to split the input file before processing? This was run on Perl 5.25.2 but also occurs on 5.24.0.
The input file looks like this (tab-delimited, six fields per line):
```
w	11	99658	75	75M	0
c	11	99999	75	75M	74
w	2	702424	75	75M	0
c	2	702556	75	75M	74
c	13	82486	75	75M	74
w	13	82171	75	75M	0
c	2	702585	75	75M	74
w	2	702390	75	75M	0
c	18	2529	75	75M	74
w	18	2232	75	75M	0
c	12	264648	75	75M	74
w	12	264366	74	74M	0
c	10	177758	75	75M	74
w	10	177438	74	74M	0
w	7	185488	74	74M	0
```
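To experiment with the loop without the full 4-million-line file, the sample records can be fed to it through an in-memory filehandle. This is a sketch: the `%store`/`%Tally` logic is copied from the question, while the in-memory sample, the `chomp` on `$partner`, and printing to STDOUT instead of `$OUT4`/`$OUT5` are assumptions made for the harness.

```perl
use strict;
use warnings;

# Sample records from the question, rebuilt as tab-delimited lines.
my @records = (
    'w 11 99658 75 75M 0',   'c 11 99999 75 75M 74',
    'w 2 702424 75 75M 0',   'c 2 702556 75 75M 74',
    'c 13 82486 75 75M 74',  'w 13 82171 75 75M 0',
    'c 2 702585 75 75M 74',  'w 2 702390 75 75M 0',
    'c 18 2529 75 75M 74',   'w 18 2232 75 75M 0',
    'c 12 264648 75 75M 74', 'w 12 264366 74 74M 0',
    'c 10 177758 75 75M 74', 'w 10 177438 74 74M 0',
    'w 7 185488 74 74M 0',
);
my $data = join '', map { my @f = split ' ', $_; join( "\t", @f ) . "\n" } @records;
open my $IN2, '<', \$data or die "Cannot open in-memory data: $!";

my ( %store, %Tally );
while (<$IN2>) {
    chomp;
    next if eof($IN2);    # unpaired final line (here: w 7 185488) is dropped
    my @F2      = split /\t/, $_;
    my $partner = <$IN2>;
    chomp $partner;
    my @F3 = split /\t/, $partner;
    $store{ ( abs( $F2[2] - $F3[2] ) + 1 ) }++;    # distance within the pair
    ( $F2[2], $F3[2] ) = ( $F3[2], $F2[2] ) if $F2[2] > $F3[2];    # low, high
    $Tally{ $F2[1] }{ $F2[2] }{ $F3[2] }++;
}
close $IN2;

foreach my $key ( sort { $a <=> $b } keys %store ) {
    print "$key\t$store{$key}\n";
}
foreach my $chr ( sort { $a <=> $b } keys %Tally ) {
    foreach my $v1 ( sort { $a <=> $b } keys %{ $Tally{$chr} } ) {
        foreach my $v2 ( sort { $a <=> $b } keys %{ $Tally{$chr}{$v1} } ) {
            print "$chr\t$v1\t$v2\t$Tally{$chr}{$v1}{$v2}\n";
        }
    }
}
```

On this sample every pair is well formed, so it runs without warnings: seven pairs are counted and the 15th record has no partner.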
Re: Error when running on larger files
by BrowserUk (Patriarch) on Jun 28, 2016 at 08:49 UTC
by K_Edw (Beadle) on Jun 28, 2016 at 08:57 UTC
by BrowserUk (Patriarch) on Jun 28, 2016 at 09:20 UTC
by K_Edw (Beadle) on Jun 28, 2016 at 09:40 UTC
by BrowserUk (Patriarch) on Jun 28, 2016 at 09:58 UTC
by BrowserUk (Patriarch) on Jun 28, 2016 at 09:25 UTC
by derby (Abbot) on Jun 28, 2016 at 09:59 UTC

Re: Error when running on larger files
by Cow1337killr (Monk) on Jun 28, 2016 at 09:10 UTC