I have a HUGE file that I need to parse and insert into a database (~334G of text), and the current code is devouring memory and bringing the system to its knees.
The algorithm I'm currently using:

    open( FILE, $filename ) or die " Can't open file ($filename): $!\n";
    while ( my $line = <FILE> ) {
        chomp $line;
        my @fields = split( /\t/, $line );
        ### do some added parsing, then insert
    }

Other nodes, like "Quickest way of reading in large files?" and "reading (caching?) large files", mention the while loop as the least memory-intensive approach, but short of splitting the file into other, smaller files, is there a solution?
UPDATES:
OK, added clarifications.
    ## outside the loop
    my @firstLines = qw/ AccessInstr PropType UID Acres MLSNum MediaSource /;
    my %firsts = map { $_ => 1 } @firstLines;

    ## the loop
    while ( my $line = <FILE> ) {
        chomp $line;
        my @fields = split( /\t/, $line );
        my $lines = 0;
        if ( $firsts{$fields[0]} ) {
            %lookup = map { $lines++ => $_ } @fields;
            @dbFields = values %lookup;
        }
        %reversed = reverse %lookup;
        $mlsField = $reversed{'MLSNum'};
        $mlsNumber = $fields[$mlsField];
        $officeListField = $reversed{'OfficeList'};
        my $officeList = $fields[$officeListField];
        $officeSellField = $reversed{'OfficeSell'};
        my $officeSell = $fields[$officeSellField];
        next if ( $firsts{$fields[0]} );
        while ( scalar( @fields ) != scalar( @dbFields ) ) {
            push ( @fields, undef );
        }
        $testSth->execute( $mlsNumber );
        my $inDB = $testSth->fetchrow();
        $jbgTestSth->execute( $mlsNumber );
        my $inJBGDB = $jbgTestSth->fetchrow();
        my @updateVals ;
        if ( $inDB ) {
            my $i = 0;
            foreach my $loop ( @fields ) {
                push( @updateVals, $loop ) unless $i == $mlsField;
                $i++;
            }
            push( @updateVals, $mlsNumber );
            $updSth->execute( @updateVals );
        }
        else {
            $insSth->execute( @fields ) unless ( $fields[$mlsField] eq '' );
        }
        if (   ( $officeList && grep( /$officeList/, @offices ) )
            or ( $officeSell && grep( /$officeSell/, @offices ) ) )
        {
            if ( $inJBGDB ) {
                $updSummary->execute( @updateVals )
                    or warn "no update! : " . $dbh->errstr();
            }
            else {
                $insSummary->execute( @fields )
            }
        }
    }
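For reference, the loop above rebuilds %reversed from %lookup on every input line, even though the column layout only changes on a header row. Below is a minimal sketch of building the name-to-index map once per header row instead. It is not a diagnosis of the memory problem, only an illustration: the helper name note_header, the %index_of hash, and the in-memory sample data are all assumptions made for the example, and the database calls are left out.

```perl
use strict;
use warnings;

# Header rows start with one of these field names (list taken from the post).
my %firsts = map { $_ => 1 }
    qw/ AccessInstr PropType UID Acres MLSNum MediaSource /;

# Field name => column index. Rebuilt only when a header row is seen.
my %index_of;

# Hypothetical helper: if the row is a header, record its column layout
# and return 1; otherwise leave the layout alone and return 0.
sub note_header {
    my ($fields) = @_;
    return 0 unless @$fields && $firsts{ $fields->[0] };
    %index_of = ();
    @index_of{@$fields} = ( 0 .. $#$fields );    # hash slice assignment
    return 1;
}

# Made-up in-memory sample standing in for the real file.
my $sample = "MLSNum\tOfficeList\tOfficeSell\n"
           . "12345\tA1\tB2\n"
           . "67890\tC3\t\n";
open( my $fh, '<', \$sample ) or die "Can't open sample: $!\n";

my @seen;    # collected only so the sketch has something to show
while ( my $line = <$fh> ) {
    chomp $line;
    # A limit of -1 keeps trailing empty fields, so short rows need no padding
    # as long as the tabs themselves are present.
    my @fields = split( /\t/, $line, -1 );
    next if note_header( \@fields );

    # Look the three columns up by name via the cached index map.
    my ( $mlsNumber, $officeList, $officeSell )
        = @fields[ @index_of{qw/ MLSNum OfficeList OfficeSell /} ];
    push @seen, [ $mlsNumber, $officeList, $officeSell ];
}
close $fh;
```

In the real loop, the push onto @seen would be replaced by the execute calls; accumulating rows in an array is exactly what a 334G input cannot afford.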
In reply to processing huge files by geektron