micwood has asked for the wisdom of the Perl Monks concerning the following question:
I have a program that takes a file of records exported from a database, pulls the information I need out of each record, and prints a nice comma-delimited file. The program works fine on my test records. However, the file I need to parse is 4.5 GB. When I start Perl on this file, it freezes (or at least appears to): there is no growth in the size of the output file, but the CPU appears to be processing a huge amount of data. I thought the program would read just the current record from the file, print that to the output, and move on to the next record, but I don't think that is happening. This is my code, abbreviated (with the actual regular-expression parsing taken out):
```perl
#!/usr/bin/perl -w
use strict;

open(OUT, ">/Users/micwood/Desktop/output.csv");

my $awardhashref = ();

# NB: single-quoted, so $/ here is the literal five characters <hr>\r
# (a backslash and an 'r'), not <hr> followed by a carriage return
my $allDocs = do {
    local $/ = '<hr>\r';
    <>;
};

my $rxExtractDoc = qr{(?xms) (<h4>Award\s\#(\d+)(.*?)<hr>) };

while ( $allDocs =~ m{$rxExtractDoc}g ) {
    my %award = ();    # award hash
    $award{'record'}      = $1;
    $award{'A_awardno'}   = $2;
    $award{'entireaward'} = $3;
    $award{'entireaward'} =~ s/\t//g;
    $award{'entireaward'} =~ s/\r//g;

    if ( $award{'entireaward'} =~ m{Dollars Obligated(.*?)\$([^<]+?)<}gi ) {
        $award{'B_dollob'} = $2;
    }
    if ( $award{'entireaward'} =~ m{Current Contract Value(.*?)\$([^<]+?)<}gi ) {
        $award{'C_currentconvalue'} = $2;
    }
```
Etc., etc. The section deleted here is the data extraction, where I pull out the information I need. I then print to the screen and write to the OUT file:
```perl
    print qq{Award Number: $award{'A_awardno'}\n},
          qq{Dollars Obligated: $award{'B_dollob'}\n},
          qq{Current Contract Value: $award{'C_currentconvalue'}\n},
          qq{Ultimate Contract Value: $award{'D_ultconvalue'}\n},
          qq{Contracting Agency: $award{'E_conagency'}\n},
          q{-} x 25, qq{\n};

    delete $award{'entireaward'};
    delete $award{'record'};

    foreach my $key ( sort keys %award ) {
        print OUT '"' . $award{$key} . '",';
    }
    print OUT "\n";

    $awardhashref = \%award;
}

my @thekeys = sort keys %$awardhashref;
$, = ",";
print( @thekeys, "\n" );
print OUT ( @thekeys, "\n" );
close OUT;
```
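(As an aside, I am building the CSV row by hand above, which will produce broken output if a field ever contains a quote or a comma. A minimal sketch of writing the same row with the CPAN module Text::CSV, assuming it is installed, would be:)

```perl
use Text::CSV;    # CPAN module; handles quoting and embedded commas/quotes

my $csv = Text::CSV->new( { binary => 1, eol => "\n" } )
    or die "Cannot use Text::CSV: " . Text::CSV->error_diag;

# inside the while loop: one row per record, fields in sorted-key order,
# same as the hand-rolled version above
$csv->print( \*OUT, [ @award{ sort keys %award } ] );
```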
So my questions are: shouldn't it be cycling through the file, reading in one record at a time and printing it to the screen and the OUT file? Is there a better way to deal with reading in blocks, given such a large file? Again, the program works great on smaller files, but it seems confused by the 4.5 GB file. Is it possible that it is working but that I won't see anything for a while?
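For reference, here is a minimal sketch of the record-at-a-time reading I have in mind, assuming each record really does end with the literal text `<hr>` followed by a carriage return (note the double quotes around the separator, so `\r` is a real carriage return; the regexes and field names are lifted from my code above, and `output.csv` stands in for the real path):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Double-quoted, so \r is a real carriage return. With a separator that
# actually occurs in the data, <> returns one record per read instead of
# slurping the whole 4.5 GB file into memory.
local $/ = "<hr>\r";

open my $out, '>', 'output.csv' or die "Cannot open output.csv: $!";

while ( my $record = <> ) {
    chomp $record;    # strip the <hr>\r record separator

    next unless $record =~ m{<h4>Award\s\#(\d+)(.*)}ms;
    my ( $awardno, $body ) = ( $1, $2 );
    $body =~ tr/\t\r//d;    # same cleanup as the two s///g lines above

    my %award = ( A_awardno => $awardno );
    $award{B_dollob} = $2
        if $body =~ m{Dollars Obligated(.*?)\$([^<]+?)<}i;
    $award{C_currentconvalue} = $2
        if $body =~ m{Current Contract Value(.*?)\$([^<]+?)<}i;

    # fields in sorted-key order, as in the original
    print $out join( ',', map { qq{"$award{$_}"} } sort keys %award ), "\n";
}

close $out;
```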
I am still very green with Perl, so any help would be greatly appreciated. Thanks again, Michael
Replies are listed 'Best First'.

- Re: Large file data extraction by GrandFather (Saint) on Aug 12, 2008 at 00:28 UTC
    - by tod222 (Pilgrim) on Aug 12, 2008 at 01:43 UTC
- Re: Large file data extraction by Cristoforo (Curate) on Aug 12, 2008 at 01:34 UTC
    - by ikegami (Patriarch) on Aug 12, 2008 at 02:03 UTC
- Re: Large file data extraction by Fletch (Bishop) on Aug 11, 2008 at 23:41 UTC
    - by GrandFather (Saint) on Aug 12, 2008 at 00:36 UTC
- Re: Large file data extraction by eosbuddy (Scribe) on Aug 11, 2008 at 21:35 UTC
    - by micwood (Acolyte) on Aug 11, 2008 at 23:24 UTC
    - by eosbuddy (Scribe) on Aug 11, 2008 at 23:38 UTC
- Re: Large file data extraction by ikegami (Patriarch) on Aug 12, 2008 at 02:00 UTC
    - by micwood (Acolyte) on Aug 12, 2008 at 03:13 UTC
    - by ikegami (Patriarch) on Aug 12, 2008 at 03:51 UTC
    - by micwood (Acolyte) on Aug 12, 2008 at 05:59 UTC
    - by micwood (Acolyte) on Aug 12, 2008 at 05:17 UTC
    - by peter (Sexton) on Aug 12, 2008 at 15:04 UTC