O ye with great PERLs of wisdom...
In the code below you will see that I am attempting to split a large data file (sometimes in excess of 12-13 GB) into smaller chunks (500 KB in the example). The data is a long run of separate customer invoices, and the first line of each invoice always has an "11" at the 67th character.
I now want to be able to pull out any invoices that are larger than, say, 25 MB and write them to their own file, before breaking the rest of the large file into the smaller chunks as the code does currently.
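use strict;
use warnings;

my $chunksize  = 500 * 1024;    # 500 KB
my $filenumber = 0;
my $infile     = "infile.dat";
my $outsize    = 0;             # bytes written to the current chunk

open my $in, '<', $infile or die "Can't open $infile: $!";
my $outname = "outfile_$filenumber.dat";
open my $out, '>', $outname or die "Can't open $outname: $!";

while (<$in>)
{
    chomp;
    # Only start a new chunk on the first line of an invoice (/^.{67}11/),
    # so that no invoice is ever split across two files.
    if ( $outsize > $chunksize and /^.{67}11/ )
    {
        close $out or die "Can't close $outname: $!";
        $outsize = 0;
        $filenumber++;
        $outname = "outfile_$filenumber.dat";
        open $out, '>', $outname or die "Can't open $outname: $!";
    }
    print {$out} "$_\n";
    $outsize += length($_) + 1;    # +1 for the newline
}
close $out or die "Can't close $outname: $!";
close $in;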
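For the 25 MB requirement, here is a minimal sketch of one approach that might work; it is not tested against real invoice data. It assumes an invoice can be held in memory until it crosses the limit. Each invoice is buffered, and as soon as the buffer exceeds the limit, the buffer and the rest of that invoice are streamed to a separate file. The sub name `split_oversized`, its parameters, and the output file names mentioned afterwards are all hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: copy invoices from $in to $small_out, diverting any
# invoice larger than $limit bytes to $big_out instead.
# $is_start is a code ref that returns true for the first line of an invoice.
sub split_oversized {
    my ($in, $small_out, $big_out, $limit, $is_start) = @_;

    my $buffer   = '';    # current invoice, held while it is still under the limit
    my $oversize = 0;     # true once the current invoice has exceeded $limit

    while (my $line = <$in>) {
        if ($is_start->($line)) {
            # Invoice boundary: the previous invoice stayed under the limit
            # if it is still sitting in the buffer, so it goes to the normal output.
            print {$small_out} $buffer unless $oversize;
            $buffer   = '';
            $oversize = 0;
        }

        if ($oversize) {
            # Already known to be too big: stream straight to the big file.
            print {$big_out} $line;
            next;
        }

        $buffer .= $line;
        if (length($buffer) > $limit) {
            # Crossed the limit: flush what we have and switch to streaming.
            print {$big_out} $buffer;
            $buffer   = '';
            $oversize = 1;
        }
    }

    # Flush the final invoice if it never crossed the limit.
    print {$small_out} $buffer unless $oversize;
    return;
}
```

It could be called with three open filehandles (for example the input file, a `normal_invoices.dat` and a `big_invoices.dat`), a limit of `25 * 1024 * 1024`, and `sub { $_[0] =~ /^.{67}11/ }` as the boundary test. The normal-invoice output would then be fed to the chunking loop above. Memory use should stay at roughly one limit's worth of data, since an oversized invoice is never held in full.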