Hi all. Junior Perl coder here. I am trying to write a script that splits a large file of possibly millions of similar records into multiple smaller files. I have most of it working, but there is one catch: I cannot split records of the same type across files. For example, if 1000 type "A" records appear sequentially in the large file and I reach my defined "smaller" file size in the middle of those "A" records, I need to keep adding "A" records to the current small file until I reach the next record type in the large file, at which point I want to start the next "small" file. That is the part of the script I am having trouble with. I cannot figure out how to advance the record pointer to the next record in the "big" file so that I can step through it record by record until I find the next record type, which is my trigger to start the next "small" file. My code follows. Any help would be greatly appreciated.

foreach $filename (@prfiles) {
    chomp $filename;
    $total_recs=0;
    $counter_recs=0;
    $previous_rec_ssn=0;
    $file_size_met="N";
    open (INFILE1, '<', "$filename") or print "Cannot open $filename. ";
    print "Now processing: $filename\n";
    while (<INFILE1>) {
        if ($file_size_met ne "Y") {
            $file_count=1;
            $mod_filename="$filename\.$file_count" ;
            print "Writing output to: $mod_filename\n\n";
            open (OUTFILE1, ">>" . "$mod_filename") or exit(201);
            while ($counter_recs < 50000) {
                print OUTFILE1 $_;
                $total_recs=($total_recs + 1);
                $counter_recs=($counter_recs + 1);
            }
            print "$total_recs records have been processed\n";
            $counter_recs=0;
        }
        $actual_size = (stat($mod_filename))[7];
        if ($actual_size >= $outsize){$file_size_met="Y"};
        print "Current file size is $actual_size bytes.\n";
        # ***** THIS IS WHERE I AM GOING OFF THE RAILS *****
        if (($actual_size == $outsize) or ($actual_size > $outsize) and ($file_size_met eq "Y")) {
            @line_contents = split (/\|/,$_);
            $record_ssn=($line_contents[6]);
            print "current SSN is $record_ssn.\n";
            if ($previous_rec_ssn == $record_ssn) {
                print OUTFILE1 $_;
                $previous_rec_ssn=$record_ssn;
                print "previous SSN is $previous_rec_ssn.\n";
                print "current SSN is $record_ssn.\n";
            }
            next;
        }
        next;
    }
    close INFILE1;
    close OUTFILE1;
    print "\n$total_recs records were processed for $filename\n\n";
}
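For reference, here is a minimal sketch of the behaviour described above. It is not a drop-in fix for the posted script. The idea is that `while (<FH>)` already advances one record per iteration, so no separate "advance the pointer" step is needed: a single loop can remember the previous record's key and start a new output file only when the current file is full and the key changes. The sub name `split_by_key` and the parameters `$max_bytes` and `$key_index` are hypothetical names chosen for illustration. The sketch assumes pipe-delimited records whose type is one field (index 6 in the posted code), compares keys as strings rather than numerically, and counts bytes as it writes instead of calling `stat`, because `stat` can lag behind buffered output.

```perl
use strict;
use warnings;

# Hypothetical helper: split $in_path into "$in_path.1", "$in_path.2", ...
# An output file is finished only when it has reached $max_bytes AND the key
# field changes, so a run of records sharing a key is never divided.
# Returns the list of output file names.
sub split_by_key {
    my ($in_path, $max_bytes, $key_index) = @_;

    open my $in, '<', $in_path or die "Cannot open $in_path: $!";

    my ($out, @out_paths, $previous_key);
    my $bytes_written = 0;

    while (my $line = <$in>) {
        # Work on a copy without the line ending so a key in the last
        # field compares cleanly.
        (my $record = $line) =~ s/\r?\n\z//;
        my $key = (split /\|/, $record, -1)[$key_index] // '';

        my $key_changed = !defined $previous_key || $key ne $previous_key;

        # Open a file for the first record, or roll over when the current
        # file is full and this record begins a new key group.
        if (!$out || ($bytes_written >= $max_bytes && $key_changed)) {
            if ($out) {
                close $out or die "Cannot close $out_paths[-1]: $!";
            }
            push @out_paths, sprintf '%s.%d', $in_path, scalar(@out_paths) + 1;
            open $out, '>', $out_paths[-1]
                or die "Cannot open $out_paths[-1]: $!";
            $bytes_written = 0;
        }

        print {$out} $line;
        # length() equals the byte count here because no encoding layer
        # is applied to the handles.
        $bytes_written += length $line;
        $previous_key = $key;
    }

    if ($out) {
        close $out or die "Cannot close $out_paths[-1]: $!";
    }
    close $in;

    return @out_paths;
}
```

Under these assumptions, a call such as `my @parts = split_by_key($filename, $outsize, 6);` inside the `foreach` loop over `@prfiles` would write the pieces for one input file. Note that an output file can exceed `$max_bytes` by the length of whatever key group is in progress when the limit is reached.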

In reply to Large FIle Splitter by insta.gator
