in reply to Can I split a 10GB file into 1 GB sizes using my repeating data pattern

Anonymous Monk,
This is an incredibly trivial task in Perl. Here is some code (untested) that will get you started. I have intentionally left some things unoptimized, with comments, since you know better than I do what should actually happen.
#!/usr/bin/perl
use constant DAILY_RUN => 1024 * 1024 * 1000;
use strict;
use warnings;

my $file = $ARGV[0] or die "Usage: $0 <input_file>";
open(my $fh, '<', $file) or die "Unable to open '$file' for reading: $!";

my $cnt = 1;
my $out_file = "$file.$cnt";
# Will clobber an existing file by this name (fix if important)
open(my $out_fh, '>', $out_file) or die "Unable to open '$out_file' for writing: $!";
while (<$fh>) {
    if (-s $out_file > DAILY_RUN && /^100/) {
        ++$cnt;
        $out_file = "$file.$cnt";
        open($out_fh, '>', $out_file) or die "Unable to open '$out_file' for writing: $!";
    }
    print $out_fh $_;
}

Your lines look like they are fixed length, so one optimization is to stop checking the file size after every write: wait until you have written enough to reach at least 1 GB, then set a flag to start watching for the beginning of a 100 record. Also note that this code writes at least 1 GB and then starts a new file as soon as a 100 record is encountered; you may want to keep each file under 1 GB instead. Again, you are in a better position to address these than I am. Finally, it may be possible to process the record sets as a whole rather than a line at a time by setting $/ = "\n100"; This is an advanced technique that you can read about in perlvar. It complicates the code but is presumably more efficient (fewer disk reads and writes).
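
As a rough sketch of the first optimization, a running byte counter can stand in for the -s check on every line. The sub name split_by_line and the $new_out callback are hypothetical (they are not part of the code above), and length is assumed to count bytes, which holds as long as the input handle has no encoding layer. Like the original, this is untested against your real data.

```perl
use strict;
use warnings;

# Hypothetical helper: copy lines from $in_fh into numbered chunks.
# A new chunk starts at the first line beginning with "100" seen after
# the current chunk has grown past $limit bytes.
# $new_out is a caller-supplied callback returning a write handle for chunk $n.
sub split_by_line {
    my ($in_fh, $limit, $new_out) = @_;
    my $cnt     = 1;
    my $written = 0;                      # bytes written to the current chunk
    my $out_fh  = $new_out->($cnt);
    while (my $line = <$in_fh>) {
        # Comparing a counter is much cheaper than a stat call per line
        if ($written > $limit && $line =~ /^100/) {
            close($out_fh) or die "Unable to close chunk $cnt: $!";
            $out_fh  = $new_out->(++$cnt);
            $written = 0;
        }
        print {$out_fh} $line;
        $written += length $line;         # assumes bytes, i.e. no encoding layer
    }
    close($out_fh) or die "Unable to close chunk $cnt: $!";
    return $cnt;                          # number of chunks written
}
```

To write real files, the callback could open "$file.$n" for writing and return the handle, with $limit set to DAILY_RUN.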

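The $/ = "\n100" idea could look something like the sketch below, which also keeps each chunk at or under the limit unless a single record set is larger than the limit on its own. The names split_by_record_set and $new_out are hypothetical, and the code assumes every record set starts with a line beginning with 100 and that one record set fits comfortably in memory. Because the separator swallows the "100" that starts the next set, each read has to be patched back together.

```perl
use strict;
use warnings;

# Hypothetical helper: read whole record sets (everything from one "100" line
# up to the next) and start a new chunk before a set that would push the
# current chunk past $limit bytes.
# $new_out is a caller-supplied callback returning a write handle for chunk $n.
sub split_by_record_set {
    my ($in_fh, $limit, $new_out) = @_;
    local $/ = "\n100";                   # each read ends where the next 100 record begins
    my $cnt     = 1;
    my $written = 0;                      # bytes written to the current chunk
    my $first   = 1;
    my $out_fh  = $new_out->($cnt);
    while (my $set = <$in_fh>) {
        my $had_sep = chomp $set;         # strips a trailing "\n100"; 0 for the last set
        $set = "100$set" unless $first;   # restore the "100" consumed by the previous read
        $set .= "\n" if $had_sep;         # restore the newline that was part of the separator
        $first = 0;
        if ($written > 0 && $written + length($set) > $limit) {
            close($out_fh) or die "Unable to close chunk $cnt: $!";
            $out_fh  = $new_out->(++$cnt);
            $written = 0;
        }
        print {$out_fh} $set;
        $written += length $set;          # assumes bytes, i.e. no encoding layer
    }
    close($out_fh) or die "Unable to close chunk $cnt: $!";
    return $cnt;                          # number of chunks written
}
```

Using local keeps the changed $/ from leaking into any other code that reads lines.
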
Cheers - L~R

Re^2: Can I split a 10GB file into 1 GB sizes using my repeating data pattern
by Anonymous Monk on Jul 23, 2009 at 00:43 UTC
    That worked perfectly, Limbic~Region. Thanks for your help. I hope I don't have to use it, as that would mean my initial ETL had crashed. This little exercise was interesting: I did a lot of research and got many interesting (unsuccessful) results from my own scripts. I'm going to try to adjust your script to see if I can add a header and footer.