Based on your question, I'd propose a different method:
Method 3:
Parse the data as it becomes available: use File::Tail to open the file and read it even while another program is still writing to it. That lets you read, parse, and write continuously, so you can begin processing your data before you have the full terabyte.
For example, suppose we use the following to generate a stream of data:
#!/usr/bin/perl
# stream_write.pl - Slowly generate data
use strict;
use warnings;

open my $OFH, '>', 'the_stream.dat' or die $!;
binmode $OFH, ':unix';    # unbuffered: each print hits the disk immediately
my $cnt = 0;
while ($cnt < 100) {
    ++$cnt;
    my $cur_time = time;
    print $OFH "$cnt, $cur_time\n";
    sleep 5 * rand;       # pause 0-4 seconds (sleep truncates to an integer)
}
close $OFH;
Then we can use something like this to read and parse the data while the original is running:
#!/usr/bin/perl
# stream_read.pl - Read, parse & print data as it's available
use strict;
use warnings;
use File::Tail;

my $IFH = File::Tail->new(
    name => 'the_stream.dat',
    tail => -1,    # -1: read the whole existing file, then follow new data
);
while (defined(my $line = $IFH->read)) {
    chomp $line;
    my $cur_time = time;
    # The file holds "$cnt, $cur_time", so the counter is the first field
    my ($cnt, $old_time) = split /,\s*/, $line;
    print "$cur_time data: $cnt, $old_time\n";
}
Then, when I ran them, the output of stream_read.pl was (the first several lines share one timestamp because File::Tail reads the existing backlog in a burst before it starts waiting for new data):
$ perl stream_read.pl
1310035999 data: 1, 1310035990
1310035999 data: 2, 1310035992
1310035999 data: 3, 1310035992
1310035999 data: 4, 1310035994
1310035999 data: 5, 1310035994
1310035999 data: 6, 1310035998
1310035999 data: 7, 1310035999
1310035999 data: 8, 1310035999
1310035999 data: 9, 1310035999
1310036001 data: 10, 1310036000
1310036001 data: 11, 1310036000
1310036008 data: 12, 1310036004
1310036014 data: 13, 1310036008
1310036014 data: 14, 1310036010
.....
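If installing File::Tail from CPAN isn't an option, the core of the technique is small enough to do by hand: when readline hits end-of-file, a zero-byte seek clears the handle's EOF state so you can read again once the writer has appended more (this is the idiom perlfaq5 gives for "tail -f"). Here's a minimal self-contained sketch; the file name demo_stream.dat and the two sample records are made up for illustration, and a real tail loop would sleep between retries instead of having the writer in the same script:

```perl
#!/usr/bin/perl
# tail_plain.pl - Follow a growing file without File::Tail by clearing
# the EOF flag with a zero-byte seek and re-reading.
use strict;
use warnings;

my $file = 'demo_stream.dat';    # hypothetical stand-in for the_stream.dat

open my $W, '>', $file or die $!;
binmode $W, ':unix';             # unbuffered, like stream_write.pl
open my $R, '<', $file or die $!;

print $W "1, 1000\n";            # the "writer" emits the first record

my @seen;
while (@seen < 2) {
    while (defined(my $line = <$R>)) {
        chomp $line;
        push @seen, $line;       # parsing/processing would go here
    }
    # Readline returned undef: clear the EOF state and try again later.
    seek $R, 0, 1;               # seek 0 bytes from the current position
    print $W "2, 1005\n" if @seen == 1;   # the "writer" appends more
}
close $R;
close $W;
print "$_\n" for @seen;
```

File::Tail does the same thing underneath, plus the parts this sketch skips: adaptive sleeping between polls and detecting when the file is truncated or rotated.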
...roboticus
When your only tool is a hammer, all problems look like your thumb. |