Utrecht has asked for the wisdom of the Perl Monks concerning the following question:
I have 49 very long ASCII files (hundreds of thousands of lines each). For each of these 49 files, I want to copy the first 1000 lines (1024, to be exact) to a new file (say, temp files 1 to 49). I then use these temp files for processing, and then copy the second block of 1024 lines from each file to the 49 temp files, and so on until I've reached the end of the long ASCII files (which all have exactly the same length).
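For illustration, the block-by-block pass described above could be sketched roughly as follows. This is only a sketch under assumptions: `next_block` and `process_in_blocks` are hypothetical names, the callback stands in for the unspecified "processing stuff", and a trailing block shorter than the block size is simply dropped. The idea is to open each input once and let every filehandle remember its own position, so no line is ever read twice.

```perl
use strict;
use warnings;

# Hypothetical helper: return up to $n lines from an already-open
# filehandle, continuing from wherever the previous call stopped.
sub next_block {
    my ($fh, $n) = @_;
    my @lines;
    while (@lines < $n) {
        my $line = <$fh>;
        last unless defined $line;    # end of file reached
        push @lines, $line;
    }
    return \@lines;
}

# Hypothetical driver: open every input once, then repeatedly take one
# block of $n lines per file and hand all blocks of a pass to $callback.
# Returns the number of complete passes.
sub process_in_blocks {
    my ($files, $n, $callback) = @_;

    my %fh;
    for my $file (@$files) {
        open my $h, '<', $file or die "Can't open $file for reading: $!\n";
        $fh{$file} = $h;
    }

    my $pass = 0;
    while (1) {
        my %block = map { $_ => next_block($fh{$_}, $n) } @$files;
        # Assumption: stop as soon as any file yields a short block.
        last if grep { @{ $block{$_} } < $n } @$files;
        $callback->($pass++, \%block);
    }

    close $_ for values %fh;
    return $pass;
}
```

A call might look like `process_in_blocks(\@input_files, 1024, sub { my ($pass, $blocks) = @_; ... })`, where `@input_files` is an assumed list of the 49 file names and the callback writes the temp files and runs the processing step.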
I already have some working code, but the problem is that it is much too slow. In the following code (found on some man page), for your understanding: $span is a constant; @stations has length 49; $lower_bound and $upper_bound are each increased by 1024 on every pass of the while loop ($nr_samples = 1024). Furthermore, $length is the number of lines in each of the 49 huge ASCII files.
As you can see, for every line (the loop from $lower_bound to $upper_bound) the files need to be opened and closed, and both subs need to be called. There must be a faster way, I think, so some advice would be highly appreciated :)
    while(){
        if($cnt % $nr_samples == 0){
            $lower_bound = $lower_bound+$nr_samples;
            $upper_bound = $upper_bound+$nr_samples;
        }
        $cnt++;
        last if $upper_bound > ($length-1);
        foreach my $station(@stations){
            open OUT, ">$span.$station.alpha.sac.data";
            print OUT "2 $nr_samples\n";
            for my $seeking($lower_bound..$upper_bound){
                my $eval_file_2 = $suffix ? sprintf"%s%s_%s_%s",$prefix,$suffix,$station,$span : sprintf"%s_%s_%s",$prefix,$station,$span;
                open(FILE, "< $eval_file_2") or die "Can't open $eval_file_2 for reading: $!\n";
                open(INDEX, "+>$eval_file_2.idx") or die "Can't open $eval_file_2.idx for read/write: $!\n";
                build_index(*FILE, *INDEX);
                my $line = line_with_index(*FILE, *INDEX, $seeking);
                close FILE;
                close INDEX;
                chomp $line;
                my($time,$value)=split(/\s+/,$line);
                printf OUT "%.3f %.10f\n",$time,$value;
            }
            close OUT;
        }
    }

The subroutines:

    sub build_index {
        my $data_file = shift;
        my $index_file = shift;
        my $offset = 0;
        while (<$data_file>) {
            print $index_file pack("N", $offset);
            $offset = tell($data_file);
        }
    }

    sub line_with_index {
        my $data_file = shift;
        my $index_file = shift;
        my $line_number = shift;
        my $size;       # size of an index entry
        my $i_offset;   # offset into the index of the entry
        my $entry;      # index entry
        my $d_offset;   # offset into the data file
        $size = length(pack("N", 0));
        $i_offset = $size * ($line_number-1);
        seek($index_file, $i_offset, 0) or return;
        read($index_file, $entry, $size);
        $d_offset = unpack("N", $entry);
        seek($data_file, $d_offset, 0);
        return scalar(<$data_file>);
    }
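If the index approach is kept, one possible variation is to build the index a single time per file and then fetch a whole range of lines with one lookup and one seek. The sketch below assumes this is acceptable for the use case; `build_index_once` and `lines_in_range` are hypothetical names, not part of the posted code, and line numbers are taken to be 1-based as in `line_with_index`. Note that `pack("N", ...)` stores 32-bit offsets, so this layout only covers data files smaller than 4 GiB.

```perl
use strict;
use warnings;

# Build the offset index once: one 4-byte big-endian entry ("N") per
# line, holding the byte offset at which that line starts.
sub build_index_once {
    my ($data_fh, $index_fh) = @_;
    seek($data_fh, 0, 0);
    my $offset = 0;
    while (<$data_fh>) {
        print {$index_fh} pack('N', $offset);
        $offset = tell($data_fh);
    }
    return;
}

# Return lines $first .. $last (1-based, inclusive). Only the start of
# the range is looked up in the index; the rest is read sequentially.
sub lines_in_range {
    my ($data_fh, $index_fh, $first, $last) = @_;
    my $size = length pack('N', 0);    # size of one index entry

    seek($index_fh, $size * ($first - 1), 0) or return;
    my $got = read($index_fh, my $entry, $size);
    return unless $got && $got == $size;    # $first is past the last line

    seek($data_fh, unpack('N', $entry), 0) or return;
    my @lines;
    for ($first .. $last) {
        my $line = <$data_fh>;
        last unless defined $line;
        push @lines, $line;
    }
    return @lines;
}
```

With this shape, each data file and its index would be opened once outside the loops, `build_index_once` would run once per file, and each pass would call `lines_in_range` once per station instead of rebuilding the index for every line.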
Re: Accessing files at certain line number
by ikegami (Patriarch) on Sep 21, 2009 at 14:05 UTC

Re: Accessing files at certain line number
by Corion (Patriarch) on Sep 21, 2009 at 14:08 UTC

Re: Accessing files at certain line number
by Fletch (Bishop) on Sep 21, 2009 at 14:10 UTC
by ikegami (Patriarch) on Sep 21, 2009 at 15:14 UTC