
Dear Monks! I have a question concerning the following:

I have 49 very long ASCII files (hundreds of thousands of lines each). From each of these 49 files I want to copy the first 1000 lines (1024, to be exact) to a new file (say, temp files 1 to 49), then use those files (processing stuff), then copy the next 1024 lines from each file to the 49 temp files, and so on until I reach the end of the long ASCII files (they all have exactly the same length).
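A minimal sketch of the procedure just described, assuming the files can simply be read from front to back: each input is opened once, 1024 lines are read from every handle per round, and the rounds stop at end of file. The names process_in_chunks, the callback standing in for the "processing stuff", and the temp_1.txt ... temp_49.txt naming are hypothetical and not taken from the original code.

```perl
use strict;
use warnings;

# Read several equally long text files in lockstep, $chunk_size lines at a
# time. Each round, chunk k of input i is written to temp_<i>.txt, and then
# the callback runs on those temp files before the next round starts.
# Assumption: the temp file naming and the callback are hypothetical.
# Returns the number of rounds that were processed.
sub process_in_chunks {
    my ($chunk_size, $process, @inputs) = @_;

    # Open every input exactly once; the handles stay open between rounds.
    my @in;
    for my $path (@inputs) {
        open my $fh, '<', $path or die "Can't open $path for reading: $!\n";
        push @in, $fh;
    }

    my $chunk = 0;
    while (1) {
        my @temp_files;
        for my $i (0 .. $#in) {
            my @lines;
            while (@lines < $chunk_size) {
                my $line = readline($in[$i]);
                last unless defined $line;
                push @lines, $line;
            }
            # All inputs have the same length, so an empty read means we are done.
            return $chunk unless @lines;

            my $temp = 'temp_' . ($i + 1) . '.txt';
            open my $out, '>', $temp or die "Can't open $temp for writing: $!\n";
            print {$out} @lines;
            close $out or die "Can't close $temp: $!\n";
            push @temp_files, $temp;
        }
        $process->($chunk, @temp_files);    # the "processing stuff"
        $chunk++;
    }
}
```

It would be called as, for example, process_in_chunks(1024, \&my_processing, @station_files), where my_processing and @station_files are again hypothetical. Because every handle stays open and is only read forward, each line of each file is read exactly once.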

I already have some code working, but the problem is that it is much too slow. The code follows (I found it on a man page). For your understanding: $span is a constant, @stations has 49 elements, and $lower_bound and $upper_bound are both advanced by $nr_samples (= 1024) in the while loop. $length is the number of lines in each of the 49 huge ASCII files.
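One detail worth checking in the window arithmetic: the range $lower_bound..$upper_bound is inclusive, so it yields exactly 1024 lines only when $upper_bound - $lower_bound is 1023. A small sketch, assuming 1-based line numbers (as line_with_index uses) and chunks counted from 0; chunk_bounds and full_chunks are hypothetical helper names.

```perl
use strict;
use warnings;

# Inclusive, 1-based line window for chunk $k (chunks counted from 0).
# With inclusive bounds the window holds upper - lower + 1 lines, so the
# upper bound has to be lower + $n - 1 to get exactly $n lines.
sub chunk_bounds {
    my ($k, $n) = @_;
    my $lower = $k * $n + 1;
    my $upper = $lower + $n - 1;
    return ($lower, $upper);
}

# Number of complete $n-line chunks in a file of $length lines.
sub full_chunks {
    my ($length, $n) = @_;
    return int($length / $n);
}

# chunk_bounds(0, 1024) returns (1, 1024)
# chunk_bounds(1, 1024) returns (1025, 2048)
```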

As you can see, for every single line (the loop from $lower_bound to $upper_bound) the files have to be opened and closed again and both subs have to be called, which means build_index rereads the whole data file each time. There must be a faster way, so some advice would be highly appreciated :)
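To put a rough number on it: build_index (shown further down) reads the entire data file every time it is called, and it is called once for every fetched line. Assuming S = 49 files of N lines each, that every line is fetched once, and N = 2 × 10^5 as a stand-in for "hundreds of thousands" (an assumed value, not a measured one):

```latex
\begin{aligned}
R_{\text{current}}    &= S \cdot N \cdot N = S N^2 = 49 \times (2\times10^{5})^2 = 1.96\times10^{12} \\
R_{\text{sequential}} &= S \cdot N = 49 \times 2\times10^{5} = 9.8\times10^{6} \\
R_{\text{current}} / R_{\text{sequential}} &= N = 2\times10^{5}
\end{aligned}
```

So the current loop reads about N times more lines than a single front-to-back pass over each file, on top of opening two files per fetched line.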

  while (1) {
    # Advance the window (only when $cnt is a multiple of $nr_samples).
    if ($cnt % $nr_samples == 0) {
      $lower_bound += $nr_samples;
      $upper_bound += $nr_samples;
    }
    $cnt++;
    last if $upper_bound > ($length - 1);
    foreach my $station (@stations) {
      my $out_file = "$span.$station.alpha.sac.data";
      open my $out, '>', $out_file or die "Can't open $out_file for writing: $!\n";
      print $out "2 $nr_samples\n";
      for my $seeking ($lower_bound .. $upper_bound) {
        my $eval_file_2 = $suffix
          ? sprintf("%s%s_%s_%s", $prefix, $suffix, $station, $span)
          : sprintf("%s_%s_%s",   $prefix, $station, $span);
        # Both files are reopened and the index is rebuilt for every single line.
        open my $file,  '<',  $eval_file_2
          or die "Can't open $eval_file_2 for reading: $!\n";
        open my $index, '+>', "$eval_file_2.idx"
          or die "Can't open $eval_file_2.idx for read/write: $!\n";
        build_index($file, $index);
        my $line = line_with_index($file, $index, $seeking);
        close $file;
        close $index;
        chomp $line;
        my ($time, $value) = split /\s+/, $line;
        printf $out "%.3f %.10f\n", $time, $value;
      }
      close $out;
    }
  }

The subroutines:


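sub build_index {
    my $data_file   = shift;
    my $index_file  = shift;
    my $offset      = 0;

    # Write one 4-byte ("N", 32-bit big-endian) entry per line: the byte
    # offset at which that line starts. Offsets above 4 GiB would not fit.
    while (<$data_file>) {
        print $index_file pack("N", $offset);
        $offset = tell($data_file);
    }
}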


sub line_with_index {
    my $data_file   = shift;
    my $index_file  = shift;
    my $line_number = shift;

    my $size;               # size of an index entry
    my $i_offset;           # offset into the index of the entry
    my $entry;              # index entry
    my $d_offset;           # offset into the data file

    $size = length(pack("N", 0));
    $i_offset = $size * ($line_number-1);   # line numbers are 1-based
    seek($index_file, $i_offset, 0) or return;
    read($index_file, $entry, $size);
    $d_offset = unpack("N", $entry);
    seek($data_file, $d_offset, 0);
    return scalar(<$data_file>);
}
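If random access by line number is still wanted, a sketch of the usual fix is to build each index once, outside all loops, then seek only to the first line of each 1024-line window and read the rest of the window sequentially. lines_from_index and chunks_via_index are hypothetical names; build_index is copied from above, and the binmode call is an addition so the packed offsets are not altered on platforms with CRLF translation.

```perl
use strict;
use warnings;

# build_index as above: one packed 32-bit ("N") byte offset per line,
# so data files larger than 4 GiB are not supported.
sub build_index {
    my ($data_file, $index_file) = @_;
    my $offset = 0;
    while (<$data_file>) {
        print {$index_file} pack("N", $offset);
        $offset = tell($data_file);
    }
}

# Return up to $count consecutive lines starting at the 1-based line
# $first_line: one index lookup, then plain sequential reads.
sub lines_from_index {
    my ($data_file, $index_file, $first_line, $count) = @_;
    my $size = length(pack("N", 0));                 # 4 bytes per entry
    seek($index_file, $size * ($first_line - 1), 0) or return;
    my $got = read($index_file, my $entry, $size);
    return unless defined $got && $got == $size;     # past the last line
    seek($data_file, unpack("N", $entry), 0) or return;
    my @lines;
    while (@lines < $count) {
        my $line = readline($data_file);
        last unless defined $line;
        push @lines, $line;
    }
    return @lines;
}

# Hypothetical driver: open the file and build its index ONCE, then pull
# every $n-line window through the same open handles.
sub chunks_via_index {
    my ($path, $n) = @_;
    open my $data,  '<',  $path       or die "Can't open $path for reading: $!\n";
    open my $index, '+>', "$path.idx" or die "Can't open $path.idx for read/write: $!\n";
    binmode $index;    # packed offsets are binary data
    build_index($data, $index);
    my @chunks;
    for (my $first = 1; ; $first += $n) {
        my @lines = lines_from_index($data, $index, $first, $n);
        last unless @lines;
        push @chunks, \@lines;    # in real use: process the window here
    }
    close $data;
    close $index;
    return @chunks;
}
```

For the 49 stations this would mean 49 index builds in total instead of one per fetched line, and each window costs one seek plus 1024 sequential line reads.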

In reply to Accessing files at certain line number by Utrecht
