perlquestion
Ineffectual
Hello all,
I have a four-column text file that looks like this:
<pre>
863 1 182856796 0
864 1 182856743 0
865 1 182856690 0
866 1 182856800 0
867 4 147950905 0
868 9 101911655 0
869 9 33113120 1
870 16 79237586 0
871 2 150329972 0
872 10 131981014 1
873 1 236140738 1
874 X 102930959 1
875 2 68407925 1
</pre><br>
The first column is the line number. I want to build an index for this file so that I can quickly retrieve any line by its number. Recipe 8.8 in the Perl Cookbook, "Reading a Particular Line in a File", seems to do exactly that.
<br><br>
However, it is not retrieving the correct lines. I suspected this was because my data consists of strings rather than unsigned longs, so I tried rebuilding the index with Z*, Z, N* and other pack templates. I don't understand pack well enough to know whether I'm using it correctly, and I have never managed to get the right string back from unpack. Index-building code:
<code>
open(IN, $oneper) or die "Can't open file $oneper for reading: $!\n";
open(INDEX, "+>$file.idx") or die "Can't open $file.idx for read/write: $!\n";
build_index(*IN, *INDEX);

# usage: build_index(*DATA_HANDLE, *INDEX_HANDLE)
sub build_index {
    my $data_file  = shift;
    my $index_file = shift;
    my $offset     = 0;

    while (<$data_file>) {
        print $index_file pack("N", $offset);
        $offset = tell($data_file);
    }
}
</code>
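For reference, here is a minimal sketch of what each index entry holds. The index stores the byte offset returned by tell, not the text of the line, so every entry written with pack("N", ...) is the same width no matter what the data columns contain. The example offsets and the helper name entry_position are made up for illustration. Note that "N" is a 32-bit unsigned value, so it can only represent offsets below 4 GiB.

```perl
use strict;
use warnings;

# pack("N", ...) always produces a 4-byte, big-endian unsigned integer,
# whatever the text of the data lines looks like.
my $size = length pack('N', 0);

for my $offset (0, 18, 182856796) {    # arbitrary example offsets
    my $entry = pack('N', $offset);
    printf "offset %d -> %d-byte entry -> %d\n",
        $offset, length($entry), unpack('N', $entry);
}

# Hypothetical helper: entry number $k (counting from 0) starts at
# byte $k * $size of the index file.
sub entry_position {
    my ($k) = @_;
    return $k * $size;
}
```

Because the entries are fixed-width, the position of any entry in the index can be computed by multiplication alone, even though the data lines themselves vary in length.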
Unpack code:
<code>
# usage: line_with_index(*DATA_HANDLE, *INDEX_HANDLE, $LINE_NUMBER)
# returns line or undef if LINE_NUMBER was out of range
sub line_with_index {
    my $data_file   = shift;
    my $index_file  = shift;
    my $line_number = shift;

    my $size;      # size of an index entry
    my $i_offset;  # offset into the index of the entry
    my $entry;     # index entry
    my $d_offset;  # offset into the data file

    $size     = length(pack("N", 0));
    $i_offset = $size * ($line_number);
    print "size is $size offset is $i_offset\n";

    seek($index_file, $i_offset, 0) or return;
    read($index_file, $entry, $size);
    $d_offset = unpack("N", $entry);
    seek($data_file, $d_offset, 0);
    return scalar(<$data_file>);
}
</code>
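For comparison, here is a minimal, self-contained sketch of the same build-and-look-up round trip. It makes several assumptions that are not in the code above: the sample lines are made up, temporary files from File::Temp stand in for the real data and index files, and the helper name line_at is hypothetical. The binmode calls and the localized $\ and $, are precautions to keep every index entry exactly 4 bytes and every offset a true byte offset. They are not a confirmed diagnosis of the problem. In this sketch the line number counts from 0, because the entry for the first line sits at byte 0 of the index.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Look up line $n (counting from 0) through the index.
# Returns the line, or nothing if $n is out of range.
sub line_at {
    my ($data, $index, $n) = @_;
    my $size = length pack('N', 0);          # width of one index entry
    my $entry;

    seek($index, $n * $size, 0) or return;
    my $got = read($index, $entry, $size);
    return unless defined $got && $got == $size;   # short read: no such entry

    seek($data, unpack('N', $entry), 0) or return;
    return scalar <$data>;
}

# Made-up sample data with variable-length lines.
my @sample = ("863 1 182856796 0\n", "864 Multi\n",
              "865 X 102930959 1\n", "866 7\n");

# Write the sample to a temporary data file (stand-in for the real file).
my ($out, $data_name) = tempfile(UNLINK => 1);
binmode $out;
print {$out} @sample;
close $out or die "Can't close $data_name: $!\n";

open my $data, '<', $data_name or die "Can't open $data_name: $!\n";
binmode $data;                          # byte offsets, no newline translation
my ($index) = tempfile(UNLINK => 1);    # File::Temp opens the handle read/write
binmode $index;                         # the index is binary data

# Build the index: one 4-byte entry per line, holding its starting offset.
{
    local ($\, $,);                     # make sure print adds nothing per entry
    my $offset = 0;
    while (<$data>) {
        print {$index} pack('N', $offset);
        $offset = tell $data;
    }
}

print line_at($data, $index, 1);        # prints "864 Multi"
```

To make the line number count from 1 instead, the index position would be $size * ($n - 1).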
Asking for line 3 with this code gives me back:<br>
size is 4 <br>
offset is 12 <br>
found line 39109 1<br>
This is incorrect: the returned text appears to come from the middle of line 5.<br><br>
Thanks in advance for your help!
<br><br>
Update:<br>
My file also contains lines that look like:<br>
<pre>
513 7 126096599 0
514 Multi
515 7 126116797 0
516 NotOn
517 7 126120072 0
518 7 126129103 0
519 7 126129249 0
520 7 126141464 0
521 7 126172869 0
522 7 126177331 0
523 7 126183528 0
524 19 49379166 1
525 2 172414527 1
526 7
527 19 49379181 1
528 2 172414461 1
529 4 39549110 0
530 21 40195276 1
531 No Results
532 14 39651192 0
533 7
534 7
</pre>
So the lines are not a fixed 34 bytes each. Sorry.