Ineffectual has asked for the wisdom of the Perl Monks concerning the following question:
Hello all,
I have a 4-column text file that looks like this:
    863 1 182856796 0
    864 1 182856743 0
    865 1 182856690 0
    866 1 182856800 0
    867 4 147950905 0
    868 9 101911655 0
    869 9 33113120 1
    870 16 79237586 0
    871 2 150329972 0
    872 10 131981014 1
    873 1 236140738 1
    874 X 102930959 1
    875 2 68407925 1
The first column is the line number. I want to create an index on this file so that I can quickly access any line by its line number. I found a recipe that seems to do that in the Perl Cookbook: recipe 8.8, "Reading a Particular Line in a File".
However, it is not retrieving the lines correctly (probably because my data consists of strings rather than unsigned longs). I tried rebuilding the index with Z*, Z, N* and other pack templates, but I don't understand pack well enough to know whether I'm doing it correctly, and I have never managed to get the right string back from unpack.
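For reference, the recipe's code never packs the line text itself. Each index entry is the byte offset at which a line starts, stored by pack "N" as a 4-byte big-endian unsigned integer. That suggests string templates such as Z* shouldn't be needed whatever the lines contain. The minimal sketch below packs the starting offsets of three lines of different lengths and reads them back by position. The sample lines and variable names are illustrative, not taken from the real file.

```perl
use strict;
use warnings;

# Illustrative lines of different lengths; only their starting byte offsets are indexed.
my @lines = ("863\t1\t182856796\t0\n", "514\tMulti\n", "531\tNo Results\n");

# Record the byte offset at which each line starts.
my @offsets;
my $pos = 0;
for my $line (@lines) {
    push @offsets, $pos;
    $pos += length $line;
}

# The index is just the offsets, each packed as a 4-byte big-endian unsigned integer.
my $index = pack 'N*', @offsets;
my $size  = length pack('N', 0);    # 4 bytes per entry

# Entry $i sits at byte $size * $i of the index; unpack it back into an offset.
my @decoded = map { unpack('N', substr($index, $size * $_, $size)) } 0 .. $#offsets;

print "offsets: @offsets\n";    # offsets: 0 18 28
print "decoded: @decoded\n";    # decoded: 0 18 28
```

Every entry is exactly 4 bytes. The entry for a given line can therefore be found with arithmetic and a single seek, regardless of what the lines contain.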
Index-building code:

    open(IN, $oneper) or die "Can't open file $oneper for reading: $!\n";
    open(INDEX, "+>$file.idx") or die "Can't open $file.idx for read/write: $!\n";
    build_index(*IN, *INDEX);

    # usage: build_index(*DATA_HANDLE, *INDEX_HANDLE)
    sub build_index {
        my $data_file  = shift;
        my $index_file = shift;
        my $offset     = 0;

        while (<$data_file>) {
            print $index_file pack("N", $offset);
            $offset = tell($data_file);
        }
    }

Lookup (unpack) code:

    # usage: line_with_index(*DATA_HANDLE, *INDEX_HANDLE, $LINE_NUMBER)
    # returns line or undef if LINE_NUMBER was out of range
    sub line_with_index {
        my $data_file   = shift;
        my $index_file  = shift;
        my $line_number = shift;

        my $size;       # size of an index entry
        my $i_offset;   # offset into the index of the entry
        my $entry;      # index entry
        my $d_offset;   # offset into the data file

        $size = length(pack("N", 0));
        $i_offset = $size * ($line_number);
        print "size is $size offset is $i_offset\n";
        seek($index_file, $i_offset, 0) or return;
        read($index_file, $entry, $size);
        $d_offset = unpack("N", $entry);
        seek($data_file, $d_offset, 0);
        return scalar(<$data_file>);
    }

Asking for line 3 using this code gives me back:
    size is 4 offset is 12
    found line 39109 1

That data is incorrect; it appears to come from the middle of line 5.
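For comparison, here is a minimal, self-contained sketch of the same offset-index technique. It differs from the posted code in two places. The post doesn't show whether either one is the actual cause, so treat both as assumptions to rule out.

1. The index is binary, so its handle gets binmode. On a platform with CRLF translation (such as Windows), a packed offset containing the byte 0x0A could otherwise be written as two bytes, shifting every later entry.
2. Lines are numbered from 1, so entry N is read from byte 4 * (N - 1). The posted line_with_index uses $size * ($line_number), which is 0-based.

The temporary files and sample lines are hypothetical stand-ins for $oneper and $file.idx.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical sample data standing in for the real file; lines vary in length.
my $sample = "513\t7\t126096599\t0\n"
           . "514\tMulti\n"
           . "515\t7\t126116797\t0\n"
           . "516\tNotOn\n";

# Write the sample in binary mode so offsets are raw byte positions, then rewind.
my ($data_fh) = tempfile(UNLINK => 1);
binmode $data_fh;
print $data_fh $sample;
seek($data_fh, 0, 0) or die "Can't rewind data file: $!";

# The index holds packed integers, so keep it in binary mode as well.
my ($index_fh) = tempfile(UNLINK => 1);
binmode $index_fh;

build_index($data_fh, $index_fh);

# Writes one 4-byte big-endian offset per data line, in file order.
sub build_index {
    my ($data_file, $index_file) = @_;
    my $offset = 0;
    while (<$data_file>) {
        print $index_file pack("N", $offset);
        $offset = tell($data_file);
    }
}

# Returns line $line_number (1-based), or undef if it is out of range.
sub line_with_index {
    my ($data_file, $index_file, $line_number) = @_;
    return if $line_number < 1;
    my $size     = length pack("N", 0);
    my $i_offset = $size * ($line_number - 1);    # entry N starts at 4 * (N - 1)
    seek($index_file, $i_offset, 0) or return;    # seek also flushes pending writes
    read($index_file, my $entry, $size) == $size or return;
    my $d_offset = unpack("N", $entry);
    seek($data_file, $d_offset, 0) or return;
    return scalar readline($data_file);
}

my $line = line_with_index($data_fh, $index_fh, 2);
print "line 2: $line";
```

Suppose this sketch returns the right lines on the real data while the posted version doesn't. That would narrow the problem to one of the two differences listed above.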
Thanks in advance for your help!
Update:
My file also contains lines that look like:
    513 7 126096599 0
    514 Multi
    515 7 126116797 0
    516 NotOn
    517 7 126120072 0
    518 7 126129103 0
    519 7 126129249 0
    520 7 126141464 0
    521 7 126172869 0
    522 7 126177331 0
    523 7 126183528 0
    524 19 49379166 1
    525 2 172414527 1
    526 7
    527 19 49379181 1
    528 2 172414461 1
    529 4 39549110 0
    530 21 40195276 1
    531 No Results
    532 14 39651192 0
    533 7
    534 7

So the 34-bytes-per-line assumption isn't true. Sorry.
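Because the records vary in length (some have only one or two fields), any position computed as N times a fixed record width will land mid-line. A byte-offset index avoids that because it records where each line actually starts. Lookups might also need to go by the value in the first column rather than by physical position. One possible approach is a single pass that maps that value to its offset. This is a minimal sketch that assumes whitespace-separated columns. %offset_of and record_for are hypothetical names, and for a very large file the hash itself might need to live on disk.

```perl
use strict;
use warnings;

# Irregular records from the update (assumed tab-separated); some have only one or two fields.
my $data = "529\t4\t39549110\t0\n"
         . "530\t21\t40195276\t1\n"
         . "531\tNo Results\n"
         . "532\t14\t39651192\t0\n"
         . "533\t7\n";

open(my $fh, '<', \$data) or die "Can't open in-memory data: $!";

# One pass: map each record's first column (its own line number) to the byte offset
# where the record starts. Offsets come from tell(), never from N * a fixed width.
my %offset_of;
my $offset = 0;
while (my $line = <$fh>) {
    my ($key) = split ' ', $line;    # ' ' splits on any run of whitespace
    $offset_of{$key} = $offset;
    $offset = tell($fh);
}

# Hypothetical helper: fetch a record by its first-column number, or undef if absent.
sub record_for {
    my ($handle, $offsets, $key) = @_;
    return unless exists $offsets->{$key};
    seek($handle, $offsets->{$key}, 0) or return;
    return scalar readline($handle);
}

my $record = record_for($fh, \%offset_of, 531);
print $record;
```

The irregular records ("531 No Results", "533 7") are handled like any other, because nothing depends on how many fields a line has.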