cafeblue has asked for the wisdom of the Perl Monks concerning the following question:
Every 4 lines form a block, and all the blocks are alike. All the even lines are the same length. A sample of the data:

    @HWUSI-EAS1734_0032_FC620F7AAXX:5:1:18184:1176#CGATGT/1
    GGATTTCTCGTGGANACCATTTGTTGGTCAANNNNNNNNNNGTGTTNGNCTTCANNGNNATTGAAAATGNTCATTCGTGGCTATTTTCGCNNNNNATNNNN
    +HWUSI-EAS1734_0032_FC620F7AAXX:5:1:18184:1176#CGATGT/1
    gggfggggfgeeecB```^]gffgegadcgBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB
    @HWUSI-EAS1734_0032_FC620F7AAXX:5:1:1934:1185#CGATGT/1
    GTCATCCTTAATTANCGTATGTGCTCTTCCTNCNNNNNNNNGCTGCTANTTATTTCTNNGCAGCTTTGCTCTTATTAGTTACGAACATGCCNNNNTANNNN
    +HWUSI-EAS1734_0032_FC620F7AAXX:5:1:1934:1185#CGATGT/1
    acdad`^ddd^aa^B_\VZZfcfccaffBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB
    ..........

I need to extract random lines from these files, so I build index files for these large text files with a script like this:

    if (-e "$ARGV[0].idx") {
        open (INDEXFQ1, "$ARGV[0].idx") or die $!;
    }
    else {
        open (INDEXFQ1, "+>$ARGV[1].idx") or die $!;
        build_index(*FQ1, *INDEXFQ1);
    }

The problem is that whenever I print lines with large line numbers, the output is defective. The print code looks like this:

    print OQ10_1 line_with_index(*FQ1, *INDEXFQ1, $line);

There is no error message, but lines with large line numbers come out defective, like this:

    741:20058#ATCACG/1
    GTTCGTGAGAGCTCTAGGTTGTCGTCTCCCAGTCAACTATGGTCGCTGTAACGCGCTGACTT
    41:20058#ATCACG/1
    dgggg_ddadbaggedbXdd]^[UVYX]XR_BBBBBBBBBBBBBBBBBBBBBBBBBBBBBB

Can anybody help? Thank you!

Sorry, here are the two subs, build_index and line_with_index:

    sub build_index {
        my $data_file  = shift;
        my $index_file = shift;
        my $offset     = 0;

        while (<$data_file>) {
            print $index_file pack("N", $offset);
            $offset = tell($data_file);
        }
    }

    sub line_with_index {
        my $data_file   = shift;
        my $index_file  = shift;
        my $line_number = shift;

        my $size;
        my $i_offset;
        my $entry;
        my $d_offset;

        $size     = length(pack("N", 0));
        $i_offset = $size * ($line_number-1);
        seek($index_file, $i_offset, 0) or return;
        read($index_file, $entry, $size);
        $d_offset = unpack("N", $entry);
        seek($data_file, $d_offset, 0);
        return scalar(<$data_file>);
    }
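One likely cause, assuming the data files are larger than 4 GiB: `pack("N", ...)` stores a 32-bit unsigned integer, so any byte offset above 4294967295 cannot be represented in a 4-byte index entry, and lookups for late lines seek to the wrong place. The sketch below is one possible variant of the two subs that stores 8-byte entries with the `Q>` template instead. The names `build_index64` and `line_with_index64` are hypothetical, and the code assumes a perl (5.10 or later) built with 64-bit integer and large-file support.

```perl
use strict;
use warnings;

# Size of one index entry. "Q>" is an unsigned 64-bit big-endian integer.
# Assumption: this perl supports 64-bit integers (pack dies otherwise).
my $ENTRY_SIZE = length pack('Q>', 0);

# Write one fixed-size entry per line of $data_fh:
# the byte offset at which that line starts.
sub build_index64 {
    my ($data_fh, $index_fh) = @_;
    binmode($index_fh);                  # the index holds binary data
    seek($data_fh, 0, 0);
    my $offset = 0;
    while (<$data_fh>) {
        print {$index_fh} pack('Q>', $offset);
        $offset = tell($data_fh);
    }
    return;
}

# Return line $line_number (1-based), or nothing if it is out of range.
sub line_with_index64 {
    my ($data_fh, $index_fh, $line_number) = @_;
    return if $line_number < 1;

    my $entry;
    seek($index_fh, $ENTRY_SIZE * ($line_number - 1), 0) or return;
    my $got = read($index_fh, $entry, $ENTRY_SIZE);
    return unless defined $got && $got == $ENTRY_SIZE;   # past end of index

    seek($data_fh, unpack('Q>', $entry), 0) or return;
    return scalar <$data_fh>;
}
```

An index file written with the 4-byte `N` format is not compatible with these subs, so an existing `.idx` file would have to be deleted and rebuilt before switching.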
Re: index for large text file
by moritz (Cardinal) on Mar 28, 2011 at 06:25 UTC
by cafeblue (Novice) on Mar 28, 2011 at 06:52 UTC
by Eliya (Vicar) on Mar 28, 2011 at 10:50 UTC
by cafeblue (Novice) on Mar 29, 2011 at 02:32 UTC
by cnaf7211 (Initiate) on Sep 27, 2012 at 18:43 UTC
by moritz (Cardinal) on Mar 28, 2011 at 06:55 UTC

Re: index for large text file
by davido (Cardinal) on Mar 28, 2011 at 06:49 UTC
by cafeblue (Novice) on Mar 28, 2011 at 07:11 UTC

Re: index for large text file
by vkon (Curate) on Mar 28, 2011 at 07:52 UTC

Re: index for large text file
by Anonymous Monk on Mar 28, 2011 at 06:33 UTC
by educated_foo (Vicar) on Mar 28, 2011 at 14:26 UTC
by Anonymous Monk on Mar 30, 2011 at 08:13 UTC

Re: index for large text file
by GrandFather (Saint) on Mar 28, 2011 at 09:32 UTC
by Anonymous Monk on May 16, 2013 at 23:08 UTC

Re: index for large text file
by umasuresh (Hermit) on Mar 28, 2011 at 14:19 UTC