seeking help for seek function

pearly has asked for the wisdom of the Perl Monks concerning the following question:

Hi monks, I have 2 files, and I need to extract some data from one file based on the other. The data file is around 1GB. Since it's too big to read line by line, I got the idea of reading it at stored positions, using the tell and seek functions. But I have problems getting the desired output. Check out my code.

file1:
SID834.56    AGAAGTCGTACGATCA
SID164.26    AGTCGATCATTATATATTCGCTAG
SID4.56    AGCTAGCGATCGATCCCCCCCCCCCCCCCC
SID5764.12    CGATCGATC
SID564.12    ACGAATATGATAC

file2:
cluster number 1: (reads count:2)
    SID834.56
    SID564.12
cluster number 2: (reads count:2)
    SID164.26
    SID5764.12
cluster number 3: (reads count:1)
    SID4.56

code:
#!/usr/bin/perl -w
use strict;
use warnings;

open(FH1,$ARGV[0]) or die "can not open\n";
open(FH2,$ARGV[1]) or die "can not open\n";

my @indx;

while(<FH1>){
    my ($id,$seq)=split("\t",$_);
    push(@indx, "$id\t".tell FH1);
}


while(<FH2>){
    if($_=~m/^clus/){
        my $clushead=$_;
        print "\n$clushead";
    }
    else{
        $_=~s/\t//g;$_=~s/\n//g;
        my $tes=$_;
        my @hit=grep(/$tes/,@indx);        
        my $sca="@hit";
        my ($id1,$pos)=split("\t",$sca);
        print sysseek (FH1,$pos,0),"\n" or die "seek:$!";
    }
}

desired results:
cluster number 1: (reads count:2)
SID834.56    AGAAGTCGTACGATCA
SID564.12    ACGAATATGATAC
cluster number 2: (reads count:2)
SID164.26    AGTCGATCATTATATATTCGCTAG
SID5764.12    CGATCGATC
cluster number 3: (reads count:1)
SID4.56    AGCTAGCGATCGATCCCCCCCCCCCCCCCC

results which i get now:
cluster number 1: (reads count:2)
27
146

cluster number 2: (reads count:2)
62
122

cluster number 3: (reads count:1)
101

Why is the seek function returning the position instead of fetching the content? Thanks!


Replies are listed 'Best First'.
Re: seeking help for seek function by moritz (Cardinal) on Mar 25, 2010 at 07:35 UTC
Why do you use sysseek instead of seek? All the functions beginning with `sys` are for unbuffered IO, whereas readline is buffered IO. The docs warn against mixing those.
Re^2: seeking help for seek function by Anonymous Monk on Mar 25, 2010 at 08:58 UTC
seek prints 1 (true); only sysseek prints the position. Can you please tell me how I can print the line from the file if I use seek?
Re^3: seeking help for seek function by Corion (Patriarch) on Mar 25, 2010 at 09:02 UTC
If you want to know the position, use tell. If you want to set the position, use seek.
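A minimal sketch of that difference, assuming an in-memory filehandle holding the first two records of file1 (the in-memory handle is only a stand-in for the demo; a file opened from disk should behave the same way). It shows why printing the result of `seek` gives 1: `seek` returns true on success, while `tell` reports the byte offset.

```perl
use strict;
use warnings;

# Sample records in an in-memory filehandle (an assumption for this demo).
my $data = "SID834.56\tAGAAGTCGTACGATCA\nSID164.26\tAGTCGATCATTATATATTCGCTAG\n";
open(my $fh, '<', \$data) or die "can not open in-memory file: $!";

my $first = <$fh>;            # reading a line advances the file position
my $pos   = tell($fh);        # tell reports the current byte offset
my $ok    = seek($fh, 0, 0);  # seek returns true on success, not the offset

print "offset after first line: $pos\n";
print "seek returned: $ok\n";
print "offset after seek: ", tell($fh), "\n";
```

Neither call reads any data; they only query or change where the next read will start.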
Re^3: seeking help for seek function by cdarke (Prior) on Mar 25, 2010 at 10:27 UTC
sysseek and seek do not print anything; it is you calling print. Those functions set the file position; they do not read the file (as others have said). Looking at your code (forgive me if I am wrong here), you seem to be getting the file positions from the first file and using those same positions to find records in the second file. Unless each file has corresponding records of exactly the same length, that will not work. tell and sysseek give the current byte offset in the current file; that position will not (usually) apply to another file unless it is exactly the same format.
Re^4: seeking help for seek function by pearly (Initiate) on Mar 25, 2010 at 10:33 UTC
Re^5: seeking help for seek function by Marshall (Canon) on Mar 25, 2010 at 10:56 UTC
Re: seeking help for seek function by ikegami (Patriarch) on Mar 25, 2010 at 07:11 UTC
`seek` and `sysseek` just move the file pointer. If you want the data that follows, you need to read it.
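As a rough illustration of "move, then read", the sketch below seeks to a known offset and reads one line with `readline`. The helper name `line_at` and the in-memory sample data are assumptions made for the example, not part of the original script.

```perl
use strict;
use warnings;

# Sample records in an in-memory filehandle (an assumption for this demo).
my $data = "SID834.56\tAGAAGTCGTACGATCA\nSID164.26\tAGTCGATCATTATATATTCGCTAG\n";
open(my $fh, '<', \$data) or die "can not open in-memory file: $!";

# Hypothetical helper: position the handle, then read the line found there.
sub line_at {
    my ($handle, $offset) = @_;
    seek($handle, $offset, 0) or die "seek: $!";  # only moves the pointer
    return scalar readline($handle);              # the read fetches the data
}

# 27 is the byte offset at which the second record starts in this sample.
print line_at($fh, 27);
```

The offset passed in has to be the start of the wanted line; an offset taken after reading a line points at the following record instead.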
Re^2: seeking help for seek function by pearly (Initiate) on Mar 25, 2010 at 07:17 UTC
I used the readline function below seek, like this: `$buffer = readline( *FH1 ); print("$buffer");` but it still doesn't give the right sequence.
Re^3: seeking help for seek function by ikegami (Patriarch) on Mar 25, 2010 at 07:50 UTC
You use `tell` for `SID834.56` when the file pointer is here (marked `[]`):
    SID834.56    AGAAGTCGTACGATCA
    []SID164.26    AGTCGATCATTATATATTCGCTAG
You want to use `tell` for `SID834.56` when the file pointer is here:
    SID834.56    []AGAAGTCGTACGATCA
    SID164.26    AGTCGATCATTATATATTCGCTAG
That's not easy to do, but it's easy and acceptable to use `tell` for `SID834.56` when the file pointer is here (marked `[*]`):
    [*]SID834.56    AGAAGTCGTACGATCA
    SID164.26    AGTCGATCATTATATATTCGCTAG
(i.e. before you read the line)
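One way to apply this, sketched under some assumptions: record the offset before each read, keep the offsets in a hash keyed by ID, and later `seek` to the saved offset and read one line. The names `%offset_of` and `fetch_record` are hypothetical, and the in-memory handle stands in for the real data file.

```perl
use strict;
use warnings;

# Three sample records in an in-memory filehandle (an assumption for this demo).
my $data = join '',
    "SID834.56\tAGAAGTCGTACGATCA\n",
    "SID164.26\tAGTCGATCATTATATATTCGCTAG\n",
    "SID4.56\tAGCTAGCGATCGATCCCCCCCCCCCCCCCC\n";
open(my $fh, '<', \$data) or die "can not open in-memory file: $!";

# Build the index: ID => byte offset of the START of that ID's line.
my %offset_of;
my $start = tell($fh);              # position before the first read
while (my $line = <$fh>) {
    my ($id) = split /\t/, $line;
    $offset_of{$id} = $start;       # offset saved from before this line was read
    $start = tell($fh);             # position before the next read
}

# Hypothetical lookup: seek to the saved offset, then read one line.
sub fetch_record {
    my ($id) = @_;
    return unless exists $offset_of{$id};
    seek($fh, $offset_of{$id}, 0) or die "seek: $!";
    return scalar readline($fh);
}

print fetch_record('SID4.56');
```

A hash gives an exact-match lookup per ID, whereas the `grep(/$tes/, @indx)` in the original script scans the whole index with a regex for every ID.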
Re^4: seeking help for seek function by Anonymous Monk on Mar 25, 2010 at 08:56 UTC
Re^4: seeking help for seek function by pearly (Initiate) on Mar 25, 2010 at 09:57 UTC
Re^4: seeking help for seek function by pearly (Initiate) on Mar 25, 2010 at 09:01 UTC
Re^5: seeking help for seek function by almut (Canon) on Mar 25, 2010 at 09:15 UTC
Re: seeking help for seek function by BrowserUk (Patriarch) on Mar 25, 2010 at 14:02 UTC
"the data file is around 1GB. Since it's too big to read line by line,"
I think the above is your biggest mistake. It doesn't matter how big the file is: so long as the individual lines aren't >2GB, you can read the file line by line. I think that all your seek/tell stuff is just a distraction from your real problem. This might be a true case of the XY problem.
Re^2: seeking help for seek function by pearly (Initiate) on Mar 26, 2010 at 06:36 UTC
Hi, thank you very much for your advice. As you said, I tried a different method to solve the problem and succeeded; it's very fast too :) Here is how I did it:

#!/usr/bin/perl -w
use strict;
use warnings;

open(FH1,$ARGV[0]) or die "can not open\n";
open(FH2,$ARGV[1]) or die "can not open\n";

my @indx;

while(<FH1>){
    my ($id,$seq)=split("\t",$_);
    push(@indx, $id,$seq);
}

my %hashseq=@indx;

while(<FH2>){
    if($_=~m/^clus/){
        my $clushead=$_;
        print "$clushead";
    }
    else{
        $_=~s/\t//g;$_=~s/\n//g;
        my $tes=$_;
        print $tes,"\t",$hashseq{"$tes"};
    }
}

Thank you very much once again :) (P.S.: sorry for posting twice. I forgot to log in previously.)