in reply to General program and related problems

Which grep do you have in mind: Perl's built-in grep function or the command-line utility grep? Generally there is (especially in Perl) more than one way to do things, though some are better than others in a given situation.

You also seemed to imply that both data files are huge. Does that mean you are searching for thousands or millions of values in file 2?

If that were the case, a lot would depend on the characteristics of the data. If it is only single words, you could create a hash (stored on disc) of the words in file 1 and check every word in file 2 for existence in the hash. If you are looking for whole lines instead, you could sort both file 2 and your search list and then work through the two lists in parallel from the beginning.
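A minimal sketch of the hash idea, under some assumptions: the sample data and the names %wanted and @found are made up for illustration, in-memory filehandles stand in for the two files, and the hash is kept in memory rather than on disc.

```perl
use strict;
use warnings;

# Hypothetical sample data standing in for file 1 and file 2.
my $file1 = "rs12345 rs67890\nrs11111\n";
my $file2 = "foo rs67890 bar\nbaz qux\nrs11111 rs99999\n";

# In-memory filehandles, so the sketch runs without real files.
open my $fh1, '<', \$file1 or die "can't open in-memory file 1: $!";
open my $fh2, '<', \$file2 or die "can't open in-memory file 2: $!";

# Remember every whitespace-separated word of file 1 as a hash key.
my %wanted;
while (my $line = <$fh1>) {
    $wanted{$_} = 1 for split ' ', $line;
}

# Check every word of file 2 for existence in the hash.
my @found;
while (my $line = <$fh2>) {
    push @found, grep { exists $wanted{$_} } split ' ', $line;
}

print "$_\n" for @found;
```

Each lookup costs about the same no matter how many words file 1 contains, which is why this scales better than scanning the search list once per word.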

If, on the other hand, the list you are looking for is small, you could concatenate all search strings with '|' and use the result as the search pattern, somewhat like this (untested):

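my $searchstring = join '|', map { quotemeta } @output;
while (my $line = <FILE2>) {
    if ($line =~ /$searchstring/) {
        print $line;
    }
}

quotemeta makes sure your search strings have any regex special characters like '|' escaped. It does the same job as \Q and \E in a pattern literal; the function is used here because \Q and \E are only processed when they are written directly in the pattern, not when they arrive inside an interpolated variable.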

Other observations: ++ for your use of warnings and strict. But you should also indent correctly; it makes your code much more readable.

And concerning your second posting: please edit it and use code tags for your data examples too.

Re^2: General program and related problems
by micky744monk (Novice) on Aug 04, 2009 at 20:09 UTC
    Hello, and thanks for the time you are spending on me.
    I tried both versions of the code, but I do not get any output file at the end.
    #!/usr/bin/perl -w
    use strict;
    my $line;
    my @fields;
    my @output;
    open (FILE1, 'snp2.txt') or die "can't open the file: $!";
    open (FILE2, 'chr22.txt') or die "can't open the file: $!";
    open (FD, '>test.txt') or die "can't open the file: $!";
    my $position = tell(FILE2);
    my %rs;
    while ($line = <FILE2>) {
        my ($key) = $line =~ /^(rs\d{5,})\b/;
        if (defined $key) {
            $rs{$key} = $position;
        }
        $position = tell(FILE2);
    }
    while (defined ($line = <FILE1>)) {
        my @fields = split (/\s+/, $line);
        my @output = grep /^rs\d{5,}\b/, @fields;
    }
    foreach (@output) {
        if (exists $rs{$_}) {
            seek(FILE2, $rs{$_}, 0);
            my $line = <FILE2>;
            print FD $line;
        }
    }
    Close FILE1;
    Close FILE2;
    Close FD;
    This one is the most convincing to me, but I have no output file at the end. I do not know where the problem could be.

      Did you see that your program is producing an error message? It should be 'close', not 'Close', at the end.

      Some words on debugging. If you don't know what your program does, insert print statements to find out (or use Data::Dumper).

      For example, a simple print @output; before the foreach loop would have told you that @output is empty.

      Then you could have looked at the previous loop, where @output should have been filled. A print join('|',@fields),"\n"; or, even better, print Dumper(\@fields); (for this you also need the line use Data::Dumper;) and print Dumper(\@output); at the end of the loop would have given you surprising results. If you want to learn something, please do the above, look at the result and think about it. If you don't find the solution, read the spoiler below.
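As a small illustration of the two kinds of debugging print mentioned above, here is a sketch on a made-up input line (the contents of $line are hypothetical and not taken from your files):

```perl
use strict;
use warnings;
use Data::Dumper;

# Hypothetical input line, standing in for one line read from a file.
my $line = "rs12345 A G 0.25\n";

my @fields = split /\s+/, $line;

# Quick check: all elements on one line, visibly separated.
print join('|', @fields), "\n";    # prints rs12345|A|G|0.25

# Structured check: Dumper shows every element, including empty ones.
my $dump = Dumper(\@fields);
print $dump;
```

The join version is handy for a quick look, while Dumper also makes empty strings and nested structures visible.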

      After you have solved the first problem, you will see that there is a further problem: you are getting only the result of the last line in file1. The output of the print or Dumper lines should give you a clue again; if not, read the next spoiler.

        Thank you again for the time you are spending with me. I understood the other problems, although the push @array is not really a problem, since even with this code, printing the @output list gives me all the rs from file 1. The code is still not working, but while running it and checking the output I noticed that the first version had an empty hash. I have now played a bit with the hash definition (what I did could be completely wrong), and now the array gives me file1 together with a number that could be the position, so I guess the code is treating the file as one single string. I do not know; something must be wrong in the first block and in the hash definition. If you could point me in the right direction, I could keep going with the debugging.
        #!/usr/bin/perl -w
        use strict;
        my $line;
        my @fields;
        my @output;
        my $position;
        my %rs;
        open (FILE1, 'snp.txt') or die "can't open the file: $!";
        open (FILE2, 'chr22.txt') or die "can't open the file: $!";
        open (FD, '>test.txt') or die "can't open the file: $!";
        while ($line = <FILE2>) {
            $line =~ /^(rs\d{5,})\b/;
            $position = tell( FILE2 );
            if (defined $line) {
                $rs{$line} = $position;
            }
        }
        print %rs;
        while (defined ($line = <FILE1>)) {
            my @fields = split (/\s+/, $line);
            @output = grep /^rs\d{5,}\b/, @fields;
        }
        foreach (@output) {
            if (exists $rs{$_}) {
                seek(FILE2, $rs{$_}, 0);
                my $line = <FILE2>;
                print FD $line;
            }
        }
        close FILE1;
        close FILE2;
        close FD;
        The output file is empty, to be precise. Cheers again!