in reply to Re^5: General program and related problems
in thread General program and related problems

Hello. For me, @output was fine even without push: just by putting print @output before the foreach loop, I got all the rs numbers from file 1. Anyway, that is not the problem. My output file is completely empty. This is the line of my data from file 2. Can you tell me what your data looks like? How did you build the test data? Or I can just send you my data so you can see whether your code works on it. Cheers

Re^7: General program and related problems
by jethro (Monsignor) on Aug 06, 2009 at 08:16 UTC

    My test data looked like this:

    snp.txt:
    -----------
    rs34569384
    rs123456
    rs234567
    rs753444
    ----------

    chr22.txt:
    ---------
    bla sijghs bla
    rs234567 yes,first one
    fdjg
    rs123456 yes, second one
    ---------
      Apart from the length of the rows in file 2, I do not see any difference. I guess you used a tab to separate rs234567 from the rest of the line, but since we only match at the beginning of the line I do not think that matters. On the other hand, my snp.txt doesn't look like that: it is a collection of rs numbers separated by spaces if I use print "@output", or one unbroken run of rs numbers if I use print @output. So my file 1 looks like:
      rs234567rs265897rs2458796rs2658974rs...

      Should I split the rs numbers with \n, print them to file 1, and then start from there? In that case I will have a file that looks like yours, but it will be a collection of lines, not an array.

        ??? Your first paragraph makes no sense. You can't tell what snp.txt looks like by printing @output. The contents of @output have already been processed, changed by a split and a grep. Use an editor or 'less' to look at the file itself.
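
        To illustrate why printing @output says nothing about the file layout, here is a minimal sketch. The three values are made-up sample data, not taken from anyone's real snp.txt. The same array produces three different-looking outputs depending only on how it is printed:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample values standing in for whatever the split and grep
# leave in @output.
my @output = ('rs234567', 'rs265897', 'rs2458796');

# print @output emits the elements back to back (while $, is unset),
# which is the same as joining them with the empty string.
my $bare = join '', @output;

# Interpolating the array in double quotes joins the elements with $",
# which is a single space by default.
my $quoted = "@output";

# One element per line, the layout you would want for a file of rs-numbers.
my $lines = join "\n", @output;

print "$bare\n";
print "$quoted\n";
print "$lines\n";
```

        None of the three tells you whether the input file had one rs-number per line or all of them on one line, because split(/\s+/, ...) treats spaces and newlines alike.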

        You might try out the following (this is the same program; I just added a Dumper line before the foreach):

        #!/usr/bin/perl
        use strict;
        use warnings;
        use Data::Dumper;

        my $line;
        my @output;

        open (FILE1, 'snp.txt')   or die "can't open the file: $!";
        open (FILE2, 'chr22.txt') or die "can't open the file: $!";
        open (FD, '>test.txt')    or die "can't open the file: $!";

        # Index chr22.txt: remember the file position at which each
        # line starting with an rs-number begins.
        my $position = tell(FILE2);
        my %rs;
        while ($line = <FILE2>) {
            my ($key) = $line =~ /^(rs\d{5,})\b/;
            if (defined $key) {
                $rs{$key} = $position;
            }
            $position = tell(FILE2);
        }

        # Collect every rs-number found in snp.txt.
        while (defined ($line = <FILE1>)) {
            my @fields = split (/\s+/, $line);
            push @output, grep /^rs\d{5,}\b/, @fields;
        }

        # Debugging aid: show both data structures before they are used.
        print Dumper(\@output, \%rs);

        # For each wanted rs-number, jump to its line in chr22.txt and copy it.
        foreach (@output) {
            if (exists $rs{$_}) {
                seek(FILE2, $rs{$_}, 0);
                my $line = <FILE2>;
                print FD $line;
            }
        }

        close FILE1;
        close FILE2;
        close FD;

        With the data I posted, I get the following output:

        $VAR1 = [
                  'rs34569384',
                  'rs123456',
                  'rs234567',
                  'rs753444'
                ];
        $VAR2 = {
                  'rs123456' => 43,
                  'rs234567' => 15
                };

        As you can see @output (==$VAR1) contains a lot of rs-numbers. %rs (==$VAR2) contains some of the same rs-numbers and corresponding file positions. You should see the same if you use my data.

        Now try it with your data. What is different? Are there rs-numbers that appear in both @output and %rs? Do the file position numbers look correct?
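
        If eyeballing the Dumper output gets tedious, both questions can be checked mechanically. The following is only a sketch: it uses the test data from this thread held in an in-memory file, and bad_offsets is a hypothetical helper name, not part of the program above. With real data you would open chr22.txt instead and fill @output and %rs the way the program does.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sample data copied from the test files and Dumper output in this thread.
my $chr22 = "bla sijghs bla\n"
          . "rs234567 yes,first one\n"
          . "fdjg\n"
          . "rs123456 yes, second one\n";
my @output = qw(rs34569384 rs123456 rs234567 rs753444);
my %rs     = (rs123456 => 43, rs234567 => 15);

# Question 1: which rs-numbers are in both @output and %rs?
my @common = grep { exists $rs{$_} } @output;

# Question 2 (hypothetical helper): return the keys whose stored offset
# does NOT point at a line that starts with that key.
sub bad_offsets {
    my ($fh, $index) = @_;
    my @bad;
    for my $key (sort keys %$index) {
        seek($fh, $index->{$key}, 0) or die "seek failed: $!";
        my $line = <$fh>;
        push @bad, $key
            unless defined $line && $line =~ /^\Q$key\E\b/;
    }
    return @bad;
}

# An in-memory filehandle stands in for chr22.txt here.
open(my $fh, '<', \$chr22) or die "can't open in-memory file: $!";
my @bad = bad_offsets($fh, \%rs);
close $fh;

print "in both: @common\n";
print "bad offsets: ", (@bad ? "@bad" : "none"), "\n";
```

        An empty "in both" list would explain an empty output file: nothing in @output matches a key in %rs, so the foreach never prints anything.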