
My test data was like this:

snp.txt:
-----------
rs34569384
rs123456
rs234567
rs753444
----------

chr22.txt:
---------
bla sijghs bla
rs234567 yes,first one
fdjg
rs123456 yes, second one
---------

Re^8: General program and related problems
by micky744monk (Novice) on Aug 06, 2009 at 08:45 UTC
    Apart from the length of the rows in file 2, I do not see any difference. I guess you used a tab to separate rs234567 from the rest of the line, but since we are only matching at the beginning of the line I do not think it makes much difference. On the other hand, my snp.txt doesn't look like that: it is a collection of rs numbers separated by spaces if I write it with print "@output", or just a run of rs numbers if I write it with print @output. So my file 1 looks like:
    rs234567rs265897rs2458796rs2658974rs...

    Should I split the rs numbers with \n, print them to file 1 and then start from there? In that case I would have a file that looks like yours, but it would be a collection of lines, not an array.
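A minimal sketch, assuming snp.txt really does hold rs-numbers run together with no whitespace (as in the example line above). In that case the program's `split (/\s+/ ...)` yields one long field, and `/^rs\d{5,}\b/` rejects it because there is no word boundary between a digit and the next `r`. A global match can pull the numbers out directly, without first rewriting the file one per line. The sample string is made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical input: rs-numbers concatenated with no separator,
# shaped like the file 1 line shown above.
my $line = "rs234567rs265897rs2458796rs2658974\n";

# In list context, m//g returns every captured match.
# \d{5,} is greedy but stops at the next 'r', so each rs-number is captured whole.
my @output = $line =~ /(rs\d{5,})/g;

print join("\n", @output), "\n";
```

The same list-context match would also work on space-separated lines, so it could replace the split/grep pair if the file format is uncertain.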

      ??? Your first paragraph makes no sense. You can't tell what snp.txt looks like from printing @output: the contents of @output have already been processed, split and then filtered by a grep. Use an editor or 'less' to look at a file.

      You might try the following (this is the same program, I just added a Dumper line before the foreach):

      #!/usr/bin/perl
      use strict;
      use warnings;
      use Data::Dumper;

      my $line;
      my @output;

      open (FILE1, 'snp.txt')   or die "can't open the file: $!";
      open (FILE2, 'chr22.txt') or die "can't open the file: $!";
      open (FD,    '>test.txt') or die "can't open the file: $!";

      # Index chr22.txt: remember the byte offset of every line that starts with an rs-number
      my $position = tell(FILE2);
      my %rs;
      while ($line = <FILE2>) {
          my ($key) = $line =~ /^(rs\d{5,})\b/;
          if (defined $key) {
              $rs{$key} = $position;
          }
          $position = tell(FILE2);
      }

      # Collect the whitespace-separated rs-numbers from snp.txt
      while (defined ($line = <FILE1>)) {
          my @fields = split (/\s+/, $line);
          push @output, grep /^rs\d{5,}\b/, @fields;
      }

      print Dumper(\@output, \%rs);

      # For each rs-number present in both files, jump to its line in chr22.txt and copy it
      foreach (@output) {
          if (exists $rs{$_}) {
              seek(FILE2, $rs{$_}, 0);
              my $line = <FILE2>;
              print FD $line;
          }
      }

      close FILE1;
      close FILE2;
      close FD;

      With the data I posted, I get the following output:

      $VAR1 = [
                'rs34569384',
                'rs123456',
                'rs234567',
                'rs753444'
              ];
      $VAR2 = {
                'rs123456' => 43,
                'rs234567' => 15
              };

      As you can see, @output (== $VAR1) contains all the rs-numbers read from snp.txt. %rs (== $VAR2) contains the rs-numbers that start a line in chr22.txt, each with the byte position of that line. You should see the same if you use my data.
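To see where the numbers 15 and 43 come from, here is a small self-contained sketch of the same index-then-seek idea. It assumes the chr22.txt test data above, held in an in-memory filehandle instead of a real file (an illustration choice, not part of the original program). `tell` records the offset at which each rs-line starts; `seek` later jumps straight back to it.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The chr22.txt test data from this thread, kept in memory for the demo.
my $data = "bla sijghs bla\n"
         . "rs234567 yes,first one\n"
         . "fdjg\n"
         . "rs123456 yes, second one\n";

open(my $fh, '<', \$data) or die "can't open in-memory file: $!";

# Pass 1: record the byte offset at which each rs-line starts.
my %rs;
my $position = tell($fh);
while (my $line = <$fh>) {
    my ($key) = $line =~ /^(rs\d{5,})\b/;
    $rs{$key} = $position if defined $key;
    $position = tell($fh);
}

# Pass 2: jump directly back to one of the recorded lines.
seek($fh, $rs{rs123456}, 0) or die "seek failed: $!";
my $found = <$fh>;

print "$_ => $rs{$_}\n" for sort keys %rs;
print $found;
close $fh;
```

The offsets are plain byte counts: "bla sijghs bla\n" is 15 bytes, so rs234567 starts at 15; the next two lines add 23 and 5 bytes, so rs123456 starts at 43. With a large real file the values are simply much bigger.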

      Now try it with your data. What is different? Are there rs-numbers that are in both @output and %rs? Do the file position numbers look correct?

        OK, done. $VAR1 looks the same. $VAR2 looks the same too, but the values are larger, like 25286112, 12001461 and 21675360, associated with three different rs numbers of course... but it looks like an array.
        Pardon! I must have messed up the dummy text. Actually, with the real data it works! I learned a lot, thank you! Now I need to get rid of duplicate rs numbers coming from file 1 (certain rs numbers appear twice, so I get 3 lines with the same data). I'll try to rewrite the code myself and ask for help if I can't work it out... anyway, nice that I have learned Data::Dumper too.
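For the duplicate problem just described, one common Perl idiom is a `%seen` hash inside a `grep`. This is a minimal sketch assuming @output holds the rs-numbers read from file 1; the sample values are made up. It keeps the first occurrence of each rs-number and preserves the original order.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample: rs-numbers as read from file 1, with repeats.
my @output = qw(rs234567 rs123456 rs234567 rs753444 rs123456);

# $seen{$_}++ returns the previous count, so only the first
# occurrence (previous count 0, i.e. false) passes the grep.
my %seen;
my @unique = grep { !$seen{$_}++ } @output;

print "@unique\n";
```

In the program above, filtering @output this way before the foreach loop would write each matching line of chr22.txt only once.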