Re^6: General program and related problems

Re^7: General program and related problems by jethro (Monsignor) on Aug 06, 2009 at 08:16 UTC
My test data was like this:

    snp.txt:
    -----------
    rs34569384
    rs123456
    rs234567
    rs753444
    ----------

    chr22.txt:
    ---------
    bla sijghs bla
    rs234567 yes,first one
    fdjg
    rs123456 yes, second one
    ---------
Re^8: General program and related problems by micky744monk (Novice) on Aug 06, 2009 at 08:45 UTC
Apart from the length of the rows in file 2, I do not see any difference. I guess you used a tab to separate rs234567 from the rest of the line, but since we are only matching at the beginning of the line I do not think it matters much.

On the other hand, my snp.txt doesn't look like that. It is a collection of rs-numbers separated by a space if I use print "@output", or just an unbroken sequence of rs-numbers if I use print @output. So my file 1 looks like: rs234567rs265897rs2458796rs2658974rs...

Should I separate the rs-numbers with \n, print that to file 1, and then start from there? In that case I will have a file that looks like yours, but it will be a collection of lines, not an array.
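For reference, a minimal sketch of how the three ways of writing an array differ. The three rs-numbers are made-up sample values, and the variable names are only illustrative. It assumes Perl's default settings for `$,` (undefined) and `$"` (a single space).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample values, not taken from the real snp.txt.
my @output = qw(rs234567 rs265897 rs2458796);

# print @output writes the elements back to back, because the output
# field separator $, is undefined by default.
my $concatenated = join('', @output);

# print "@output" interpolates the array, joining elements with $"
# (a single space by default).
my $spaced = "@output";

# One rs-number per line, which is easy to read back line by line.
my $one_per_line = join("\n", @output) . "\n";

print "concatenated: $concatenated\n";
print "spaced:       $spaced\n";
print "one per line:\n$one_per_line";
```

The space-separated and one-per-line forms can both be read back with split(/\s+/, $line). The concatenated form has no separator left to split on.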
Re^9: General program and related problems by jethro (Monsignor) on Aug 06, 2009 at 10:40 UTC
??? Your first paragraph makes no sense. You can't tell what snp.txt looks like by printing @output. The contents of @output have already been processed: they were changed by a split and a grep. Use an editor or 'less' to look at the file itself.

You might try out the following (this is the same program, with just a Dumper line added before the foreach):

    #!/usr/bin/perl
    use strict;
    use warnings;
    use Data::Dumper;

    my $line;
    my @output;

    open(FILE1, '<', 'snp.txt')   or die "can't open the file: $!";
    open(FILE2, '<', 'chr22.txt') or die "can't open the file: $!";
    open(FD,    '>', 'test.txt')  or die "can't open the file: $!";

    # Index chr22.txt: remember the byte offset at which each rs-line starts.
    my $position = tell(FILE2);
    my %rs;
    while ($line = <FILE2>) {
        my ($key) = $line =~ /^(rs\d{5,})\b/;
        if (defined $key) {
            $rs{$key} = $position;
        }
        $position = tell(FILE2);
    }

    # Collect every rs-number found in snp.txt.
    while (defined($line = <FILE1>)) {
        my @fields = split(/\s+/, $line);
        push @output, grep /^rs\d{5,}\b/, @fields;
    }

    # Show both data structures before they are used.
    print Dumper(\@output, \%rs);

    # For each rs-number that has an index entry, jump to its line and copy it.
    foreach (@output) {
        if (exists $rs{$_}) {
            seek(FILE2, $rs{$_}, 0);
            my $line = <FILE2>;
            print FD $line;
        }
    }

    close FILE1;
    close FILE2;
    close FD;

With the data I posted, I get the following output:

    $VAR1 = [
              'rs34569384',
              'rs123456',
              'rs234567',
              'rs753444'
            ];
    $VAR2 = {
              'rs123456' => 43,
              'rs234567' => 15
            };

As you can see, @output (== $VAR1) contains a lot of rs-numbers. %rs (== $VAR2) contains some of the same rs-numbers and their corresponding file positions. You should see the same if you use my data.

Now try it with your data. What is different? Are there rs-numbers that are in both @output and %rs? Do the file position numbers look correct?
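The two questions above can also be answered without reading the Dumper output by eye. The following is only a sketch under stated assumptions: it uses the test data posted earlier in the thread, held in an in-memory filehandle so that it runs without any files on disk, and the names @found and @missing are illustrative, not part of the original program.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sample data copied from the test files posted earlier in the thread.
my $chr22  = "bla sijghs bla\nrs234567 yes,first one\nfdjg\nrs123456 yes, second one\n";
my @output = qw(rs34569384 rs123456 rs234567 rs753444);

# Build the index exactly as the program does: rs-number => byte offset
# of the start of its line. The data is read from an in-memory filehandle.
open(my $fh, '<', \$chr22) or die "can't open in-memory file: $!";
my %rs;
my $position = tell($fh);
while (my $line = <$fh>) {
    my ($key) = $line =~ /^(rs\d{5,})\b/;
    $rs{$key} = $position if defined $key;
    $position = tell($fh);
}
close $fh;

# Split the rs-numbers into those with and without an index entry.
my @found   = grep {  exists $rs{$_} } @output;
my @missing = grep { !exists $rs{$_} } @output;

print "found:   @found\n";
print "missing: @missing\n";
print "$_ => $rs{$_}\n" for sort keys %rs;
```

If @found comes out empty with the real data, the likely cause is that the rs-numbers in one of the two files are not in the form the regular expression expects, for example not at the start of a line in chr22.txt.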
Re^10: General program and related problems by micky744monk (Novice) on Aug 06, 2009 at 11:44 UTC
Re^10: General program and related problems by micky744monk (Novice) on Aug 06, 2009 at 12:15 UTC