sesemin has asked for the wisdom of the Perl Monks concerning the following question:
I think this is a unique situation. I tried it with a hash, but because hash keys are unique I don't think I can implement it with a hash. Then I switched to an array of arrays, which I think will work. Obviously my code does not work, but I want to show one way that it could be implemented.
Here is the problem, with examples.
1- Read column one of File1
2- Split the string in that column to its components (gene name-startbase_endbase)
3- We will have genename, startbase, end base.
4- Put it into an array of array
5- Read column one of File2
6- Do the same thing as 2-4
7- Query through the array of arrays of file1 and find the common elements in file2 such that:
a. First match the genename; if the genenames match, then check whether
b. The start position of the matched genename in file2 falls between the start and end positions of the same genename in file1
File1:
CLS_S3_Contig2721-139_168
CLS_S3_Contig2722-375_390
CLS_S3_Contig2725-323_362
CLS_S3_Contig2725-455_480
CLS_S3_Contig2728-117_144
CLS_S3_Contig2728-437_472
CLS_S3_Contig2729-119_130
CLS_S3_Contig2729-163_220
CLS_S3_Contig2730-181_202
CLS_S3_Contig2730-361_384
CLS_S3_Contig2731-824_843
CLS_S3_Contig2731-1150_1201
CLS_S3_Contig2735-571_636
CLS_S3_Contig2735-677_710
CLS_S3_Contig2735-775_810
. . .
File2:
CLS_S3_Contig2721-142_169
CLS_S3_Contig6525-509_514
CLS_S3_Contig6525-493_502
CLS_S3_Contig6525-503_508
CLS_S3_Contig2977-365_376
CLS_S3_Contig2977-77_82
CLS_S3_Contig2977-83_90
CLS_S3_Contig4978-271_274
CLS_S3_Contig4978-385_388
CLS_S3_Contig2730-365_389
. . .
Output:
Genename(file1) start end ** Genename(file2) start end
CLS_S3_Contig2721 139 168 ** CLS_S3_Contig2721 142 169
CLS_S3_Contig2730 361 384 ** CLS_S3_Contig2730 365 389
. .
# File1: collect [genename, start, end] for every ID in column one
while (<INPUT1>) {
    chomp;
    my @id = split /\t/;
    if ($id[0] =~ /(.+?)\-(\d+?)_(\d+)/) {
        my @line_map = ("$1", $2, $3);
        push @file_map, [@line_map];
    }
}
close(INPUT1);

# File2: same treatment
while (<INPUT2>) {
    chomp;
    my @map_id = split /\t/;
    if ($map_id[0] =~ /(.+?)\-(\d+?)_(\d+)/) {
        my @tg_id = ("$1", $2, $3);
        push @file_tg, [@tg_id];
    }
}

# Comparison of one file1 (map) entry against one file2 (tg) entry
if (($from_tg == $from_map) && ($to_tg == $to_map)) {
    print join("\t", $two_geno_id, $from_map, $to_map, "<-Mapside**TGside->", $two_geno_id, $from_tg, $to_tg, $from_map_tg_range, $to_map_tg_range), "\n";
    $lines_1++;
}
elsif (($from_tg < $to_map) && ($from_tg > $from_map)) {
    print join("\t", $two_geno_id, $from_map, $to_map, "<-Mapside**TGside->", $two_geno_id, $from_tg, $to_tg, $from_map_tg_range, $to_map_tg_range), "\n";
    $lines_9++;
}
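For reference, here is a minimal sketch of one way the steps above could fit together. It is not taken from any reply in this thread. It assumes the IDs are already in memory (a few of the sample IDs are inlined instead of being read from INPUT1 and INPUT2), and the sub names `parse_id`, `index_ids` and `find_matches` are made up for illustration. Hash keys are unique, but each value can be a reference to an array, so a hash keyed by genename can still hold several start/end pairs per gene. The overlap test follows the sample output and the comparison in the posted code: a pair is reported when the file2 start lies within the file1 start..end range. Treating the bounds as inclusive is an assumption.

```perl
use strict;
use warnings;

# Split "genename-start_end" into its three components.
# Returns an empty list when the ID does not have that shape.
sub parse_id {
    my ($id) = @_;
    return $id =~ /^(.+)-(\d+)_(\d+)$/ ? ($1, $2, $3) : ();
}

# Build a hash of arrays: genename => list of [start, end] pairs.
sub index_ids {
    my @ids = @_;
    my %by_gene;
    for my $id (@ids) {
        my ($gene, $start, $end) = parse_id($id);
        next unless defined $gene;
        push @{ $by_gene{$gene} }, [ $start, $end ];
    }
    return \%by_gene;
}

# For every file1 entry, report the file2 entries of the same gene
# whose start falls inside the file1 range (inclusive: an assumption).
sub find_matches {
    my ($file1_ids, $file2_ids) = @_;
    my $file2_by_gene = index_ids(@$file2_ids);
    my @matches;
    for my $id (@$file1_ids) {
        my ($gene, $start1, $end1) = parse_id($id);
        next unless defined $gene;
        for my $range (@{ $file2_by_gene->{$gene} || [] }) {
            my ($start2, $end2) = @$range;
            next unless $start2 >= $start1 && $start2 <= $end1;
            push @matches,
                join("\t", $gene, $start1, $end1, '**', $gene, $start2, $end2);
        }
    }
    return @matches;
}

# A few IDs copied from the File1 and File2 samples above.
my @file1 = qw(
    CLS_S3_Contig2721-139_168
    CLS_S3_Contig2722-375_390
    CLS_S3_Contig2730-181_202
    CLS_S3_Contig2730-361_384
);
my @file2 = qw(
    CLS_S3_Contig2721-142_169
    CLS_S3_Contig6525-509_514
    CLS_S3_Contig2730-365_389
);

print "$_\n" for find_matches(\@file1, \@file2);
```

With these inputs the sketch prints the two pairs shown in the sample output (Contig2721 and Contig2730), tab-separated. Indexing file2 by genename means each file1 entry is compared only against ranges of the same gene, not against every line of file2.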
Replies are listed 'Best First'.

Re: Query through array of array
by puudeli (Pilgrim) on Jan 22, 2009 at 10:12 UTC
by sesemin (Beadle) on Jan 22, 2009 at 22:03 UTC
by puudeli (Pilgrim) on Jan 23, 2009 at 06:43 UTC
by sesemin (Beadle) on Jan 23, 2009 at 07:42 UTC
by puudeli (Pilgrim) on Jan 23, 2009 at 08:15 UTC

Re: Query through array of array
by rovf (Priest) on Jan 22, 2009 at 13:08 UTC
by sesemin (Beadle) on Jan 22, 2009 at 21:49 UTC

Re: Query through array of array
by brsaravan (Scribe) on Jan 22, 2009 at 13:07 UTC
by sesemin (Beadle) on Jan 22, 2009 at 22:36 UTC
by brsaravan (Scribe) on Jan 23, 2009 at 05:25 UTC
by sesemin (Beadle) on Jan 23, 2009 at 06:49 UTC