in reply to homemade blat: need help with reverse string.
If I understand your problem, this program would be a solution.
Some output from a fasta file.#!/usr/bin/perl use strict; use warnings; print "Insert query:\n"; chomp(my $query = uc <STDIN>); print "\n"; my $nomefile = $ARGV[0]; open my $f, '<', $nomefile or die "Unable to open $nomefile. $!"; chomp(my $id_line = <$f>); die "Input file $nomefile not in FASTA format\n" if substr($id_line,0,1) ne ">"; my $sequenza; while (my $line = <$f>) { chomp $line; if (substr($line, 0, 1) eq '>') { report_matches($id_line, $sequenza, $query); $sequenza = ''; $id_line = $line; } else { $sequenza .= uc $line; } # necessary to report last record if eof report_matches($id_line, $sequenza, $query) if eof; } close $f or die "Unable to close $nomefile. $!"; sub report_matches { my ($id, $sequenza, $query) = @_; my %pos; my $revcomp = reverse $query =~ tr/ATCG/TAGC/r; my $len_query = length $query; for my $i (0 .. length($sequenza)-$len_query) { my $piece = substr $sequenza, $i, $len_query; if ($piece eq $query) { push @{ $pos{q} }, $i; } elsif ($piece eq $revcomp) { push @{ $pos{rc} }, $i; } } print "For id: $id\n"; if ($pos{q}) { printf "QUERY: $query\tappears %d times\t position @{ $pos{q} +}\n", scalar @{ $pos{q} }; } if ($pos{rc}) { printf "REVCOMP: $revcomp\tappears %d times\tposition @{ $pos{ +rc} }\n", scalar @{ $pos{rc} }; } print "\n"; }
C:\Old_Data\perlp>perl test3.pl fasta_dat.txt Insert query: ttttt For id: >NR_037701 1 QUERY: TTTTT appears 6 times position 615 680 1114 1374 1428 2322 REVCOMP: AAAAA appears 47 times position 1640 2010 3304 3305 3 +306 3307 3308 3309 3 310 3311 3312 3313 3314 3315 3316 3317 3318 3319 3320 3321 3322 3323 3 +324 3325 3326 3327 3 328 3329 3330 3331 3332 3333 3334 3335 3336 3337 3338 3339 3340 3341 3 +342 3343 3344 3345 3 346 3347 3348 For id: >NM_198399 1 QUERY: TTTTT appears 19 times position 182 908 920 1014 116 +9 1191 1192 1313 131 4 1315 1316 1317 1472 1671 1705 1706 1956 2389 2390 REVCOMP: AAAAA appears 20 times position 591 953 1051 1052 127 +5 2149 2364 2373 237 4 2398 2399 2400 2401 2402 2403 2404 2405 2406 2407 2408 For id: >NR_026816 1 QUERY: TTTTT appears 3 times position 371 372 384 REVCOMP: AAAAA appears 3 times position 593 594 595 For id: >NR_027917 1 QUERY: TTTTT appears 1 times position 339 For id: >NR_002777 3 QUERY: TTTTT appears 2 times position 921 922 REVCOMP: AAAAA appears 25 times position 286 577 975 976 977 9 +78 979 980 981 982 9 83 984 985 986 987 1263 1264 1265 1266 1267 1268 1269 1270 1271 1272 For id: >NR_033769 1 QUERY: TTTTT appears 7 times position 75 1176 1177 1229 1230 1321 +1322 REVCOMP: AAAAA appears 6 times position 629 1590 1591 1592 1598 1609 For id: >NM_016326 3 QUERY: TTTTT appears 1 times position 368 REVCOMP: AAAAA appears 5 times position 321 337 338 339 340 For id: >NM_181641 2 QUERY: TTTTT appears 4 times position 255 256 324 527 REVCOMP: AAAAA appears 5 times position 480 496 497 498 499 For id: >NM_001144931 1 For id: >NR_029429 1 For id: >NR_026551 1 QUERY: TTTTT appears 1 times position 415 For id: >NM_181640 2 QUERY: TTTTT appears 1 times position 464 REVCOMP: AAAAA appears 5 times position 417 433 434 435 436 For id: >NM_016951 3 QUERY: TTTTT appears 4 times position 255 256 324 623 REVCOMP: AAAAA appears 5 times position 576 592 593 594 595 For id: >NR_002773 1 QUERY: TTTTT appears 1 times position 1377 REVCOMP: AAAAA appears 2 times position 1929 1937 For id: >NR_037806 1 QUERY: TTTTT appears 6 times position 693 837 838 839 840 1114 REVCOMP: AAAAA appears 3 times position 1292 1293 1322
|
|---|
| Replies are listed 'Best First'. | |
|---|---|
|
Re^2: homemade blat: need help with reverse string.
by nikkk (Novice) on Dec 16, 2015 at 14:58 UTC | |
by Athanasius (Archbishop) on Dec 17, 2015 at 03:08 UTC | |
by nikkk (Novice) on Dec 19, 2015 at 16:51 UTC | |
|
Re^2: homemade blat: need help with reverse string.
by nikkk (Novice) on Dec 19, 2015 at 17:05 UTC |