Re: search and print in perl
by CountZero (Bishop) on Jun 01, 2009 at 11:53 UTC
For a start: put use strict; use warnings; at the beginning of each and every Perl script you write. It will save you much time and annoyance.
Next, don't use global variables if you can avoid it. Lexical variables (those declared with "my") are the way forward. The open function is best used with a lexical filehandle and three arguments:
my $filename = 'input.txt';
open(my $IN, '<', $filename) or die "Can't open file $filename: $!";
(BTW: the text input.txt does not need to be interpolated, so put it in single quotes.)
Now you have to start reading in the contents of the file. Much will depend on the format of your input.txt file, and I'm not sure that it is a good idea to concatenate the whole file into one scalar while still keeping the end-of-line characters. Perhaps you can show us a small excerpt of your input.txt file? If you do that we can continue helping you.
CountZero
"A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little or too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James
Is each "line" (defined as anything that ends with \n) one gene? Or do you have to combine multiple lines to make up one gene?
CountZero
No. A gene is like this: it is preceded by the string TATAAT, and after this string there can be one or more of the letters A, C, G, T. Then the string ATG follows, then again any number of A, C, G, T letters, and the gene ends with one of the strings TAA, TGA or TAG. For example, a line is TATAATATTACAATGGATCATACAGTTAG ... our gene is the part from ATG through TAG (ATGGATCATACAGTTAG here), but we also have to make sure it is preceded by TATAAT. I have to print out the genes in the text file according to these rules.
Thanks. My input.txt is like the following:
TGGTACGACCGAACGAAAGAAAAAGAACACACACTGACCGGAGGGGTTGAATTGTTTGCCTGGCAC
It goes on like this: random ACGT characters for six whole lines.
Could you provide a sample of a subsequence that is supposed to match (there is no TATAAT in this sample)? Also, may the subsequences in question be split across more than one line? In this case, you would need to chomp the input lines to get rid of the newlines in your $text (as you're not accounting for them in the regex).
Also, if your input is then all on one line, you probably want non-greedy matching ([ACGT]+?) of the parts in between the strings of interest, or else they'll gobble up more than you want...
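To make the greedy versus non-greedy point concrete, here is a minimal sketch. The test string is made up for illustration (two genes glued together on one line), and the stop codon is treated as a plain substring, as in the description in this thread:

```perl
use strict;
use warnings;

# Hypothetical test string: two genes back to back, each preceded by TATAAT.
my $text = 'TATAATATTACAATGGATCATACAGTTAG' . 'TATAATCCATGAAACCCTGA';

# Greedy: the first [ACGT]+ runs to the end of the string and backtracks
# only as far as the LAST ATG, so the first gene is swallowed.
my @greedy = $text =~ /TATAAT[ACGT]+(ATG[ACGT]+(?:TAA|TGA|TAG))/g;

# Non-greedy: [ACGT]+? stops at the first ATG and then at the first
# stop codon after it, so each gene is matched separately.
my @lazy = $text =~ /TATAAT[ACGT]+?(ATG[ACGT]+?(?:TAA|TGA|TAG))/g;

print "greedy: @greedy\n";
print "lazy:   @lazy\n";
```

With this input the greedy pattern should report only ATGAAACCCTGA, while the non-greedy one reports both ATGGATCATACAGTTAG and ATGAAACCCTGA.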
while ($line = <IN>)
{
    chomp $line;    ## <---- !!!
    $text .= $line;
}
Re: search and print in perl
by Bloodnok (Vicar) on Jun 01, 2009 at 11:59 UTC
...working on a homework... - $honesty++.
Some sample data, outlining both matching and non-matching cases, would help. IMO, your description doesn't quite cut it for me: I'm barely a proficient programmer and most certainly not a geneticist. For example, in this case, is a gene represented by a single char, a sequence of chars, or both?
That being said...
1. This:
.
.
$text = "";
while($line = <IN>)
{
$text .= $line;
}
.
.
is more usually (and in most cases, better) written as...
local $/;       # Undefine the input record separator, so the next read slurps the whole file
$text = <IN>;   # Note: the newlines are still present in $text
.
.
2. AFAICT, i.e. subject to further details being provided, your RE appears to capture only the start and end delimiters.
A user level that continues to overstate my experience :-))
Thanks. A gene is like this: it is preceded by the string TATAAT, and after this string there can be one or more of the letters A, C, G, T. Then the string ATG follows, then again any number of A, C, G, T letters, and the gene ends with one of the strings TAA, TGA or TAG. For example, a gene is
TATAATATTACAATGGATCATACAGTTAG ... our gene is the part between ATG and TAG, but we also have to make sure it is preceded by TATAAT. I have to print out the genes in the text file according to these rules.
Assuming one gene definition per line, i.e. not split over multiple lines, then...
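#!/usr/bin/perl
use warnings;
use strict;
local $/;    # slurp mode: read all of DATA in one go
my $data = <DATA>;
# TATAAT, some bases, ATG, then capture the bases up to the stop codon
while ($data =~ /TATAAT[ACGT]+ATG([ACGT]+)(?:T(?:GA|AA|AG))/g) {
    warn $1;
}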
__DATA__
TATAATATTACAATGGATCATACAGTTAG
TATAATATTACAATGGATCATACAGTTAG
TATAATATT
ACAATGGATCATACAGTTAG
produces:
$ perl tst.pl
GATCATACAGT at tst.pl line 8, <DATA> chunk 1.
GATCATACAGT at tst.pl line 8, <DATA> chunk 1.
$
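Note that the script above prints only the bases between ATG and the stop codon, and it skips the sequence that is split over two lines. If the gene should include ATG and the stop codon, and sequences may continue across line breaks, something like the following sketch might be closer. The helper name extract_genes and the sample data are made up for illustration, and joining all lines is an assumption about the input format:

```perl
use strict;
use warnings;

# extract_genes is a hypothetical helper, not something from the original post.
sub extract_genes {
    my ($text) = @_;
    $text =~ s/\s+//g;    # join the lines: drop newlines and other whitespace
    # TATAAT must come first; capture from ATG through the first stop codon.
    # In list context, a /g match returns every captured gene.
    return $text =~ /TATAAT[ACGT]+?(ATG[ACGT]+?(?:TAA|TGA|TAG))/g;
}

# Sample input (an assumption): one complete sequence, one split over two lines.
my $data = <<'END';
TATAATATTACAATGGATCATACAGTTAG
TATAATATT
ACAATGGATCATACAGTTAG
END

print "$_\n" for extract_genes($data);
```

For this sample it should print ATGGATCATACAGTTAG twice, once for each sequence. Be aware that joining the lines can also create matches that span two unrelated lines, so this only makes sense if the file really is one continuous sequence.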
Re: search and print in perl
by wol (Hermit) on Jun 01, 2009 at 14:13 UTC
Homework.
Genetic engineering.
Disturbing?
--
use JAPH;
print JAPH::asString();