If production is your goal, frozenwithjoy's suggestion may be just what you need.
If this is (as it appears) schoolwork (homework?), then you need to understand that this is NOT 'code-a-matic.'
We'll be pleased to help you learn; you need merely show that you've made a good-faith effort to solve your problem. In this case, that means: post your code and tell us how it fails, or post an algorithm (or pseudo-code) for the parts where you can't work out the syntax.
You've outlined a fairly ambitious project for a 'complete newbie in perl,' so -- in case you're stuck on which of Perl's capabilities will help you here, consider the following:
- Do you know how to get the data from the "main file" into a script? If not, perldoc -f open and the section on while loops in perldoc perlsyn will be helpful. Hint: given the size of your data file, you'll probably want to read it line by line.
- Consider pushing each line into a temporary cache as it's read; then test to see if it satisfies some criterion for being line 2. If not, read the next line, and see if it's line 2.
If so, test its first 9 characters with regexes (lots of reading here: perldoc perlretut and company). If those match the characteristics of replicate 1, 2 or 3, stash the cached line, the line with the match and the next two lines into the appropriate array, say, @rep1, @rep2 or @rep3. Hint: set a flag when you find a match for any target replicate, and use it together with the ++ increment operator (perldoc perlop, "Auto-increment and Auto-decrement") to know when you've pushed all four lines of the record.
- wash, rinse, repeat...
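The steps above (cache the previous line, spot line 2, raise a flag on a match, count with ++ until four lines are stored) can be sketched roughly as follows. This is only one possible shape, under assumptions not stated in your post: every record is exactly four lines (an '@' header, the sequence, a '+' line, the quality line), and the barcodes TTGT, GGTT and ACCT from the snippet further down are the real replicate markers. The names split_replicates and %rep_for are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical barcode table (from the snippet in this post): barcode => replicate
my %rep_for = (TTGT => 'rep1', GGTT => 'rep2', ACCT => 'rep3');

# Stream 4-line records (header, sequence, '+', quality) one line at a time,
# keeping only records whose sequence line carries a known barcode.
sub split_replicates {
    my ($fh) = @_;
    my %rep = map { $_ => [] } values %rep_for;
    my $cached;        # the previous line: a candidate header (line 1)
    my $target;        # the flag: array ref being filled, or undef when idle
    my $pushed = 0;    # lines of the current record stored so far

    while (my $line = <$fh>) {
        if ($target) {
            # Inside a matched record: copy the '+' and quality lines.
            push @$target, $line;
            if (++$pushed == 4) {
                undef $target;    # all four lines stored; lower the flag
                $pushed = 0;
            }
        }
        elsif (defined $cached
            && $cached =~ /^@/
            && $line =~ /^[ACTG]{3}(TTGT|GGTT|ACCT)[ACTG]{2}/)
        {
            # $line looks like line 2: 3 bases, a barcode, 2 more bases.
            $target = $rep{ $rep_for{$1} };
            push @$target, $cached, $line;
            $pushed = 2;
        }
        $cached = $line;
    }
    return \%rep;
}

# Usage with an in-memory file; read2 has no barcode and is skipped.
my $fastq = join '', map { "$_\n" }
    '@read1', 'AAATTGTCCGATT', '+', 'IIIIIIIIIIIII',
    '@read2', 'CCCGGGGGGGGGG', '+', '@IIIIIIIIIIII',
    '@read3', 'GTAACCTAGCATT', '+', 'IIIIIIIIIIIII';
open my $in, '<', \$fastq or die "Cannot open in-memory file: $!";
my $rep = split_replicates($in);
printf "%s: %d record(s)\n", $_, @{ $rep->{$_} } / 4 for sort keys %$rep;
```

Requiring the cached line to start with '@' is only a heuristic (quality lines can also begin with '@'); it works here because a real header can never match the barcode regex.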
My suspicion is that working out an appropriate set of regular expressions (there's a broad hint in the word "set" and a part of one of many possible solutions next) will be your biggest challenge, so...
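use strict;
use warnings;

my (@rep1, @rep2, @rep3);
my $prefix  = qr/[ACTG]{3}/;    # three leading bases
my $rep1    = qr/TTGT/;         # replicate barcodes
my $rep2    = qr/GGTT/;
my $rep3    = qr/ACCT/;
my $postfix = qr/[ACTG]{2}/;    # two trailing bases: 9 characters in all

while (my $line = <DATA>) {     # DATA reads whatever follows a __DATA__ line; use your own filehandle instead
    if ($line =~ /^$prefix
                   $rep1
                   $postfix/x )
    {
        push @rep1, $line;      # ignoring, for regex instruction,
                                # the need to push your cached line, etc.
    }
    elsif ($line =~ /^$prefix
                      $rep2
                      $postfix/x )
    {
        push @rep2, $line;
    }
    elsif ($line =~ /^$prefix
                      $rep3
                      $postfix/x )
    {
        push @rep3, $line;
    }
}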
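Since the hint is in the word "set", here is a hedged sketch of keeping the three barcodes in one table and building the patterns from it, so adding a fourth replicate means adding one table entry rather than another elsif branch. The %barcode and %pattern hashes and the classify sub are illustrative names only; the prefix, postfix and barcodes are the ones from the snippet.

```perl
use strict;
use warnings;

# Same pieces as in the snippet, but the barcodes live in one table.
my $prefix  = qr/[ACTG]{3}/;
my $postfix = qr/[ACTG]{2}/;
my %barcode = (rep1 => 'TTGT', rep2 => 'GGTT', rep3 => 'ACCT');

# Compile one anchored pattern per replicate: 3 bases, barcode, 2 bases.
my %pattern = map {
    my $code = $barcode{$_};
    ($_ => qr/^$prefix\Q$code\E$postfix/);
} keys %barcode;

# Return the name of the replicate whose pattern matches, or undef if none.
sub classify {
    my ($seq) = @_;
    for my $name (sort keys %pattern) {
        return $name if $seq =~ $pattern{$name};
    }
    return;
}

# Usage: sort a few sequence lines into per-replicate lists.
my %rep = map { $_ => [] } keys %barcode;
for my $seq (qw(AAATTGTCCGATT CCCGGGGGGGGGG GTAACCTAGCATT)) {
    my $name = classify($seq);
    push @{ $rep{$name} }, $seq if defined $name;
}
print "$_: @{ $rep{$_} }\n" for sort keys %rep;
```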
There may be a way around this line-by-line approach. If you can absolutely count on "+" as the entire content of the third line of each record, you could use that fact as part of an approach to reading your "main file" record-by-record -- but that would be an additional complexity. Your addendum does, however, suggest an approach.
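If every record really is four lines with a bare "+" as the third, reading record by record might look something like the sketch below. It assumes that layout and the barcodes from the snippet; split_by_record and %rep_for are hypothetical names. It also swaps in a plain substr lookup for the barcode in place of a per-replicate regex, and dies if a record's third line isn't "+", since that usually means the reads have drifted out of step.

```perl
use strict;
use warnings;

# Hypothetical barcode table (from the snippet in this post): barcode => replicate
my %rep_for = (TTGT => 'rep1', GGTT => 'rep2', ACCT => 'rep3');

# Read four lines per pass, relying on line 3 of each record being '+'.
sub split_by_record {
    my ($fh) = @_;
    my %rep = map { $_ => [] } values %rep_for;
    while (defined(my $header = <$fh>)) {
        my $seq  = <$fh>;
        my $plus = <$fh>;
        my $qual = <$fh>;
        die "Truncated record at end of input\n" unless defined $qual;
        die "Expected '+' on line " . ($. - 1) . "\n" unless $plus =~ /^\+\s*$/;

        # The barcode is characters 4-7, after three leading bases.
        next unless $seq =~ /^[ACTG]{9}/;
        my $name = $rep_for{ substr($seq, 3, 4) } or next;
        push @{ $rep{$name} }, $header, $seq, $plus, $qual;
    }
    return \%rep;
}

# Usage with an in-memory file.
my $fastq = join '', map { "$_\n" }
    '@r1', 'AAATTGTCCGATT', '+', 'IIIIIIIIIIIII',
    '@r2', 'GTAACCTAGCATT', '+', 'IIIIIIIIIIIII';
open my $in, '<', \$fastq or die "Cannot open in-memory file: $!";
my $rep = split_by_record($in);
printf "%s: %d record(s)\n", $_, @{ $rep->{$_} } / 4 for sort keys %$rep;
```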
So, my suggestion is -- try this, if you're working on homework, and come back when you get stuck, with your code and details about its shortcomings.
And BTW, welcome to the Monastery.