Update 2: As revealed in the crosspost linked here, the true format of the OPed example file seems to be standard FASTA, so the correct approach is along the lines used by choroba here.


Is this the correct way of doing it?

No.

You should always have the statements
    use strict;
    use warnings;
at the start of your code (see strict and warnings). If you place these statements at the beginning of the OPed code, what happens? Some questions:

Please see Basic debugging checklist.
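
As a rough illustration of what strict buys you, here is a minimal sketch. The snippet inside the heredoc is made up for this example (it is not the OPed code): it declares $filename and then misspells it. The string eval is only there so the compile-time complaint can be printed instead of killing the script.

```perl
use strict;
use warnings;

# Made-up snippet (NOT the OPed code): $filename is declared,
# but the print statement misspells it as $filenme.
my $snippet = <<'END_SNIPPET';
my $filename = 'file.fasta';
print "Opening $filenme\n";
END_SNIPPET

# A string eval compiles the snippet under the strict that is in
# effect here, so the compile-time error ends up in $@.
my $ok = eval "$snippet; 1";
print $ok ? "compiled cleanly\n" : "strict says: $@";
```

Without strict, the misspelled variable would silently spring into existence as an empty package variable, which is exactly the kind of bug that is tedious to find by eye.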

Maybe try something like:

    use strict;
    use warnings;
    use autodie;

    use constant START_FASTA_REC => qr{ \A [^\]]+ \] }xms;
    use constant SUB_SEQUENCE    => qr{ CDECGKEFSQGAHLQTHQKVH }xms;
    # use constant SUB_SEQUENCE    => qr{ NOT_PRESENT }xms;  # for debug

    MAIN: {
        my $filename = 'file,fasta';

        open my $fh_fasta, '<', $filename;

        my $fasta_record;
        LINE:
        while (my $line = <$fh_fasta>) {
            chomp $line;
            # a line that begins with a header (everything up to the
            # first ']') starts a new record; the header is stripped.
            if ($line =~ s{ ${ \START_FASTA_REC } }''xms) {
                process_fasta_record($fasta_record)
                    if defined $fasta_record and length $fasta_record;
                $fasta_record = $line;
                next LINE;
            }
            $fasta_record .= $line;
        }
        close $fh_fasta;

        # process final fasta record.
        process_fasta_record($fasta_record)
            if defined $fasta_record and length $fasta_record;

        exit;  # normal exit from MAIN block
    }  # end MAIN block

    die "unexpected exit from MAIN block";

    # subroutines ######################################################

    sub process_fasta_record {
        my ($fasta_record, ) = @_;

        print "'$fasta_record' \n";  # for debug

        if ($fasta_record =~ SUB_SEQUENCE) {
            print "The protein contains the domain\n";
        }
        else {
            print "The protein doesn't contain the domain\n";
        }
    }

Note: This has not been tested with multiple FASTA records, only with the single record example given in the OP.

Update 1: (Update: Never mind. See Update 2 above.) I have a slightly simpler version of this script that I've tested (minimally) with a two-record FASTA file. Please let me know if you're interested.
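
For the standard FASTA layout mentioned in Update 2 (a header line starting with '>', followed by one or more sequence lines), a minimal sketch might look like the following. This is not choroba's code. The helper name read_fasta_records and the sample data are invented for illustration, and the data is read from an in-memory filehandle so the example runs without a file on disk.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): read standard FASTA from a
# filehandle and return a list of [header, sequence] pairs.
sub read_fasta_records {
    my ($fh) = @_;
    my @records;
    while (my $line = <$fh>) {
        $line =~ s/\s+\z//;          # strip newline, CR, trailing blanks
        next unless length $line;    # skip empty lines
        if ($line =~ /\A>(.*)/) {    # a '>' line starts a new record
            push @records, [ $1, '' ];
        }
        elsif (@records) {           # sequence line: append to current record
            $records[-1][1] .= $line;
        }
    }
    return @records;
}

# Made-up sample data. In the first record the domain is deliberately
# split across two sequence lines.
my $fasta = <<'END_FASTA';
>seq1 [example organism]
MKTAYIAKQRCDECGKEFSQGA
HLQTHQKVHTGEKP
>seq2 [example organism]
MSTNPKPQRKTKRNT
END_FASTA

my $domain = 'CDECGKEFSQGAHLQTHQKVH';

open my $fh, '<', \$fasta or die "cannot open in-memory file: $!";
for my $rec (read_fasta_records($fh)) {
    my ($header, $seq) = @$rec;
    my $verdict = index($seq, $domain) >= 0 ? 'contains' : "doesn't contain";
    print "$header: the protein $verdict the domain\n";
}
close $fh;
```

The sequence lines of each record are joined before searching, so a domain that straddles a line break (as in the first sample record) is still found. Since the domain is a fixed string, index is enough here; a regex would only be needed for a real pattern.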


Give a man a fish:  <%-{-{-{-<


In reply to Re: Finding pattern in a file (updated x2) by AnomalousMonk
in thread Finding pattern in a file by shabird
