in reply to Finding pattern in a file
Update 2: As revealed in the crosspost linked here, the true format of the OPed example file seems to be standard FASTA, so the correct approach is along the lines used by choroba here.
Is this the correct way of doing it?
No.
You should always have the statements
use strict;
use warnings;
at the start of your code (see strict and warnings). If you place these statements at the beginning of the OPed code, what happens? Some questions:
Maybe try something like:
Note: This has not been tested with multiple FASTA records, only with the single-record example given in the OP.

    use strict;
    use warnings;
    use autodie;

    use constant START_FASTA_REC => qr{ \A [^\]]+ \] }xms;
    use constant SUB_SEQUENCE    => qr{ CDECGKEFSQGAHLQTHQKVH }xms;
    # use constant SUB_SEQUENCE => qr{ NOT_PRESENT }xms;  # for debug

    MAIN: {
        my $filename = 'file.fasta';

        open my $fh_fasta, '<', $filename;  # autodie handles failure

        my $fasta_record;

        LINE:
        while (my $line = <$fh_fasta>) {
            chomp $line;
            # a line that begins a record: strip the header, keep the rest
            if ($line =~ s{ ${ \START_FASTA_REC }}''xms) {
                process_fasta_record($fasta_record)
                    if defined $fasta_record and length $fasta_record;
                $fasta_record = $line;
                next LINE;
            }
            $fasta_record .= $line;
        }

        close $fh_fasta;

        # process final fasta record.
        process_fasta_record($fasta_record)
            if defined $fasta_record and length $fasta_record;

        exit;  # normal exit from MAIN block
    }  # end MAIN block

    die "unexpected exit from MAIN block";

    # subroutines ######################################################

    sub process_fasta_record {
        my ($fasta_record, ) = @_;

        print "'$fasta_record' \n";  # for debug

        if ($fasta_record =~ SUB_SEQUENCE) {
            print "The protein contains the domain\n";
        }
        else {
            print "The protein doesn't contain the domain\n";
        }
    }
Update 1: (Update: Never mind. See Update 2 above.) I have a slightly simpler version of this script that I've tested (minimally) with a two-record FASTA file. Please let me know if you're interested.
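For the standard FASTA layout mentioned in Update 2, here is a minimal sketch of one way to go about it. This is not choroba's code, just an illustration. It assumes that every record starts with a header line beginning with ">" and that the sequence may be wrapped over several lines. The sample data, the record names and the read_fasta helper are all made up for the example, and it reads from an in-memory filehandle so that it runs without any input file.

```perl
use strict;
use warnings;

# Hypothetical sample data: two records in standard FASTA layout.
# In the first record the domain is split across two sequence lines.
my $fasta = <<'END_FASTA';
>rec1 hypothetical record containing the domain
AAAACDECGKEFSQ
GAHLQTHQKVHAAA
>rec2 hypothetical record without the domain
MKTAYIAKQRQISFVK
END_FASTA

my $domain = 'CDECGKEFSQGAHLQTHQKVH';

# Read FASTA text from a filehandle.
# Returns a list of [header, sequence] pairs (hypothetical helper).
sub read_fasta {
    my ($fh) = @_;
    my @records;
    while (my $line = <$fh>) {
        chomp $line;
        next if $line =~ /\A\s*\z/;        # skip blank lines
        if ($line =~ /\A>(.*)/) {
            push @records, [ $1, '' ];     # a header starts a new record
        }
        elsif (@records) {
            $records[-1][1] .= $line;      # join wrapped sequence lines
        }
    }
    return @records;
}

open my $fh, '<', \$fasta or die "Cannot open in-memory file: $!";
my @records = read_fasta($fh);
close $fh;

for my $record (@records) {
    my ($header, $sequence) = @$record;
    my $verdict = index($sequence, $domain) >= 0 ? 'contains' : "doesn't contain";
    print "$header: the protein $verdict the domain\n";
}
```

The point of joining the sequence lines before searching is that a domain can straddle a line break, as it does in the first sample record; matching line by line would miss it. For a real file, replace the in-memory open with an open on the file name.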
Give a man a fish: <%-{-{-{-<