in thread how to read input from a file, one section at a time?

Try

    my %fasta_seen;
    FASTA_RECORD:
    while ( my $para = <$PROTFILE> ) {
        # Remove fasta header line
        if ( $para =~ s/^>(.*)//m ) {
            $name = $1;
        }
        # Remove comment line(s)
        $para =~ s/^\s*#.*//mg;
        next FASTA_RECORD if $fasta_seen{ $para }++;
        …
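The `next FASTA_RECORD if $fasta_seen{ $para }++` line relies on post-increment returning the old value. The first time a sequence is seen, its hash entry is still undefined, and Perl's post-increment of an undefined value returns 0 (false), so the record is kept. Every later copy finds a true count and is skipped. Here is a minimal self-contained sketch of the same idiom, using a hypothetical list of sequences in place of the file:

```perl
use strict;
use warnings;

my %seen;                                   # sequence => times seen so far
my @records = ( 'ACGT', 'TTGG', 'ACGT' );   # hypothetical sample sequences

# $seen{$_}++ returns the count *before* incrementing: 0 on the first
# occurrence of a sequence (so !0 keeps it), 1 or more afterwards.
my @first_time = grep { !$seen{$_}++ } @records;

print "@first_time\n";    # prints "ACGT TTGG"
```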

This may not be a sensible solution if your sequences are very long, because every complete sequence is stored as a hash key. In that case, consider keying the hash on a message digest such as Digest::MD5.
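A minimal sketch of the Digest::MD5 variant follows. It assumes records are separated by blank lines and read in paragraph mode (`$/ = ''`). The sample data, the whitespace normalisation and the names `%digest_seen` and `@unique` are illustrative assumptions, not part of the original script. Only the 32-character hex digest of each sequence is stored, so the size of each hash key no longer depends on sequence length. Distinct sequences could in principle share a digest, but an accidental MD5 collision is extremely unlikely.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

# Hypothetical sample data: FASTA records separated by blank lines.
my $data = <<'END';
>seq1
ACGTACGT

>seq2
# a comment line
ACGTACGT

>seq3
TTTTGGGG
END

open my $fh, '<', \$data or die "Cannot open in-memory file: $!";
local $/ = '';    # paragraph mode: one record per read

my %digest_seen;  # md5 hex digest => times seen
my @unique;       # names of records kept
while ( my $para = <$fh> ) {
    my $name = '';
    $name = $1 if $para =~ s/^>(.*)\n//m;   # strip header line, keep the name
    $para =~ s/^\s*#.*\n//mg;               # strip comment lines
    $para =~ s/\s+//g;                      # assumption: ignore whitespace/newlines
    my $key = md5_hex($para);               # fixed-size key instead of full sequence
    next if $digest_seen{$key}++;
    push @unique, $name;
}
print "Unique records: @unique\n";    # prints "Unique records: seq1 seq3"
```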

poj

Re^12: how to read input from a file, one section at a time?
by davi54 (Sexton) on Apr 02, 2019 at 16:02 UTC
    And how do I print the duplicate entries?
      #next FASTA_RECORD if $fasta_seen{ $para }++;
      if ( $fasta_seen{ $para }++ ) {
          print "DUPLICATE : $name \n $para\n";
          next FASTA_RECORD;
      }
      poj
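The loop above prints each repeated record as it is found. Below is a self-contained sketch of the same idea with one hypothetical variation: the hash stores the name of the first record carrying each sequence instead of a count, so each duplicate can be reported together with the record it repeats. It assumes blank-line-separated records read in paragraph mode. The sample data, the whitespace normalisation, and the names `%first_name` and `@duplicates` are illustrative only. `$name` is reset for every record, so a record without a header does not inherit the previous record's name.

```perl
use strict;
use warnings;

# Hypothetical sample data: FASTA records separated by blank lines.
my $data = <<'END';
>seq1
ACGTACGT

>seq2
ACGTACGT

>seq3
TTTTGGGG
END

open my $fh, '<', \$data or die "Cannot open in-memory file: $!";
local $/ = '';       # paragraph mode: one FASTA record per read

my %first_name;      # normalised sequence => name of the first record with it
my @duplicates;      # "name (first name)" for every repeated record
FASTA_RECORD:
while ( my $para = <$fh> ) {
    my $name = '(no header)';
    $name = $1 if $para =~ s/^>(.*)\n//m;   # strip header line, keep the name
    $para =~ s/^\s*#.*\n//mg;               # strip comment lines
    $para =~ s/\s+//g;                      # assumption: ignore whitespace/newlines
    if ( exists $first_name{$para} ) {
        print "DUPLICATE : $name (same as $first_name{$para})\n$para\n";
        push @duplicates, "$name ($first_name{$para})";
        next FASTA_RECORD;
    }
    $first_name{$para} = $name;
}
```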
        Hi poj,
        Could I have your email address, if that's okay with you? I can send you the complete script so you can help me with it.
        Hi Poj,
        When I made the modifications you suggested, it broke the entire script for some reason. It started giving me errors like:
        Use of uninitialized value $lowest in printf at poj_repeat.pl line 60, <$PROTFILE> chunk 2.
        Use of uninitialized value $lowest in numeric eq (==) at poj_repeat.pl line 61, <$PROTFILE> chunk 2.
        Use of uninitialized value in numeric eq (==) at poj_repeat.pl line 61, <$PROTFILE> chunk 2.
        Use of uninitialized value in concatenation (.) or string at poj_repeat.pl line 62, <$PROTFILE> chunk 2.
        Missing argument in printf at poj_repeat.pl line 67, <$PROTFILE> chunk 2.
        Missing argument in printf at poj_repeat.pl line 67, <$PROTFILE> chunk 2.
        Can you help me with this?