Pratyusha Reddy has asked for the wisdom of the Perl Monks concerning the following question:

Hello

This is code I wrote to count the occurrences of a word (e.g. ACGT) in a file, but $count always comes out as '0' in the output. Where is the mistake in the code, and why does it return '0'?

#!/usr/bin/perl -w

print "Please type the filename of the Chromosome: ";
$chromosomefilename = <STDIN>;
chomp $chromosomefilename;

unless ( open(CHROMOSOMEFILE, $chromosomefilename) ) {
    print "Cannot open file \"$chromosomefilename\"\n\n";
    exit;
}

@chromosome = <CHROMOSOMEFILE>;
close CHROMOSOMEFILE;
$chromosome = join( '', @chromosome);
$chromosome =~ s/\s//g;

do {
    print "Enter the Recognition Sequence to search for: ";
    $recognitionsequence = <STDIN>;
    chomp $recognitionsequence;
    my @words = qw ($chromosomefilename);
    my $count = grep /$recognitionsequence/, @words;
    print "\$count = $count\n";
} until ( $recognitionsequence =~ /^\s*$/ );

exit;

Re: Count the occurrences of a word in a string
by moritz (Cardinal) on Feb 21, 2012 at 11:23 UTC
    my @words = qw ($chromosomefilename); my $count = grep /$recognitionsequence/, @words;

    So you're counting whether $recognitionsequence matches the string $chromosomefilename (the literal string, not the contents of that variable). Is that what you want? If not, take it as a clue to where you should start searching for errors.
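A minimal sketch of what moritz describes, using a made-up filename `chr1.txt` (hypothetical, for illustration only): `qw` yields the literal text `$chromosomefilename`, which contains no ACGT, so `grep` finds no matching element and the count is 0.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical filename, for illustration only.
my $chromosomefilename = 'chr1.txt';

# qw() splits on whitespace and does NOT interpolate variables,
# so @words holds the literal text '$chromosomefilename'.
my @words = qw($chromosomefilename);

# A parenthesized list does use the variable's value, 'chr1.txt'.
my @values = ($chromosomefilename);

# grep in scalar context returns the number of matching elements;
# the literal text contains no ACGT, hence the 0 in the question.
my $count = grep { /ACGT/ } @words;

print "qw:    $words[0]\n";
print "list:  $values[0]\n";
print "count: $count\n";
```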

Re: Count the occurrences of a word in a string
by JavaFan (Canon) on Feb 21, 2012 at 11:26 UTC
    qw ($chromosomefilename); doesn't do what you think it does: there's no interpolation inside qw. Furthermore, do you really want to search in the filename? Perhaps you want to search in the file contents, but since you have removed all whitespace, I wonder what your intent with the qw is.
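Even with the file contents in `@words`, `grep` would not report the number of occurrences: in scalar context it counts matching list elements, not matches inside each element, so a one-element list gives at most 1. A sketch with a made-up sequence standing in for `$chromosome`; in the question's loop the counting line would become something like `my $count = () = $chromosome =~ /$recognitionsequence/g;`.

```perl
use strict;
use warnings;

# Made-up sequence standing in for the file contents after
# the whitespace has been stripped.
my $chromosome = 'ACGTTTACGTGGACGT';

# grep in scalar context counts how many LIST ELEMENTS match,
# so a one-element list can only ever give 0 or 1.
my $elements = grep { /ACGT/ } ($chromosome);

# To count occurrences inside the string, assign the list of
# /g matches to an empty list and take that in scalar context.
my $occurrences = () = $chromosome =~ /ACGT/g;

print "grep: $elements, matches: $occurrences\n";   # grep: 1, matches: 3
```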

    If you just want to know whether ACGT is present, use the grep or ack utility.

Re: Count the occurrences of a word in a string
by CountZero (Bishop) on Feb 21, 2012 at 21:44 UTC
    What do you think my @words = qw ($chromosomefilename); does? I am pretty sure it is not doing what you think it should do.

    The following contains a subroutine that takes a filename and a sequence to search for and returns the number of matches found.

    use Modern::Perl;
    use autodie;

    sub count_seq {
        my ($file, $search_seq) = @_;
        my $sequence;
        {
            local $/;    # slurp mode
            open my $fh, '<', $file;
            $sequence = <$fh>;
        }
        $sequence =~ s/[\s\n]//g;
        return scalar (()=$sequence=~ m/$search_seq/g);
    }

    say count_seq('test.file', 'TA');
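One detail worth knowing about `count_seq`: `m//g` resumes searching after the end of each match, so overlapping occurrences are not counted. Whether they should be depends on what you are looking for; if they should, a zero-width lookahead is one common way to count them. A sketch with a made-up sequence:

```perl
use strict;
use warnings;

my $sequence = 'ATATAT';   # made-up sequence for illustration

# m//g continues after each match, so matches cannot overlap:
# ATA at offset 0, then the search resumes at offset 3 and finds nothing.
my $non_overlapping = () = $sequence =~ /ATA/g;

# A zero-width lookahead succeeds at every offset where ATA starts
# (offsets 0 and 2), so overlapping occurrences are counted too.
my $overlapping = () = $sequence =~ /(?=ATA)/g;

print "non-overlapping: $non_overlapping\n";   # 1
print "overlapping:     $overlapping\n";       # 2
```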
    The subroutine builds one big string and removes all whitespace, including end-of-line characters. I am not sure that is a good idea, but I guess it will depend on your file format. By putting everything in one big string you could inadvertently construct sequences that do not exist. Suppose one sequence ends with "AT" and the next one starts with "CG". If you are looking for "ATCG", you will now find it at the junction between the end of one sequence and the beginning of the next.
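The junction problem can be reproduced with two made-up records, the first ending in AT and the second starting with CG. This sketch assumes each line of the file is a separate record; if your format instead wraps one long sequence over several lines (as many FASTA files do), counting per line would miss real matches that span a line break, and joining is then the right choice.

```perl
use strict;
use warnings;

# Two made-up records: the first ends in "AT", the second starts with "CG".
my @records = ('GGAT', 'CGTT');
my $target  = 'ATCG';

# Joined into one string (as count_seq does), the junction creates a false hit.
# \Q...\E makes the search text match literally.
my $joined       = join '', @records;               # 'GGATCGTT'
my $joined_count = () = $joined =~ /\Q$target\E/g;

# Counting inside each record separately finds no match.
my $per_record = 0;
for my $record (@records) {
    my $n = () = $record =~ /\Q$target\E/g;
    $per_record += $n;
}

print "joined: $joined_count, per record: $per_record\n";   # joined: 1, per record: 0
```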

    CountZero
