Defining substring matches

drhicks has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: Defining substring matches by BrowserUk (Patriarch) on Sep 21, 2013 at 00:42 UTC
I wish to use the substr() function to search for the particular motif. # substr is not designed for (nor capable of) searching for anything; so why are you specifying that particular function? You've defined your IUPAC codes in terms of regex character classes; so why are you eschewing the regex engine? Given your table, it is trivial to convert IUPAC codes into a regex and use the regex engine to search your fasta file: my %IUPAC = ( A => '[A]', C => '[C]', G => '[G]', T => '[T]', R => '[AG]', Y => '[CT]', M => '[AC]', K => '[GT]', W => '[AT]', S => '[GC]', B => '[CGT]', D => '[AGT]', H => '[ACT]', V => '[ACG]', N => '[ACGT]', ); my( $file, $motif ) = @ARGV; my $re = join '', map $IUPAC{ $_ }, split '', $motif; open FASTA, '<', $file or die $!; getc( FASTA ); ## discard first '>' until( eof( FASTA ) ) { chomp( my $id = <FASTA> ); ## read ident my $seq = do{ local $/ = '>'; <FASTA> }; $seq =~ tr[\n>][]d; while( $seq =~ m[($re)]g ) { printf "Found: '$1' at '$id':%d\n", $-[0]; } } [download] NB: The above is untested code typed directly into my browser. With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday' Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error. "Science is about questioning the status quo. Questioning authority". In the absence of evidence, opinion is indistinguishable from prejudice.	[reply] [d/l]
Re^2: Defining substring matches by jwkrahn (Abbot) on Sep 21, 2013 at 01:24 UTC
`until( eof( FASTA) {` [download] You have two left parentheses but only one right parenthesis. `$seq =~ tr[\n][];` [download] You are replacing the newline character with a newline character?	[reply] [d/l] [select]
Re^3: Defining substring matches by Anonymous Monk on Sep 21, 2013 at 02:19 UTC
You are replacing the newline character with a newline character? Indeed, to count newline characters, but not save the count? Hmmm:)	[reply]
Re^3: Defining substring matches by BrowserUk (Patriarch) on Sep 21, 2013 at 02:38 UTC
See above: NB: The above is untested code typed directly into my browser. Corrected. With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday' Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error. "Science is about questioning the status quo. Questioning authority". In the absence of evidence, opinion is indistinguishable from prejudice.	[reply]
Re^2: Defining substring matches by AnomalousMonk (Archbishop) on Sep 21, 2013 at 14:32 UTC
`while( $seq =~ m[($re)]g ) { printf "Found: '$1' at '$id':%d\n", $-[0]; }` [download] A question of idle (and perhaps rather trivial) curiosity: In the quoted code, you use `$-[0]` "offset of the start of the last successful match" (see `@-` in perlvar). My reflexive choice would have been `$-[1]` since capture group 1 is being matched. There's no difference in the behavior of the code since capture group 1 is all that's matched, but was there a particular reason you chose as you did?	[reply] [d/l] [select]
Re^3: Defining substring matches by BrowserUk (Patriarch) on Sep 21, 2013 at 15:22 UTC
Um. I cannot remember ever using indexes above 0 with @- or @+. I'm sure I probably have at some point, but I don't remember doing so. When I first typed it, I didn't use capture brackets and only printed the id and position. I added the capture brackets and '$1' as an afterthought. With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday' Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error. "Science is about questioning the status quo. Questioning authority". In the absence of evidence, opinion is indistinguishable from prejudice.	[reply]
Re^2: Defining substring matches by drhicks (Novice) on Sep 21, 2013 at 15:40 UTC
This code works perfect for the job. It is so much simpler and does more than the script I had written using substr(). Thanks Derrick	[reply]