
It seems that a number of the posts changed the original question somewhat and, consequently, did not give full and thorough solutions. For instance, the original question states that the data are in the following format:

YBL027W
GUAUGUUUAACAGU...

Yet, a couple of the solutions begin by setting

$var = 'GUAUGUUUAACAGU...'

How does one get the line name from the solution above? A solution which leaves the data in the original format and gives the line name, number of matches, and their zero-based offsets is as follows:

#!/usr/bin/perl
use warnings;
use strict;
 
my $pat = 'GUAUG';
my ($line, $times, @at);
 
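while (<DATA>) {
  if (/^[CGUA]+$/) {
    # Sequence line: count the non-overlapping matches of $pat.
    $times = () = m/$pat/g;
    if ($times) {
      # Build a regex with one capture group per match so that @-
      # holds the start offset of every match.
      eval('/^' . ('.*?($pat)' x $times) . '.*?$/; @at = @-;');
      shift @at;  # drop $-[0], the offset of the whole match
    }
  } else {
    # Name line: remember it for the sequence that follows.
    ($line) = /^(\w+)$/;
  }

  # Report once a name and at least one match are in hand, then reset.
  if ($line and $times) {
    print "$line: $times match", $times>1 ? 'es' : '  ', " at @at\n";
    $line = $times = 0;
  }
}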
 
__DATA__
YBL027W
GUAUGUUUAACAGUGAUAGUAUGUUUUGAACCUUUCACAAGAUUUAUCUUUAAAUAUGUUAUGA
BBL111C
UAUGUUUAACAGUGAUACUAAAUUUUGAACCUUUCACAAGAGUAUGGUAUGAAUAUGUUAUGAG
ABC456T
AUGUUUAACAGUGAUACUAAAUUUUGAACCUUUCACAAGAUUUAUCUUUAAAUAUGUUAUGAGU
DEF789U
UGUUUAACAGUGAUACUAAAUUUUGAACCUUUCACAAGAUUUAUCUUUAAAUAUGUUAUGAGUA
GHI012V
GUUUAACAGUGAUACUAAAUUUUGAACCUUUCACAAGAUUUAUCUUUAAAUAUGUUAUGUAUGU

Perl was created to manipulate text. A solution to a problem such as this should be compact and easy to understand.

I made a few assumptions:
• All sequences comprise only C, G, U, and A. (I thought it was CGAT; U in place of T means these are RNA rather than DNA. I am not a scientist but I play one on TV.)
• Matches of the search string do NOT overlap.
• The line name has at least one character that is not C, G, U, or A.
• All lines alternate between line name and sequence, with the former before the latter.
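
If the non-overlap assumption does not hold for your data, the offsets can also be gathered with a plain m//g loop instead of the string eval. The following is only a minimal sketch: match_offsets is a hypothetical helper that is not part of the solution above, the sample sequence is made up, and \Q...\E treats the search string as literal text rather than as a regex.

```perl
#!/usr/bin/perl
use warnings;
use strict;

# Hypothetical helper (an assumption, not from the post above): returns the
# zero-based start offsets of $pat in $seq. With a true $overlap, the pattern
# is wrapped in a lookahead so matches that share characters are found too.
sub match_offsets {
  my ($seq, $pat, $overlap) = @_;
  my $re = $overlap ? qr/(?=\Q$pat\E)/ : qr/\Q$pat\E/;
  my @at;
  while ($seq =~ /$re/g) {
    push @at, $-[0];  # $-[0] is the start offset of the current match
  }
  return @at;
}

# Made-up sample: the matches at offsets 0 and 4 share the 'G' at offset 4.
my $seq = 'GUAUGUAUGCGUAUG';
print "non-overlapping: @{[ match_offsets($seq, 'GUAUG') ]}\n";     # 0 10
print "overlapping:     @{[ match_offsets($seq, 'GUAUG', 1) ]}\n";  # 0 4 10
```

The lookahead matches zero characters, so the regex engine steps forward one position at a time and reports every position where the search string begins.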

In reply to Re: look for substrings and getting their location by Anonymous Monk
in thread look for substrings and getting their location by wolffm
