In reply to: look for substrings and getting their location

It seems that a number of the posts changed the original question somewhat and, consequently, did not give full and thorough solutions. For instance, the original question states that the data are in the following format:

YBL027W
GUAUGUUUAACAGU...

Yet, a couple of the solutions begin by setting

$var = 'GUAUGUUUAACAGU...'
How does one get the line name from a solution like that? Here is a solution that leaves the data in the original format and reports the line name, the number of matches, and their zero-based offsets:
#!/usr/bin/perl
use warnings;
use strict;

my $pat = 'GUAUG';
my ($line, $times, @at);

while (<DATA>) {
    if (/^[CGUA]+$/) {
        # Sequence line: count the non-overlapping matches.
        $times = () = m/$pat/g;
        if ($times) {
            # Build a regex like /^.*?($pat).*?($pat).*?$/ with one capture
            # group per match; @- then holds the start offset of each group.
            eval('/^' . ('.*?($pat)' x $times) . '.*?$/; @at = @-;');
            shift @at;    # drop $-[0], the start of the whole match
        }
    }
    else {
        # Name line: remember it for the sequence that follows.
        ($line) = /^(\w+)$/;
    }
    if ($line and $times) {
        print "$line: $times match", $times > 1 ? 'es' : '', " at @at\n";
        $line = $times = 0;
    }
}

__DATA__
YBL027W
GUAUGUUUAACAGUGAUAGUAUGUUUUGAACCUUUCACAAGAUUUAUCUUUAAAUAUGUUAUGA
BBL111C
UAUGUUUAACAGUGAUACUAAAUUUUGAACCUUUCACAAGAGUAUGGUAUGAAUAUGUUAUGAG
ABC456T
AUGUUUAACAGUGAUACUAAAUUUUGAACCUUUCACAAGAUUUAUCUUUAAAUAUGUUAUGAGU
DEF789U
UGUUUAACAGUGAUACUAAAUUUUGAACCUUUCACAAGAUUUAUCUUUAAAUAUGUUAUGAGUA
GHI012V
GUUUAACAGUGAUACUAAAUUUUGAACCUUUCACAAGAUUUAUCUUUAAAUAUGUUAUGUAUGU
Perl was created to manipulate text. A solution to a problem such as this should be compact and easy to understand.
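In the same spirit, here is a sketch of one possible variant that avoids the string eval: it walks each sequence with m//g in a while loop and reads each match's start offset from $-[0]. It makes the same assumptions as above (two-line records, non-overlapping matches); the helper name match_offsets and the two inline sample records are hypothetical, chosen only for illustration, and this is not a drop-in replacement tested against the original poster's data.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return the zero-based start offsets of every
# non-overlapping match of $pat in $seq. After each successful m//g
# match, $-[0] holds the start offset of that match.
sub match_offsets {
    my ($seq, $pat) = @_;
    my @at;
    while ($seq =~ m/\Q$pat\E/g) {
        push @at, $-[0];
    }
    return @at;
}

my $pat = 'GUAUG';

# Sample records in the original two-line format (name, then sequence).
my @lines = split /\n/, <<'END';
YBL027W
GUAUGUUUAACAGUGAUAGUAUGUUUUGAACCUUUCACAAGAUUUAUCUUUAAAUAUGUUAUGA
BBL111C
UAUGUUUAACAGUGAUACUAAAUUUUGAACCUUUCACAAGAGUAUGGUAUGAAUAUGUUAUGAG
END

my $name;
for (@lines) {
    if (/^[CGUA]+$/) {
        # Sequence line: report it under the most recent name.
        my @at = match_offsets($_, $pat);
        my $n  = @at;
        print "$name: $n match", ($n == 1 ? '' : 'es'), " at @at\n"
            if $n && defined $name;
    }
    else {
        # Name line.
        ($name) = /^(\w+)$/;
    }
}
```

Run as is, it should print "YBL027W: 2 matches at 0 18" and "BBL111C: 2 matches at 41 46". The offsets come straight from the regex engine, so no capture groups have to be built to fit the match count.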

I made a few assumptions:
• All sequences consist only of the letters C, G, U, and A. (I thought DNA used C, G, A, and T; a sequence containing U is RNA. I am not a scientist, but I play one on TV.)
• Matches of the search string do NOT overlap.
• The line name has at least one character that is not C, G, U, or A, and consists only of word characters (letters, digits, underscore).
• Lines alternate between a line name and its sequence, with the name first.
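The no-overlap assumption matters because GUAUG can overlap itself: in GUAUGUAUG it starts at offset 0 and again at 4, but m//g resumes after the end of each match and so reports only the first. If overlapping hits are wanted, one common Perl idiom is to match a zero-width lookahead instead. This is a sketch under that assumption; the sub name overlapping_offsets is hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return the zero-based start offsets of every match
# of $pat in $seq, including overlapping ones. The lookahead (?=...) matches
# an empty string, so the engine moves forward by one position after each
# hit and can find a match that begins inside the previous one.
sub overlapping_offsets {
    my ($seq, $pat) = @_;
    my @at;
    while ($seq =~ m/(?=\Q$pat\E)/g) {
        push @at, $-[0];
    }
    return @at;
}

# 'GUAUG' overlaps itself: in 'GUAUGUAUG' it starts at 0 and again at 4.
my @at = overlapping_offsets('GUAUGUAUG', 'GUAUG');
print scalar(@at), " matches at @at\n";    # prints "2 matches at 0 4"
```

Swapping this loop in for the match counting and offset collection would lift the second assumption; the other three still apply.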