perlquestion
FamousLongAgo
Hello, fellow monks!
<br><br>
I have been writing a parser for some Protein Data Bank files, for a bioinformatics project. I have no problem extracting the sequences I need, but I am stumped by the titles. Here's the problem:
<br><br>
The files start out in this format:<br>
<code>
HEADER METAL BINDING PROTEIN 31-AUG-98 1BSW
TITLE ACUTOLYSIN A FROM SNAKE VENOM OF AGKISTRODON ACUTUS AT PH
TITLE 2 7.5
COMPND MOL_ID: 1;
COMPND 2 MOLECULE: ACUTOLYSIN A;
...
</code>
The lines beginning with <code>TITLE</code> are the ones I'm interested in grabbing. There's one small caveat: on every line after the first, a continuation number is prepended to the title fragment. So in this example, the actual title is "Acutolysin A from snake venom of Agkistrodon acutus at pH 7.5".<br><br>
So far so dull. But later in the file, sometimes much later, there may be lines that also begin with <code>TITLE</code>. We want to ignore those.<br>
Assuming the following constraints:
<ol>
<li>We treat the file as an array ( no slurping into a scalar )</li>
<li>There is no way to distinguish the later TITLE elements by pattern matching.</li>
</ol>
Can anyone think of an elegant way to grab the first block of 1+ contiguous <code>TITLE</code> lines, and stop?
<br><br>I know how to do this with regular expressions on a scalar, and how to do it in a very inelegant way by setting flags in a loop, but I suspect there is greater wisdom out there and I can't wait to learn.
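<br><br>
For concreteness, the loop-based approach might look roughly like the sketch below. It assumes the file's lines are already in an array and that the continuation number is separated from the title text by whitespace; <code>first_title</code> is a name made up for illustration. The <code>@parts</code> array doubles as the flag: once it holds anything, the first non-TITLE line ends the loop.

```perl
use strict;
use warnings;

# Hypothetical helper: join the first contiguous block of TITLE lines
# into one string and ignore any TITLE lines that appear later.
sub first_title {
    my @lines = @_;
    my @parts;
    for my $line (@lines) {
        if ( $line =~ /^TITLE\s+(.*)/ ) {
            my $text = $1;
            # Continuation lines (2nd onward) carry a leading number; drop it.
            $text =~ s/^\d+\s+// if @parts;
            $text =~ s/\s+$//;
            push @parts, $text;
        }
        elsif (@parts) {
            # We were inside the block and just left it: stop reading.
            last;
        }
    }
    return join ' ', @parts;
}
```

Called as <code>first_title(@lines)</code>, it returns an empty string when the array contains no TITLE line at all.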
<br><br>
Special bonus to anyone who can tell me what an <i>Agkistrodon acutus</i> is, and how deadly its bite is.