Well, the miRBase data format is spooky, but what I'd do first is clean up the file by removing the lines that have no value for the analysis at hand. You can keep the original file intact and generate the cleaned-up file(s) from it. Each cleaned file can hold its own subset of the original and serve its own subproblem, and together those subproblems add up to the overall analytical goal. (N.B. You haven't mentioned what you intend to do with the file sections you wanted captured.)
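As a minimal sketch of that clean-up step: the helper below copies only the lines matching a pattern from one filehandle to another, so the original is only ever read. The name copy_matching_lines, the sample lines and the /^keep:/ pattern are all made up for illustration; which lines are actually worth keeping depends on what you intend to analyse.

```perl
use strict;
use warnings;

# Hypothetical helper: copy only the lines matching $keep_re from one
# filehandle to another. The source is only read, never modified.
sub copy_matching_lines {
    my ($in, $out, $keep_re) = @_;
    my $kept = 0;
    while (my $line = <$in>) {
        next unless $line =~ $keep_re;
        print {$out} $line;
        $kept++;
    }
    return $kept;
}

# Demo on in-memory data; the sample lines and the pattern are invented,
# not taken from a real miRBase file.
my $sample = "keep: alpha\nnoise\nkeep: beta\n";
open my $in,  '<', \$sample     or die "Cannot read sample: $!";
open my $out, '>', \my $cleaned or die "Cannot open output buffer: $!";
my $kept = copy_matching_lines($in, $out, qr/^keep:/);
close $in;
close $out;
print $cleaned;
```

With real data you would open the original file for reading and one output file per subproblem, then call the helper once per pattern.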

I have to disagree with Moritz's reliance on '*' to separate the records (which arose from the OP's description). The '*' characters here mean something else altogether and are not record separators: they mark the positions where two or more lines of letters are identical at the character level. This is known as sequence alignment. If the sequences in a block shared no identical position, no '*' would appear and two records could be inadvertently fused; if an alignment marker appeared mid-record, one record could be split in two without anyone noticing. On a related note, '-' is used to represent alignment gaps.

         gap
          |
          v
    TTCCAG-CCAGCTTTGTGACT-CTA
    TTCCAGCCCAGCTTTATGACT-GTA
    TTCCAGCCCAGCTTCTTCGCT-CTG
    ****** *******  *  **  *
      ^
      |
      identity

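To make the meaning of the '*' line concrete, here is a rough sketch that derives it from the three aligned sequences shown above. The sub name identity_line is hypothetical, and treating a column of gaps as "not identical" is my own assumption rather than something the file format guarantees.

```perl
use strict;
use warnings;

# Hypothetical helper: build the identity line for equal-length aligned
# sequences. A column gets '*' only when every sequence has the same
# non-gap character there; otherwise it gets a space.
sub identity_line {
    my @seqs = @_;
    return '' unless @seqs;
    my $len = length $seqs[0];
    for my $s (@seqs) {
        die "Aligned sequences must have equal length\n" if length($s) != $len;
    }
    my $line = '';
    for my $i (0 .. $len - 1) {
        my $first = substr $seqs[0], $i, 1;
        my $same  = $first ne '-';    # assumption: a gap never counts as identity
        for my $s (@seqs[1 .. $#seqs]) {
            $same = 0 if substr($s, $i, 1) ne $first;
        }
        $line .= $same ? '*' : ' ';
    }
    return $line;
}

my @aligned = qw(
    TTCCAG-CCAGCTTTGTGACT-CTA
    TTCCAGCCCAGCTTTATGACT-GTA
    TTCCAGCCCAGCTTCTTCGCT-CTG
);
print "$_\n" for @aligned;
print identity_line(@aligned), "\n";
```

Since the '*' line is computed from the sequences themselves, its presence and shape depend on the data, which is why it is unreliable as a record boundary.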
Back to topic: refining the file by purging the unwanted lines may well let you use one of the BioPerl modules to tackle the entire problem without writing much code. It would also give us a clearer definition of the problem, so that we can offer more relevant help.

You may want to read Perl and Bioinformatics in addition.


Excellence is an Endeavor of Persistence. A Year-Old Monk :D .

In reply to Re: regular expression questions (from someone without experience) by biohisham
in thread regular expression questions (from someone without experience) by gogoglou
