Good day Monks!

I have a minor problem with a regular expression I'm using that needs tweaking. I have a variable holding a page of text with lots of different paragraphs. I need to take chunks of that text and store them in separate variables. The variable looks like this (the content is not so important, it's mainly the headings that matter):

=Title of Page=
A general introduction to the topic.
=Literature=
Information here refers to the literature surrounding the topic.
===Comments===
User comments are added here. A user may write whatever they may wish.
=Microarray Data=
Information surrounding the topic related to microarrays.
===Comments===
Comments are related to the microarray data here.
=Pathway Information=
Information related to pathways for the topic is found here.
===Comments===
Comments related to the pathway information here.
=Aditional Info=
Any additional information can be found here.

I am trying to store the text below each ===Comments=== header in separate variables, so I should end up with 3 new variables, one per Comments header. The difficulty is that every Comments header, which I would use as the start tag, is exactly the same (===Comments===). I had solved this by using the following regular expression:

# $page refers to the variable containing the text
my @comments = Dumper($page) =~ m/[=]+Comments[=]+.*?\n(.*?)[=]+/gs;

I have now run into a problem with this: if an '=' appears anywhere inside a comments section, the match stops there and the rest of that section's text is no longer stored. I assume the way to solve this would be to write 3 separate statements, one for each comments section, differentiated by their relevant end tags (=Microarray Data=, =Pathway Information= and =Aditional Info=).
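For illustration, here is a minimal sketch of one possible alternative to writing 3 separate statements. It is not a definitive fix and rests on assumptions: every heading starts at the beginning of a line, the match runs against $page directly rather than against Dumper output, and the sample text (including the stray "x = 1") is made up for the example. The idea is to end each capture at the next line that begins with '=' instead of at the first '=' anywhere.

```perl
use strict;
use warnings;

# Hypothetical sample page; the first comment deliberately contains an '='.
my $page = <<'END';
=Title of Page=
A general introduction to the topic.
=Literature=
Information here refers to the literature surrounding the topic.
===Comments===
User comments are added here, e.g. x = 1.
A user may write whatever they may wish.
=Microarray Data=
Information surrounding the topic related to microarrays.
===Comments===
Comments are related to the microarray data here.
=Pathway Information=
Information related to pathways for the topic is found here.
===Comments===
Comments related to the pathway information here.
=Aditional Info=
Any additional information can be found here.
END

# /m lets ^ match at the start of every line, /s lets . cross newlines,
# /x allows the layout and comments, /g collects every match.
my @comments = $page =~ m{
    ^ =+ Comments =+ [ \t]* \n   # a Comments heading on its own line
    (.*?)                        # the comment body, as short as possible
    (?= ^ = | \z )               # stop before the next heading line or at end of text
}gmsx;

# Print each captured section with a running number.
my $n = 0;
print "--- comments ", ++$n, " ---\n$_" for @comments;
```

Because the end condition is "a line starting with '='", an '=' in the middle of a line no longer cuts the capture short. The sketch would still stop early if a comment line itself began with '=', so that case would need a stricter heading pattern.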

Any help with this would be great, as I'm close to having this finished and I'm excited to see it working :)

Cheers.

P.S. I should mention that the comments sections vary in length and can be as long as you like. They may contain numerous newline characters.


In reply to Regular Expressions Challenge by danj35
