danj35 has asked for the wisdom of the Perl Monks concerning the following question:
Good day Monks!
I have a minor problem with a regular expression that needs tweaking. I have a variable holding a page of text made up of many paragraphs, and I need to take chunks of that text and store them in separate variables. The variable looks like this (the content is not important; the headings are what matter):
=Title of Page=
A general introduction to the topic.
=Literature=
Information here refers to the literature surrounding the topic.
===Comments===
User comments are added here. A user may write whatever they may wish.
=Microarray Data=
Information surrounding the topic related to microarrays.
===Comments===
Comments are related to the microarray data here.
=Pathway Information=
Information related to pathways for the topic is found here.
===Comments===
Comments related to the pathway information here.
=Aditional Info=
Any additional information can be found here.
I am trying to store the text below each Comments header (===Comments===) in its own variable, so I should end up with 3 new variables, one per Comments header. The difficulty is that every Comments header, which I would use as the start tag, is exactly the same (===Comments===). I had solved this with the following regular expression:
# $page refers to the variable containing the text
my @comments = Dumper($page) =~ m/[=]+Comments[=]+.*?\n(.*?)[=]+/gs;
I have now run into a problem: if an '=' appears anywhere inside a comments section, the match stops there and the rest of that section's text is not stored. I assume the way to solve this would be to write three separate statements, one per comments section, each distinguished by its end tag (=Microarray Data=, =Pathway Information= and =Aditional Info=).
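One possible alternative to three separate statements, sketched here under assumptions the post does not confirm: each heading sits on its own line, and the pattern is applied to $page directly rather than to Dumper($page). The idea is to stop the capture only at a whole heading line (or at the end of the text) instead of at the first '='. The sample text and the stray "a=b" are made up for illustration.

```perl
use strict;
use warnings;

# Sample input modelled on the page in the question. The "a=b" in the first
# comment block is an invented stray '=' to exercise the failure case.
my $page = <<'END';
=Title of Page=
A general introduction to the topic.
=Literature=
Information here refers to the literature surrounding the topic.
===Comments===
User comments are added here. Note that a=b should not end the match.
A user may write whatever they may wish.
=Microarray Data=
Information surrounding the topic related to microarrays.
===Comments===
Comments are related to the microarray data here.
=Pathway Information=
Information related to pathways for the topic is found here.
===Comments===
Comments related to the pathway information here.
=Aditional Info=
Any additional information can be found here.
END

# Capture the text after each Comments heading, stopping only before a line
# that is itself a heading, or at the end of the string.
my @comments = $page =~ m{
    ^ =+ Comments =+ [ \t]* \n     # the Comments heading on its own line
    (.*?)                          # the comment text, non-greedy
    (?= ^ =+ [^=\n]+ =+ [ \t]* $   # stop before the next heading line
      | \z )                       # or at the end of the text
}xmsg;

chomp @comments;
print "[$_]\n" for @comments;
```

Because the end condition is anchored with ^ and $ under /m, an '=' in the middle of a comment line no longer ends the capture, and one statement in list context with /g still returns all three blocks.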
Any help with this would be great, as I'm close to having this finished and I'm excited to see it working :)
Cheers.
P.S. I should mention that the comments sections vary and can be as long as you like. They may contain numerous newline characters.
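Since the comment blocks can run over many lines, a line-by-line pass is another option worth considering. The following is only a sketch: comments_by_section is a hypothetical helper name, the sample text is invented, and it assumes that top-level headings look like =Name= on a line of their own.

```perl
use strict;
use warnings;

# Hypothetical helper: returns a hash mapping each top-level section title
# to the text found under that section's ===Comments=== subheading.
sub comments_by_section {
    my ($page) = @_;
    my (%comments, $section, $in_comments);
    for my $line (split /\n/, $page) {
        if ($line =~ /^===\s*Comments\s*===\s*$/) {
            $in_comments = 1;                  # start collecting comment text
        }
        elsif ($line =~ /^=\s*([^=]+?)\s*=\s*$/) {
            $section     = $1;                 # a new top-level section begins
            $in_comments = 0;
        }
        elsif ($in_comments && defined $section) {
            $comments{$section} .= "$line\n";  # any other line is comment text
        }
    }
    return %comments;
}

# Invented sample with a blank line and a stray '=' inside a comment block.
my $page = join "\n",
    '=Literature=', 'Intro text.',
    '===Comments===', 'First line.', '', 'x = y still belongs here.',
    '=Microarray Data=', 'Array text.',
    '===Comments===', 'Second block.',
    '';

my %c = comments_by_section($page);
print "$_:\n$c{$_}" for sort keys %c;
```

Keying the result by section title avoids hard-coding one variable per section, so a page with more or fewer Comments headers needs no code changes.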
Replies are listed 'Best First'.

Re: Regular Expressions Challenge
by moritz (Cardinal) on May 18, 2010 at 07:10 UTC
by danj35 (Sexton) on May 18, 2010 at 07:15 UTC
by moritz (Cardinal) on May 18, 2010 at 08:36 UTC

Re: Regular Expressions Challenge
by JavaFan (Canon) on May 18, 2010 at 07:29 UTC
by danj35 (Sexton) on May 18, 2010 at 07:41 UTC
by JavaFan (Canon) on May 18, 2010 at 13:33 UTC

Re: Regular Expressions Challenge
by danj35 (Sexton) on May 18, 2010 at 08:25 UTC
by ig (Vicar) on May 18, 2010 at 10:00 UTC

Re: Regular Expressions Challenge
by LanX (Saint) on May 18, 2010 at 09:51 UTC
by cdarke (Prior) on May 18, 2010 at 11:30 UTC
by LanX (Saint) on May 18, 2010 at 11:08 UTC

Re: Regular Expressions Challenge
by danj35 (Sexton) on May 18, 2010 at 11:42 UTC

Re: Regular Expressions Challenge
by dineed (Scribe) on May 18, 2010 at 17:25 UTC