Good day Monks!
I have a minor problem with a regular expression I'm using that needs tweaking. I have a variable holding a page of text made up of many different paragraphs, and I need to take chunks of that text and store them in separate variables. The variable looks like this (the content is not so important, it's mainly the headings that matter):
=Title of Page=
A general introduction to the topic.
=Literature=
Information here refers to the literature surrounding the topic.
===Comments===
User comments are added here. A user may write whatever they may wish.
=Microarray Data=
Information surrounding the topic related to microarrays.
===Comments===
Comments are related to the microarray data here.
=Pathway Information=
Information related to pathways for the topic is found here.
===Comments===
Comments related to the pathway information here.
=Aditional Info=
Any additional information can be found here.
I am trying to store all the text below each Comments header (===Comments===) in separate variables, so I should end up with 3 new variables, one for each Comments header. The difficulty is that every Comments header, which I would use as the start tag, is exactly the same (===Comments===). I had solved this with the following regular expression:
# $page refers to the variable containing the text
my @comments = Dumper($page) =~ m/[=]+Comments[=]+.*?\n(.*?)[=]+/gs;
I have now run into a problem with this: if an '=' occurs anywhere inside a comments section, the capture stops there and the rest of that section's text is no longer stored. I assume the way to solve this would be to write 3 separate statements, one for each comments section, differentiated by their relevant end tags (=Microarray Data=, =Pathway Information= and =Aditional Info=).
Any help with this would be great, as I'm close to having this finished and I'm excited to see it working :)
Cheers.
P.S. I should mention that the comments sections vary in length and can be as long as you like. They may contain numerous newline characters.
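For illustration, here is a minimal sketch of one alternative to the three separate statements. It rests on an assumption that is not confirmed above: that every heading sits on a line of its own. Under that assumption the capture can stop at the next line that looks like a heading, rather than at the first '=' anywhere. The sample text, including the "a = b" inside the first comment, is made up for the example.

```perl
use strict;
use warnings;

# Made-up sample, assuming each heading is on its own line.
my $page = join "\n",
    '=Literature=',
    'Information here refers to the literature surrounding the topic.',
    '===Comments===',
    'User comments are added here. A user may write a = b',
    'or anything else they wish.',
    '=Microarray Data=',
    'Information surrounding the topic related to microarrays.',
    '===Comments===',
    'Comments are related to the microarray data here.',
    '=Pathway Information=',
    'Information related to pathways for the topic is found here.',
    '===Comments===',
    'Comments related to the pathway information here.',
    '=Aditional Info=',
    'Any additional information can be found here.',
    '';

# Capture everything after a Comments header line, up to (but not
# including) the next heading line or the end of the text.
my @comments = $page =~ m{
    ^=+Comments=+[^\n]*\n     # the Comments header line itself
    (.*?)                     # comment body, as short as possible
    (?= ^=+[^=\n]+=+\s*$      # stop before the next heading line
      | \z )                  # or at the very end of the page
}gmsx;

chomp @comments;              # drop the trailing newline of each block
print "[$_]\n" for @comments;
```

The /m modifier lets ^ and $ match at line boundaries, /s lets . cross newlines, and /x allows the layout and comments inside the pattern. With this sample, @comments should hold three elements, one per Comments header. A comment line that itself begins with '=' and ends with '=' would still be mistaken for a heading, so this is a sketch rather than a complete answer.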
In reply to Regular Expressions Challenge by danj35