danj35 has asked for the wisdom of the Perl Monks concerning the following question:
Hi Monks,
I've asked this before, but have yet to get a reply that works. I have read so much documentation on regular expressions now that I think I'm going crazy! It seems like a simple problem to me, so I'll try to be as clear as possible. Here goes:
I have read a webpage into a variable and I need to extract various paragraphs of text from it. I have a working line of code that extracts text from the following article:
A webpage.
===Comments===
This webpage contains information bla bla bla
=Section 2=
Some more text here.
===Comments===
Some other comments here.
=Another section=
=Aditional Notes=
More notes here.
The code that I currently have extracts all the text between "===Comments===" and "=Section 2=", like so:
if(Dumper($page) =~ /===Comments===(.*?)=Section 2=/s ) { $lit_comments = $1; }
What I can't seem to do now is extract the next block of text, the one below the second comments heading, between "===Comments===" and "=Another section=", because the start tag already appears earlier in the article and the regex matches that first occurrence.
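One possible approach, as a minimal sketch: match with /g in list context so that every "===Comments===" block is returned at once, rather than only the first. This assumes $page holds the page as a plain string (so it is matched directly instead of through Dumper), that each comments block ends at the next line starting with a single "=" heading, and the name @comments is made up for illustration.

```perl
use strict;
use warnings;

# Sample text mirroring the article layout shown above.
my $page = <<'END';
A webpage.
===Comments===
This webpage contains information bla bla bla
=Section 2=
Some more text here.
===Comments===
Some other comments here.
=Another section=
=Aditional Notes=
More notes here.
END

# Capture the text after each "===Comments===" line, stopping just before
# the next line that begins with a single "=" heading.
#   /g  in list context returns the capture from every match
#   /s  lets "." cross newlines
#   /m  lets "^" match at the start of each line
# The lookahead (?=...) does not consume the heading, so the end tag does
# not need to be known in advance.
my @comments = $page =~ /^===Comments===\n(.*?)(?=^=[^=])/msg;

for my $i ( 0 .. $#comments ) {
    print "Block ", $i + 1, ": $comments[$i]";
}
```

With this, $comments[1] would hold the second block, so there is no need to write a separate regex per section name.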
As a secondary point, I also need to extract all the text after the "=Aditional Notes=" heading. The problem is that I do not know what the end tag for this will be, since the text simply runs to the end of the page (i.e. to the last character in the webpage).
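For the trailing section, a sketch of one way to do it: no end tag is required, because a greedy (.*) under /s runs to the end of the string, and \z can anchor the match there explicitly. The sample string and the variable name $notes are assumptions for illustration only.

```perl
use strict;
use warnings;

# Shortened sample: only the tail of the article is needed here.
my $page = "=Another section=\n=Aditional Notes=\nMore notes here.\n";

# /m lets "^" match at the start of the heading line; /s lets "." cross
# newlines. The greedy (.*) takes everything that follows the heading,
# and \z matches only at the very end of the string.
my $notes;
if ( $page =~ /^=Aditional Notes=\n(.*)\z/ms ) {
    $notes = $1;
}

print defined $notes ? $notes : "no notes section found\n";
```

The \z is not strictly needed with a greedy capture, but it documents the intent that the capture extends to the last character.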
I hope this is clear. Any help would be great. Cheers!
Replies are listed 'Best First'.
Re: Extracting Text Using Regular Expressions Problem
by kennethk (Abbot) on May 07, 2010 at 15:12 UTC
Re: Extracting Text Using Regular Expressions Problem
by JavaFan (Canon) on May 07, 2010 at 15:12 UTC
Re: Extracting Text Using Regular Expressions Problem
by Marshall (Canon) on May 07, 2010 at 15:56 UTC
by danj35 (Sexton) on May 10, 2010 at 11:54 UTC
Re: Extracting Text Using Regular Expressions Problem
by ww (Archbishop) on May 07, 2010 at 20:21 UTC