Re: Extracting Text Using Regular Expressions Problem

Update: it appears that I didn't understand all of the requirements when I wrote this code, but hopefully it will still help you in your endeavor. The script below shows how to get all of the comment blocks. From what I understand, there is a single =Additional Notes= section at the very end. Write a second regex along the same lines to get that section; since it is the very last section, no terminator is needed, e.g.:

m/[=]+Additional Notes[=]+.*?\n(.*)/s; # this (.*) captures everything
                                       # to the end of $page;
                                       # unlike the regex below, the ending
                                       # [=]+ and /g are not needed here
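To make that one-liner concrete, here is a minimal sketch of it in a complete script. The sample page is made up for illustration (it is not the original poster's data), and the variable name $notes is my own choice.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample page; the layout is assumed for illustration only.
my $page = <<'END_PAGE';
A webpage.
===Comments===
Some comments here.
=Section 2=
Some more text here.
==Additional Notes==
First line of notes.
Second line of notes.
END_PAGE

# No /g and no closing [=]+ here: with /s the greedy (.*) also crosses
# newlines, so it runs to the end of $page.
my ($notes) = $page =~ m/[=]+Additional Notes[=]+.*?\n(.*)/s;

if (defined $notes) {
    print "NOTES:\n$notes";
}
else {
    print "No Additional Notes section found\n";
}
```

The list assignment my ($notes) = ... puts the match in list context, so $notes receives the captured text, or undef when the header is not present.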

#!/usr/bin/perl -w
use strict;

open(my $in, '<', 'awebpage.txt') or die "Cannot open awebpage.txt: $!";

# Slurp the whole file into one scalar by locally undefining
# the input record separator $/
my $page = do { local $/; <$in> };
close $in;

my @comments = $page =~ m/[=]+Comments[=]+.*?\n(.*?)[=]+/gs;

my $count =1;
foreach (@comments)
{
   print "COMMENT #$count is:\n$_";
   $count++;
}

=file awebpage.txt is:
A webpage.

===Comments===
This webpage contains information bla bla bla

=Section 2=
Some more text here.
whatever
===Comments===
Some other comments here.
=Another section=
=Additional Notes=
=Comments=
some more comments and notes here
=Notes=
More notes here.
=cut

=output The script prints:
COMMENT #1 is:
This webpage contains information bla bla bla

COMMENT #2 is:
Some other comments here.
COMMENT #3 is:
some more comments and notes here
=cut
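One caveat with the Comments regex above: the trailing [=]+ consumes the first '=' of the next header. That is harmless in the sample file, but if a single-'=' Comments header directly follows another Comments block, the second one loses its leading '=' and is skipped. A possible variation, sketched here on made-up data, swaps the consuming terminator for a lookahead. The \z alternative is my own addition so that a Comments block at the very end of the page also matches.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample: two single-'=' Comments sections back to back.
my $page = <<'END_PAGE';
=Comments=
first comment
=Comments=
second comment
=Notes=
not a comment
END_PAGE

# Original form: the trailing [=]+ eats the '=' that opens the next header.
my @consuming = $page =~ m/[=]+Comments[=]+.*?\n(.*?)[=]+/gs;

# Lookahead form: (?==|\z) only checks for '=' or end of string,
# so the next header is left intact for the following /g iteration.
my @lookahead = $page =~ m/[=]+Comments[=]+.*?\n(.*?)(?==|\z)/gs;

printf "consuming terminator: %d section(s)\n", scalar @consuming;
printf "lookahead terminator: %d section(s)\n", scalar @lookahead;
print "COMMENT: $_" for @lookahead;
```

Both forms still stop at the first '=' they meet, so a literal '=' inside a comment body would cut that comment short.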

Re^2: Extracting Text Using Regular Expressions Problem by danj35 (Sexton) on May 10, 2010 at 11:54 UTC
Thanks. That works perfectly. Glad to have put this problem to bed!