in reply to Extracting Text Using Regular Expressions Problem

Update: It appears that I didn't understand all the requirements when I wrote this code, but hopefully it will still help you. The code below shows how to get all of the comment blocks. As I understand it, there is a single =Additional Notes= section at the very end. To get that section, write a second regex along the same lines as the one below; since it is the very last section, no terminator is needed, e.g.:
my ($notes) = $page =~ m/[=]+Additional Notes[=]+.*?\n(.*)/s;
# (.*) with /s captures everything to the end of $page.
# Unlike the comments regex below, the closing [=]+ and /g are not needed for this job.
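The /s modifier is what lets (.*) reach the end of the page. A minimal standalone sketch makes the difference visible; the sample string and the variable names $notes and $first_line are hypothetical, not taken from the original page. It runs the same Additional Notes regex with and without /s:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample page; only the final section matters here.
my $page = "=Section 2=\nSome text.\n=Additional Notes=\n=Comments=\nsome more comments\n=Notes=\nMore notes here.\n";

# With /s, '.' also matches "\n", so (.*) runs to the end of $page.
my ($notes) = $page =~ m/[=]+Additional Notes[=]+.*?\n(.*)/s;

# Without /s, '.' cannot match "\n", so (.*) stops at the end of the next line.
my ($first_line) = $page =~ m/[=]+Additional Notes[=]+.*?\n(.*)/;

print "With /s:\n$notes";
print "Without /s: $first_line\n";
```

Without /s only the "=Comments=" line after the heading is captured; with /s the whole tail of the page is.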

#!/usr/bin/perl
use strict;
use warnings;

open(my $in, '<', 'awebpage.txt') or die "Cannot open awebpage.txt: $!";
my $page = do { local $/; <$in> };   # "slurp": with an undef record separator, the whole file is read into one scalar
close $in;

# Capture the text after each "=...Comments...=" heading, up to the next '='.
my @comments = $page =~ m/[=]+Comments[=]+.*?\n(.*?)[=]+/gs;

my $count = 1;
foreach (@comments) {
    print "COMMENT #$count is:\n$_";
    $count++;
}

__END__

File awebpage.txt is:

A webpage.
===Comments===
This webpage contains information bla bla bla
=Section 2=
Some more text here. whatever
===Comments===
Some other comments here.
=Another section=
=Additional Notes=
=Comments=
some more comments and notes here
=Notes=
More notes here.

Prints:

COMMENT #1 is:
This webpage contains information bla bla bla
COMMENT #2 is:
Some other comments here.
COMMENT #3 is:
some more comments and notes here
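One caveat with the [=]+ terminator: it stops at the first '=' anywhere, so a comment containing, say, "x = 5" would be cut short. The following is a minimal alternative sketch, not the method used above. It assumes every section heading starts at the beginning of a line, which is an assumption about the page format rather than something stated in this thread. It anchors the terminator with /m and a lookahead; the sample text is hypothetical:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample: the second comment contains an '=' mid-line,
# which would cut the original [=]+ terminator short.
my $page = <<'END';
A webpage.
===Comments===
This webpage contains information
bla bla bla
=Section 2=
Some more text here.
===Comments===
Set x = 5 before running.
=Another section=
END

# /m makes ^ match at the start of any line; /s lets .*? span lines.
# The lookahead (?=^=|\z) ends a comment at the next line that starts
# with '=' (or at the end of the string) without consuming it.
my @comments = $page =~ m/^=+Comments=+[^\n]*\n(.*?)(?=^=|\z)/gms;

my $count = 1;
for my $comment (@comments) {
    print "COMMENT #$count is:\n$comment";
    $count++;
}
```

Here the second comment is captured whole as "Set x = 5 before running.", whereas the unanchored terminator would have stopped at "Set x ".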

Re^2: Extracting Text Using Regular Expressions Problem
by danj35 (Sexton) on May 10, 2010 at 11:54 UTC

    Thanks. That works perfectly. Glad to have put this problem to bed!