(jeffa) Re: regex in html

You are close. First, you have to undefine the input record separator ($/) if you want to slurp the whole input into one string; otherwise a scalar read only returns data up to and including the first newline encountered:

undef $/;
$current = <DATA>;
$start   = '<!---CURCON-->';
$end     = '<!---END CURCON-->';

my ($match) = $current =~ m/$start(.*)$end/s;
print $match;

__DATA__
stuff i don't want
<!---CURCON-->
stuff i do want
<!---END CURCON-->
more stuff i don't want

You are correctly using the 's' modifier for your regex, but instead of using s///, use m// and capture $1 in another variable. The trick is that you have to do the match in list context:

my ($match) = $current =~ m/$start(.*)$end/s; # note the parens around $match

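Otherwise the match runs in scalar context, and $match only receives the success value of the match (1 on success, the empty string on failure) instead of the captured text.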
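
A small side-by-side may make the context difference easier to see. This is only a sketch: the sample string mirrors the __DATA__ block above, and the variable names $in_list and $in_scalar are made up for illustration.

```perl
use strict;
use warnings;

my $start   = '<!---CURCON-->';
my $end     = '<!---END CURCON-->';
my $current = "stuff i don't want\n$start\nstuff i do want\n$end\nmore\n";

# List context (parens around the variable): the capture group comes back.
my ($in_list) = $current =~ m/$start(.*)$end/s;

# Scalar context (no parens): only the success value of the match comes back.
my $in_scalar = $current =~ m/$start(.*)$end/s;

print "list context:   [$in_list]\n";
print "scalar context: [$in_scalar]\n";
```

The first print shows the captured block, surrounding newlines included; the second shows only 1.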

Now $match will contain a newline at the beginning as well as one at the end. Either of these removes them (along with any other newlines inside $match):

$match =~ tr/\n//d;
# or
$match =~ s/\n//g;
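
If the wanted block spans several lines, deleting every newline also glues those lines together. A hedged alternative is to trim only the outer newlines; the sample string and the names $all_gone and $trimmed below are assumptions for illustration, not part of the original problem.

```perl
use strict;
use warnings;

# Hypothetical capture with two inner lines and one newline on each side.
my $match = "\nline one\nline two\n";

# Copy first, then modify the copy, so both results come from the same input.
(my $all_gone = $match) =~ tr/\n//d;           # deletes every newline
(my $trimmed  = $match) =~ s/\A\n+|\n+\z//g;   # strips only leading/trailing ones

print "tr///d gives: $all_gone\n";
print "trim gives:   $trimmed\n";
```

Which one is right depends on whether the inner line breaks matter in your page.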

Jeff

R-R-R--R-R-R--R-R-R--R-R-R--R-R-R--
L-L--L-L--L-L--L-L--L-L--L-L--L-L--

Replies are listed 'Best First'.
Re: regex in html by cLive ;-) (Prior) on Apr 02, 2001 at 04:33 UTC
TIMTOWTDI... But also, I think Jeff misunderstands how your data is coming in. I'm assuming you're opening a file before the code you listed, and not using the __DATA__ token in your script.

I don't like redefining $/, especially the way Jeff shows it, because the change is not local and may cause issues later in your program. If you insist, use:

# assuming the DATA filehandle is already open for reading...
# declare
my $current;
# begin local code block
{
    # locally undefine $/
    local $/ = undef;
    # slurp
    $current = <DATA>;
# end local code block
}

For more on $/, see '6.7. Reading Records with a Pattern Separator' in The Perl Cookbook.

But I'd do it this way, anyway...

# open
open (DATA,"/path/to/webpage.htm") || die "Can't open page - $!";
# slurp
$current = join '', (<DATA>);
# close
close(DATA);
# match
$current =~ /<!---CURCON-->\n(.*?)\n<!---END CURCON-->/s;
# store (only meaningful if the match above succeeded)
my $match = $1;

Jeff's match also grabs an extra \n at the beginning and end, which you may not need (small point :)

Hope this makes sense.

cLive ;-)
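
To illustrate why the localized form is the safer one, here is a minimal sketch showing that $/ reverts once the block ends. It reads from an in-memory filehandle rather than a real file, and the sample data and the names $fh, $first and $rest are assumptions made for the demonstration.

```perl
use strict;
use warnings;

# In-memory filehandle standing in for a real file (an assumption for the demo).
my $data = "one\ntwo\nthree\n";
open my $fh, '<', \$data or die "Can't open in-memory handle: $!";

# With the default $/, a scalar read returns a single line.
my $first = <$fh>;

my $rest;
{
    local $/;        # undef only inside this block
    $rest = <$fh>;   # slurps everything that is left
}

# Outside the block $/ is the newline again, so later reads behave normally.
print "first line: $first";
print "slurped:    ", length($rest), " characters\n";
print "separator restored\n" if defined $/ && $/ eq "\n";
```

A plain `undef $/;` at file scope would leave every later read in the program in slurp mode, which is the risk described above.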
(jeffa) Re: Re: regex in html by jeffa (Bishop) on Apr 02, 2001 at 17:29 UTC
"I think Jeff misunderstands how your data is coming in"

Nope. You said it right the first time: TIMTOWTDI ;)

I mentioned the extra newlines; I did not address them because I did not know EXACTLY how the data will look EVERY time. What if there are multiple blank lines?

my ($match) = $current =~ m/$start\s*(.*?)\s*$end/s;

But thanks for sharing comments and criticisms, don't get me wrong, ++cLive ;-) :)

Jeff
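
For the multiple-blank-lines case, a short sketch of the whitespace-tolerant match follows. The input string is invented, and the non-greedy `.*?` and the `\Q...\E` quoting of the markers are assumptions added here so that trailing whitespace stays out of the capture and the markers are matched literally.

```perl
use strict;
use warnings;

my $start = '<!---CURCON-->';
my $end   = '<!---END CURCON-->';

# Made-up input with a varying number of blank lines around the wanted text.
my $current = "junk\n$start\n\n\nstuff i do want\n\n$end\nmore junk\n";

# \s* soaks up any whitespace next to the markers; .*? stops as early as it can.
my ($match) = $current =~ m/\Q$start\E\s*(.*?)\s*\Q$end\E/s;

print "[$match]\n";   # prints [stuff i do want]
```

Newlines inside the wanted text are kept, since only whitespace touching the markers is skipped.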