Re: Multi-Line Regex's

/m changes the effect of ^ and $ so that they match at the start and end of each line (rather than only at the start and end of the whole string). You need /s, which changes the meaning of . so that it also matches \n.

I remember it this way: /s changes the meaning of a single metacharacter (.), and /m changes the meaning of multiple metacharacters (^ and $).
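
To make the difference concrete, here is a minimal sketch that applies the same patterns with and without each modifier. The two-line string and the variable names are made up for illustration and are not taken from the original question.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Illustrative two-line string (an assumption, not from the original post).
my $text = "first line\nsecond line";

# No modifiers: ^ anchors only at the start of the whole string.
my $plain_caret = () = $text =~ /^\w+/g;

# /m: ^ also matches just after each embedded newline.
my $m_caret = () = $text =~ /^\w+/mg;

# No modifiers: . will not match "\n", so the pattern cannot span lines.
my $plain_dot = $text =~ /first.*second/ ? 1 : 0;

# /s: . matches "\n" as well, so the pattern can cross the line break.
my $s_dot = $text =~ /first.*second/s ? 1 : 0;

# /m alone leaves . unchanged, so this still fails to span lines.
my $m_dot = $text =~ /first.*second/m ? 1 : 0;

print "^ without /m: $plain_caret match(es)\n";
print "^ with /m:    $m_caret match(es)\n";
print ". without /s: $plain_dot\n";
print ". with /s:    $s_dot\n";
print ". with /m:    $m_dot\n";
```

The two modifiers are independent, so a pattern that needs both behaviours can use /ms together.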

#!/usr/bin/perl -w

use strict;

$_ =
'<P>
THE GENERAL SYNOPSIS AT 0100<BR>
LOW SOUTH FITZROY 1000 MOVING SLOWLY NORTH AND FILLING 1006 BY 0100<BR>
TOMORROW. NEW LOW EXPECTED 50 MILES WEST OF TRAFALGAR 1007 BY SAME<BR>
TIME. HIGH 100 MILES WEST OF ROCKALL 1023 SLOW MOVING AND DECLINING<BR>
1021 BY THAT TIME<BR>
<P>
THE AREA FORECASTS FOR THE NEXT 24 HOURS<BR>';

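# /s lets . match newlines, so (.*) can span the whole synopsis.
# Only print the captures if the match actually succeeded.
if (/GENERAL SYNOPSIS AT (\d{4})<BR>\s+(.*)\s<P>/s) {
    print "1 -> $1\n2 -> $2\n";
}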

Of course the usual caveats about not parsing HTML with regexes still apply :)

--
<http://www.dave.org.uk>

"The first rule of Perl club is you do not talk about Perl club."
-- Chip Salzenberg

Re: Re: Multi-Line Regex's by sch (Pilgrim) on Sep 18, 2002 at 13:54 UTC
"Of course the usual caveats about not parsing HTML with regexes still apply :)"

While I can see that, in general, handling big chunks of HTML is preferably done with things like HTML::Parser, is there any real advantage to those tools in this simple case, where I'm trying to grab one easily delimited paragraph from a specific webpage?
Re: Re: Re: Multi-Line Regex's by davorg (Chancellor) on Sep 18, 2002 at 14:01 UTC
Well, only the fact that HTML parsers will actually parse the HTML for you, whereas any regex-based solution will only handle a subset of the possible HTML and will prove extremely fragile if the HTML ever changes. You might like to take a look at the section "How not to parse HTML" in chapter 8 of Data Munging with Perl.
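
As a rough illustration of that fragility, the sketch below runs the regex from the earlier post against a shortened copy of the markup and three hypothetical variations of it. The variant strings and their names are invented for this example; they are all changes a browser would treat as the same markup.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The pattern from the post above, precompiled.
my $re = qr/GENERAL SYNOPSIS AT (\d{4})<BR>\s+(.*)\s<P>/s;

# Hypothetical variations of the same fragment (assumptions for illustration).
my %variants = (
    original  => "THE GENERAL SYNOPSIS AT 0100<BR>\nLOW SOUTH FITZROY<BR>\n<P>",
    lowercase => "THE GENERAL SYNOPSIS AT 0100<br>\nLOW SOUTH FITZROY<br>\n<p>",
    xhtml     => "THE GENERAL SYNOPSIS AT 0100<BR />\nLOW SOUTH FITZROY<BR />\n<P>",
    attribute => "THE GENERAL SYNOPSIS AT 0100<BR>\nLOW SOUTH FITZROY<BR>\n<P class=\"x\">",
);

# Record which variants the regex still matches.
my %matched;
for my $name (sort keys %variants) {
    $matched{$name} = $variants{$name} =~ $re ? 1 : 0;
    printf "%-10s %s\n", $name, $matched{$name} ? "matched" : "no match";
}
```

Only the original form matches. Each of the other three would need its own tweak to the pattern, which is the kind of maintenance an HTML parser avoids.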