in reply to Multi-Line Regex's

/m changes ^ and $ so that they match at the start and end of each line (rather than only at the start and end of the whole string). What you need is /s, which changes the meaning of . so that it also matches \n.

I remember it this way: /s changes the meaning of a single metacharacter (.) and /m changes the meaning of multiple metacharacters (^ and $).
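As a minimal sketch of that difference, the snippet below applies the same patterns with and without each modifier. The two-line sample string and the variable names are hypothetical, made up purely for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical two-line sample string, for illustration only.
my $text = "first line\nsecond line";

# Without /m, ^ matches only at the very start of the string.
my @plain = $text =~ /^(\w+)/g;

# With /m, ^ also matches immediately after each embedded newline.
my @multi = $text =~ /^(\w+)/mg;

# Without /s, . does not match a newline, so .* cannot span both lines.
my $dot_plain = ($text =~ /first.*second/) ? 1 : 0;

# With /s, . matches a newline as well, so the pattern spans both lines.
my $dot_single = ($text =~ /first.*second/s) ? 1 : 0;

print "without /m: @plain\n";       # without /m: first
print "with /m:    @multi\n";       # with /m:    first second
print "without /s: $dot_plain\n";   # without /s: 0
print "with /s:    $dot_single\n";  # with /s:    1
```

The two modifiers are independent, so a pattern that needs both behaviours can combine them as /ms.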

#!/usr/bin/perl -w
use strict;

# Sample input: the multi-line HTML fragment to search.
$_ = '<P>
THE GENERAL SYNOPSIS AT 0100<BR>
LOW SOUTH FITZROY 1000 MOVING SLOWLY NORTH AND FILLING 1006 BY 0100<BR>
TOMORROW. NEW LOW EXPECTED 50 MILES WEST OF TRAFALGAR 1007 BY SAME<BR>
TIME. HIGH 100 MILES WEST OF ROCKALL 1023 SLOW MOVING AND DECLINING<BR>
1021 BY THAT TIME<BR>
<P>
THE AREA FORECASTS FOR THE NEXT 24 HOURS<BR>';

# /s lets . match newlines, so (.*) can span several lines.
/GENERAL SYNOPSIS AT (\d{4})<BR>\s+(.*)\s<P>/s;

print "1 -> $1\n2 -> $2\n";

Of course the usual caveats about not parsing HTML with regexes still apply :)

--
<http://www.dave.org.uk>

"The first rule of Perl club is you do not talk about Perl club."
-- Chip Salzenberg

Re: Re: Multi-Line Regex's
by sch (Pilgrim) on Sep 18, 2002 at 13:54 UTC

    Of course the usual caveats about not parsing HTML with regexes still apply :)

    I can see that, in general, big chunks of HTML are better handled with something like HTML::Parser. But in this simple case, where I'm trying to grab one easily delimited paragraph from a specific web page, is there any real advantage to those tools?

      Well, only the fact that HTML parsers will actually parse the HTML for you, whereas any regex-based solution will only handle a subset of the possible HTML and will prove extremely fragile if the HTML ever changes.

      You might like to take a look at the section "How not to parse HTML" in chapter 8 of Data Munging with Perl.
