jack_64 has asked for the wisdom of the Perl Monks concerning the following question:

The following snippet extracts everything between "start" and "end" in a single line:

line: start checking script end
if ($_ =~ /start([\s\S]+?)end/i)

But what should the regex be if I want to extract from the following format?

start
checking script
end

Thanks, monks.

Re: Regex Help
by morgon (Priest) on Jun 12, 2012 at 16:34 UTC
    First of all, your regex is a bit strange: [\s\S] means either whitespace or non-whitespace (so in effect any character), so you might as well simply use a "." (with the "s" switch if newlines should match too).

    If you want to match against a string that contains multiple lines, use the "m" and "s" switches ("m" makes ^ and $ match at line boundaries, and "s" makes "." match newlines; see perldoc perlre).

    my $s = <<__end_of_string__;
    start
    checking script
    end
    __end_of_string__
    my ($match) = $s =~ /start(.*?)end/ims;
    print $match;

    Note that this will also include the newline after "start" in your match (just as your example included the whitespace after "start").

    If you don't want that, you could do it like this:

    my ($match) = $s =~ /start\n(.*?)end/ims;
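
    As a rough illustration of what the two switches change, here is a minimal sketch. The sample string and the helper name `between` are assumptions made for the example, not something from the original question.

```perl
use strict;
use warnings;

# Sample input (an assumption): markers on their own lines.
my $text = "start\nchecking script\nend\n";

# Hypothetical helper: returns the captured text, or undef if no match.
sub between {
    my ($re, $str) = @_;
    my ($match) = $str =~ $re;
    return $match;
}

# Without /s, "." stops at the first newline, so nothing matches.
my $plain = between(qr/start(.*?)end/i, $text);

# With /s, "." also matches newlines, so the capture can span lines.
my $dotall = between(qr/start(.*?)end/is, $text);

# [\s\S] matches any character regardless of modifiers.
my $class = between(qr/start([\s\S]*?)end/i, $text);

# /m only changes where ^ and $ match; here it ties both markers to line starts.
my $anchored = between(qr/^start\n(.*?)^end$/ims, $text);

print defined $plain ? "plain: matched\n" : "plain: no match\n";
print "dotall:   [$dotall]\n";
print "class:    [$class]\n";
print "anchored: [$anchored]\n";
```

    In this sketch the /s and [\s\S] variants capture the same text, including the newline after "start", while the anchored variant leaves that newline out.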
Re: Regex Help
by toolic (Bishop) on Jun 12, 2012 at 16:08 UTC
    One way is to use Range Operators:
    use warnings;
    use strict;

    while (<DATA>) {
        print if (/start/i .. /end/i);
    }
    __DATA__
    foo
    start
    checking script
    end
    foo

    And to exclude the start/end:

    if (/start/i .. /end/i) {
        print unless /start/i or /end/i;
    }
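
    A variation on the same idea, sketched under the assumption that the lines are already in a list: in scalar context the range operator returns a sequence number, so the marker lines can be skipped without testing the patterns a second time. The helper name `lines_between` is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: collects the lines strictly between the markers.
sub lines_between {
    my @lines = @_;
    my @body;
    for (@lines) {
        # In scalar context ".." returns a counter while the range is open:
        # 1 on the start line, and a value ending in "E0" on the end line.
        my $n = /start/i .. /end/i;
        next unless $n;                    # outside the range
        next if $n == 1 or $n =~ /E0$/;    # skip the marker lines themselves
        push @body, $_;
    }
    return @body;
}

my @input = ("foo\n", "start\n", "checking script\n", "end\n", "foo\n");
print lines_between(@input);
```

    One caveat: the range operator keeps its state between iterations, so an input that never reaches the end marker leaves the range open.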
Re: Regex Help
by kennethk (Abbot) on Jun 12, 2012 at 16:41 UTC
    This is definitely getting into TIMTOWTDI territory. If you suck in the entire input using a slurp (see $/), then your existing regular expression would work.
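
    A minimal sketch of the slurp approach follows. The in-memory filehandle and its sample contents are assumptions standing in for whatever input is actually being read.

```perl
use strict;
use warnings;

# In-memory filehandle standing in for a real input file (an assumption).
my $data = "foo\nstart\nchecking script\nend\nfoo\n";
open my $fh, '<', \$data or die "Cannot open in-memory handle: $!";

# Localizing $/ leaves it undefined inside this block, so <$fh> returns
# the whole input as one string instead of one line at a time.
my $content = do { local $/; <$fh> };
close $fh;

# The original pattern now sees the newlines and can match across them.
my ($captured) = $content =~ /start([\s\S]+?)end/i;
print "Captured: [$captured]\n" if defined $captured;
```

    Using local keeps the change to $/ confined to the do block, so line-by-line reads elsewhere in the program are unaffected.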

    Rather than using the character class [\s\S], I think the intent would be more obvious using . combined with the s modifier:

    if ($_ =~ /start(.+?)end/si)

    I would probably also add word boundaries around your start and end, so you don't get false positives from words like 'send':

    if ($_ =~ /\bstart\b(.+?)\bend\b/si)

    Even better, if you know that start and end are on lines by themselves, use the m modifier to say as much:

    if ($_ =~ /^start\s*\n(.+?)^end\s*$/smi)

    Regular expressions play best when they are tightly constrained.
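
    To make the 'send' false positive concrete, here is a small sketch comparing the three patterns. The sample text is an assumption chosen so that "send" appears before the real end marker.

```perl
use strict;
use warnings;

# Sample input (an assumption): "send" appears before the real "end" marker.
my $text = "start\nplease send the report\nend\n";

# Unconstrained: lazy matching stops at the "end" inside "send".
my ($loose) = $text =~ /start(.+?)end/si;

# Word boundaries: only a standalone "end" terminates the match.
my ($bounded) = $text =~ /\bstart\b(.+?)\bend\b/si;

# Anchored: both markers must sit on lines of their own.
my ($anchored) = $text =~ /^start\s*\n(.+?)^end\s*$/smi;

print "loose:    [$loose]\n";
print "bounded:  [$bounded]\n";
print "anchored: [$anchored]\n";
```

    With this input the unconstrained pattern cuts the capture short at "please s", while the other two run through to the real marker.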


Re: Regex Help
by AnomalousMonk (Archbishop) on Jun 12, 2012 at 21:57 UTC