rsiedl has asked for the wisdom of the Perl Monks concerning the following question:

Hey Monks.

Can anybody help me with this regex?
<code> #!/usr/bin/perl use strict

Re: Help with Regex
by davido (Cardinal) on Feb 09, 2007 at 06:20 UTC

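    Several issues here. First, you need a non-greedy quantifier; a greedy one runs on to the last InstanceEndEditable comment in the text rather than stopping at the first. And while we're at it, I'm sure you want something captured, so let's use the + quantifier instead of * (which permits an empty match).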

    Next, you must also realize that your text has "newline" characters embedded within it. The . (dot) metacharacter excludes newlines by default, unless you use the /s modifier on your regexp. Here's a "repaired" version:

    my $source = << 'END';
    adf
    <!-- InstanceBeginEditable name="guts" -->
    this is the part we want
    <!-- InstanceEndEditable -->
    adf
    <!-- InstanceBeginEditable name="crap" -->
    adf
    <!-- InstanceEndEditable -->
    adf
    END

    # Pull out just what we need: non-greedy (.+?) stops at the first
    # closing comment, and /s lets . match the embedded newlines
    my ($new_source) = $source =~ /<!-- InstanceBeginEditable name="guts" +-->(.+?)<!-- InstanceEndEditable -->/s;

    You're also not checking whether your regexp matched, so when it fails it does so silently and you simply get no text captured (in this example). You really ought to test whether a match took place.
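
One way to make that check explicit is to put the match in an if and only read $1 when it succeeds. This is a minimal sketch, not the original poster's script: the sample text is built with join so the snippet stands alone, and the wording of the warning is made up for illustration.

```perl
use strict;
use warnings;

# Sample input modelled on the text in this thread (an assumption, since
# the original script is not shown in full).
my $source = join "\n",
    'adf',
    '<!-- InstanceBeginEditable name="guts" -->',
    'this is the part we want',
    '<!-- InstanceEndEditable -->',
    'adf',
    '';

my $new_source;
if ( $source =~ /<!-- InstanceBeginEditable name="guts" +-->(.+?)<!-- InstanceEndEditable -->/s ) {
    # $1 is only meaningful after a successful match
    $new_source = $1;
    print "Captured: [$new_source]\n";
}
else {
    # Report the failure instead of carrying on with an undefined value
    warn "No 'guts' region found in the source\n";
}
```

Note that the captured text keeps the newlines on either side of the wanted line; trim them afterwards if they are not wanted.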

    One other word of warning: parsing HTML with a regular expression is fragile. For example, a newline could just as easily be embedded inside one of your HTML comments, and that would break the regexp above, which expects literal spaces there. It's always advisable to use an HTML parser rather than rolling your own regexp approach.
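
To illustrate the fragility, here is a small sketch with a hypothetical input in which the opening comment has been wrapped across two lines. The pattern with literal spaces fails on it, while a variant using \s+ and \s* still matches. The names $wrapped, $strict_re and $loose_re are invented for this example, and the looser pattern only papers over one failure mode; it is no replacement for a real parser.

```perl
use strict;
use warnings;

# Hypothetical input: the opening comment is split across two lines.
my $wrapped = join "\n",
    '<!-- InstanceBeginEditable',
    'name="guts" -->',
    'this is the part we want',
    '<!-- InstanceEndEditable -->',
    '';

# Pattern from this thread: literal spaces inside the comments.
my $strict_re = qr/<!-- InstanceBeginEditable name="guts" +-->(.+?)<!-- InstanceEndEditable -->/s;

# More tolerant variant: \s matches spaces, tabs and newlines.
my $loose_re = qr/<!--\s*InstanceBeginEditable\s+name="guts"\s*-->(.+?)<!--\s*InstanceEndEditable\s*-->/s;

# In list context a failed match returns an empty list, leaving undef.
my ($strict) = $wrapped =~ $strict_re;
my ($loose)  = $wrapped =~ $loose_re;

print defined $strict ? "strict pattern matched\n" : "strict pattern failed\n";
print defined $loose  ? "loose pattern matched\n"  : "loose pattern failed\n";
```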


    Dave

Re: Help with Regex
by siva kumar (Pilgrim) on Feb 09, 2007 at 07:38 UTC
    Try this:
    1. Use the non-greedy (.*?).
    2. Treat the whole string as one line: use /s.
    That is, use
    my ($new_source) = $source =~ /<!-- InstanceBeginEditable name="guts" +-->(.*?)<!-- InstanceEndEditable -->/s;
    instead of
    my ($new_source) = $source =~ /<!-- InstanceBeginEditable name="guts" +-->(.*)<!-- InstanceEndEditable -->/;
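
To see what each of the two changes does on its own, the sketch below runs three variants against text shaped like the sample in this thread. The variable names $greedy, $lazy and $no_s are made up for the comparison, and the input is an assumption rebuilt with join so the snippet stands alone.

```perl
use strict;
use warnings;

# Two editable regions, as in the sample text in this thread.
my $source = join "\n",
    'adf',
    '<!-- InstanceBeginEditable name="guts" -->',
    'this is the part we want',
    '<!-- InstanceEndEditable -->',
    'adf',
    '<!-- InstanceBeginEditable name="crap" -->',
    'adf',
    '<!-- InstanceEndEditable -->',
    'adf',
    '';

# Greedy with /s: runs on to the LAST closing comment.
my ($greedy) = $source =~ /<!-- InstanceBeginEditable name="guts" +-->(.*)<!-- InstanceEndEditable -->/s;

# Non-greedy with /s: stops at the FIRST closing comment.
my ($lazy) = $source =~ /<!-- InstanceBeginEditable name="guts" +-->(.*?)<!-- InstanceEndEditable -->/s;

# Non-greedy without /s: . cannot cross the newlines, so nothing matches.
my ($no_s) = $source =~ /<!-- InstanceBeginEditable name="guts" +-->(.*?)<!-- InstanceEndEditable -->/;

print "greedy captured ", length($greedy), " characters\n";
print "lazy captured ",   length($lazy),   " characters\n";
print "without /s: ", ( defined $no_s ? "matched" : "no match" ), "\n";
```

The greedy capture swallows the second region as well, which is why both the ? and the /s are needed here.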