Help with Regex

rsiedl has asked for the wisdom of the Perl Monks concerning the following question:

Re: Help with Regex by davido (Cardinal) on Feb 09, 2007 at 06:20 UTC
Several issues here. First, you need a non-greedy quantifier. And while we're at it, I'm sure you want something captured, so let's use the + quantifier instead of * (which permits an empty match). Next, you must also realize that your text has newline characters embedded within it. The . (dot) metacharacter excludes newlines by default, unless you use the /s modifier on your regexp. Here's a "repaired" version:

`my $source = << 'END';
adf
<!-- InstanceBeginEditable name="guts" -->
this is the part we want
<!-- InstanceEndEditable -->
adf
<!-- InstanceBeginEditable name="crap" -->
adf
<!-- InstanceEndEditable -->
adf
END

# Pull out just what we need
my ($new_source) = $source =~ /<!-- InstanceBeginEditable name="guts" +-->(.+?)<!-- InstanceEndEditable -->/s;`

You're also not checking whether your regexp matched, so when it fails silently you simply get no text captured (in this example). You really ought to test whether a match took place.

One other word of warning: parsing HTML with a regular expression is fragile. It's possible that a newline gets embedded in one of your HTML comments too, and that would break the regexp. It's always advisable to use an HTML parser rather than rolling your own regexp approach.

Dave
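The advice to test whether a match took place could be sketched roughly as follows. This is only an illustration: `extract_region` is a hypothetical helper name that does not appear in the thread, and the line layout of the sample text is assumed.

```perl
use strict;
use warnings;

# Sample text; the exact line breaks are an assumption.
my $source = <<'END';
adf
<!-- InstanceBeginEditable name="guts" -->
this is the part we want
<!-- InstanceEndEditable -->
adf
<!-- InstanceBeginEditable name="crap" -->
adf
<!-- InstanceEndEditable -->
adf
END

# Hypothetical helper: returns the body of the named editable region,
# or undef when the pattern does not match.
sub extract_region {
    my ($text, $name) = @_;

    # \Q...\E quotes any regex metacharacters in the region name.
    # (.+?) is non-greedy and requires at least one character.
    # /s lets . match newlines.
    if ($text =~ /<!-- InstanceBeginEditable name="\Q$name\E" +-->(.+?)<!-- InstanceEndEditable -->/s) {
        return $1;
    }
    return;    # no match: undef in scalar context
}

my $guts = extract_region($source, 'guts');
if (defined $guts) {
    print "Captured: [$guts]\n";
}
else {
    warn "No editable region named 'guts' was found\n";
}
```

Returning undef on failure keeps "no match" distinct from a successful capture, so the caller can decide how to report it.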
Re: Help with Regex by siva kumar (Pilgrim) on Feb 09, 2007 at 07:38 UTC
Try this:
1. Use a non-greedy quantifier: (.*?)
2. Treat the whole string as one line: use /s

`my ($new_source) = $source =~ /<!-- InstanceBeginEditable name="guts" +-->(.*?)<!-- InstanceEndEditable -->/s;`

instead of

`my ($new_source) = $source =~ /<!-- InstanceBeginEditable name="guts" +-->(.*)<!-- InstanceEndEditable -->/;`
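To see why both changes matter, here is a small comparison of the three variants against the sample text from this thread. It is a sketch under assumptions: the line layout of the sample is assumed, and the variable names `$begin`, `$end`, `$no_s`, `$greedy` and `$lazy` are made up for illustration.

```perl
use strict;
use warnings;

# Sample text; the exact line breaks are an assumption.
my $source = <<'END';
adf
<!-- InstanceBeginEditable name="guts" -->
this is the part we want
<!-- InstanceEndEditable -->
adf
<!-- InstanceBeginEditable name="crap" -->
adf
<!-- InstanceEndEditable -->
adf
END

my $begin = qr/<!-- InstanceBeginEditable name="guts" +-->/;
my $end   = qr/<!-- InstanceEndEditable -->/;

# 1. Greedy, no /s: . cannot cross the newline after the opening
#    comment, so the match fails and $no_s stays undef.
my ($no_s) = $source =~ /$begin(.*)$end/;

# 2. Greedy with /s: .* runs on to the LAST end marker, so the capture
#    also swallows the "crap" region.
my ($greedy) = $source =~ /$begin(.*)$end/s;

# 3. Non-greedy with /s: .*? stops at the FIRST end marker.
my ($lazy) = $source =~ /$begin(.*?)$end/s;

print "no /s:     ", (defined $no_s ? "matched" : "no match"), "\n";
print "greedy:    ", length($greedy), " characters captured\n";
print "non-greedy: [$lazy]\n";
```

The /s flag on the outer pattern is what affects the dot inside the capture group; the precompiled `$begin` and `$end` pieces contain no dot, so they behave the same either way.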