Regexp syntax nuance question, storing $1 (code)

deprecated has asked for the wisdom of the Perl Monks concerning the following question:

sub zapwrap {
  my ($line, $next_line) = (@_);
  return undef unless $line =~ /^#(#|=)+$/;
  return undef unless $next_line =~ /(?:$1)+/;
  return 1;
}

I am trying to remove line wrapping from files that contain something like this:

######################################################################
+########################

# or ...

#=====================================================================
+========================

(Hopefully your browser will have wrapped that line.) I think I have the RE for it; I just sort of thought this one up. My question is whether, in the second match, I have to use ?: (forget the saved value), or whether the $1 gets interpolated _before_ that expression is evaluated. It seems silly to ask, in that it has to interpolate the pattern before it can evaluate it, but I suspect something strange might happen in the internal workings of Perl's regex engine. :)

thanks
brother dep.

--
Laziness, Impatience, Hubris, and Generosity.


Re: Regexp syntax nuance question, storing $1 (code) by japhy (Canon) on May 02, 2001 at 23:17 UTC
The regex will first get `$1` and any other variables interpolated, so you don't need to store it someplace else, or do anything special to it to get Perl to recognize it.

However, the code is slightly flawed. YAPE::Regex::Explain will tell you that `/(a|b)+/` does not match as `/a+|b+/`, but rather, it matches like `/[ab]+/`, and stores the LAST character matched into `$1`. This is because the regex code looks like:

1. OPEN 1
2. MATCH 'a' OR 'b'
3. CLOSE 1
4. TRY GOTO 1

So it can match lines like "#=#=#=#=#". Icky, no? Sadly, to get the regex to work like you'd expect, you have to do something like:

sub zapwrap {
  my ($this, $next) = @_;
  return
    $this =~ m{
      ^ \#       # '#' at the beginning of the string
      ( [\#=] )  # a '#' or an '=' (saved to $1)
      \1*        # that character 0 or more times
      $          # end of line
    }x
      and
    $next =~ m{
      ^          # beginning of string
      $1+        # that character 1 or more times
      $          # end of string
    }x;
}

`japhy` -- Perl and Regex Hacker
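
As a small illustration of the two points above (a sketch only; the variable names are made up for this example), the following checks that `(#|=)+` accepts a mixed run and leaves only the last matched character in `$1`, that a captured character followed by `\1*` rejects the mixed run, and that `$1` from one match is already interpolated when the next pattern is compiled. `\Q...\E` is used on the interpolated value as a precaution.

```perl
use strict;
use warnings;

# (#|=)+ behaves like [#=]+ : a mixed run matches, and $1 keeps only
# the last character the group matched.
my $mixed = '#=#=#=#=#';
my ($mixed_ok, $last) = (0, undef);
if ($mixed =~ /^#(#|=)+$/) {
    ($mixed_ok, $last) = (1, $1);
}

# Capturing one character and repeating it with \1* rejects the mixed run.
my $strict_ok = ($mixed =~ /^#([#=])\1*$/) ? 1 : 0;

# $1 from the previous successful match is interpolated into the next
# pattern before that pattern is run.
my $interp_ok = 0;
if ('#====' =~ /^#([#=])\1*$/) {
    $interp_ok = ('====' =~ /^\Q$1\E+$/) ? 1 : 0;
}

print "mixed run matched by (#|=)+ : $mixed_ok\n";
print "last character captured     : $last\n" if defined $last;
print "mixed run matched by \\1*    : $strict_ok\n";
print "second match using \$1       : $interp_ok\n";
```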
Re: Regexp syntax nuance question, storing $1 (code) by tye (Sage) on May 02, 2001 at 23:34 UTC
Another alternative is:

sub zapwrap {
  my ($line, $next_line) = (@_);
  return undef unless $line =~ /^#(#+|=+)$/;
  my $c= quotemeta substr($1,0,1);
  return undef unless $next_line =~ /$c+/;
  return 1;
}

though I prefer the one proposed by japhy. Also note that if the repeated character has special meaning in a regular expression, then you'd need to use \Q$1\E in japhy's version.

- tye (but my friends call me "Tye")
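
A minimal sketch of that quoting advice, assuming the two-regex shape from japhy's reply. The name `zapwrap_quoted` is hypothetical and not from the thread. The quoting matters here because, as far as I understand the `/x` rules, an unquoted `#` interpolated into an extended pattern would be read as the start of a comment.

```perl
use strict;
use warnings;

# Hypothetical variant (the name is illustrative only): quote the captured
# character before reusing it in a second pattern written with /x.
sub zapwrap_quoted {
    my ($this, $next) = @_;

    # A leading '#', then one '#' or '=' repeated to the end of the line.
    return unless $this =~ /^#([#=])\1*$/;

    my $c = quotemeta $1;    # '#' becomes '\#', '=' becomes '\='

    return $next =~ m{
        ^        # beginning of string
        $c+      # the captured character, one or more times
        $        # end of string
    }x ? 1 : 0;
}

for my $pair (['####', '###'], ['#===', '==='], ['#===', '###']) {
    my $result = zapwrap_quoted(@$pair);
    print "$pair->[0] + $pair->[1] => $result\n";
}
```

Storing the quoted character in a lexical rather than relying on `$1` also keeps the second pattern independent of whatever match ran last.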
Re: Regexp syntax nuance question, storing $1 (code) by cLive ;-) (Prior) on May 03, 2001 at 00:23 UTC
timtowtdi :) Assuming you've chomped the lines before sending...

sub zapwrap {
  return ("$_[0]$_[1]" =~ /^#(#+|=+)$/) || undef;
}

cLive ;-)

