PerlMonks  

Regexp syntax nuance question, storing $1 (code)

by deprecated (Priest)
on May 02, 2001 at 23:01 UTC ( [id://77464]=perlquestion )

deprecated has asked for the wisdom of the Perl Monks concerning the following question:

    sub zapwrap {
        my ($line, $next_line) = (@_);
        return undef unless $line =~ /^#(#|=)+$/;
        return undef unless $next_line =~ /(?:$1)+/;
        return 1;
    }

I am trying to remove line wrapping from files that contain something like this:
###################################################################### +######################## # or ... #===================================================================== +========================
(Hopefully your browser will have wrapped that line.) I think I have the RE for it; I just sort of thought this one up. My question is about the second match: do I have to use ?: (forget the saved value), or does the $1 get interpolated _before_ that expression is evaluated? It seems silly to ask, since it has to interpolate it before it can evaluate it, but I suspect something strange might happen in the internal workings of the regex engine. :)

thanks
brother dep.

--
Laziness, Impatience, Hubris, and Generosity.

Replies are listed 'Best First'.
Re: Regexp syntax nuance question, storing $1 (code)
by japhy (Canon) on May 02, 2001 at 23:17 UTC
    The regex will first get $1 and any other variables interpolated, so you don't need to store it someplace else, or do anything special to it to get Perl to recognize it.
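
A minimal sketch of that point, assuming nothing beyond core Perl. The helper names with_noncapturing and with_capturing are made up for this illustration and are not from the thread. Both interpolate the $1 left by the first match, so the choice between (?:...) and (...) does not affect what gets pasted into the second pattern:

```perl
use strict;
use warnings;

# Hypothetical helpers for illustration; neither appears in the thread.

# Non-capturing group around the interpolated $1.
sub with_noncapturing {
    my ($line, $next) = @_;
    return 0 unless $line =~ /^#([#=])/;    # sets $1 on success
    # By the time the engine runs, the pattern text already contains
    # the character that was in $1.
    return $next =~ /^(?:$1)+$/ ? 1 : 0;
}

# Capturing group instead: the old $1 is still what gets interpolated;
# the new capture only replaces $1 after this match succeeds.
sub with_capturing {
    my ($line, $next) = @_;
    return 0 unless $line =~ /^#([#=])/;
    return $next =~ /^($1)+$/ ? 1 : 0;
}

for my $pair (["#====", "===="], ["#====", "####"], ["#####", "###"]) {
    my ($line, $next) = @$pair;
    printf "%-6s %-5s noncapturing=%d capturing=%d\n",
        $line, $next,
        with_noncapturing($line, $next),
        with_capturing($line, $next);
}
```

The two helpers agree on every pair here. That settles only the interpolation question; the pattern in the question itself is a separate matter.
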

    However, the code in the question is slightly flawed. YAPE::Regex::Explain will tell you that /(a|b)+/ does not match like /a+|b+/; rather, it matches like /[ab]+/ and stores the LAST character matched in $1. This is because the regex code looks like:

        1. OPEN 1
        2. MATCH 'a' OR 'b'
        3. CLOSE 1
        4. TRY GOTO 1

    So it can match lines like "#=#=#=#=#". Icky, no? Sadly, to get the regex to work like you'd expect, you have to do something like:

        sub zapwrap {
          my ($this, $next) = @_;
          return
            $this =~ m{
              ^ \#        # '#' at the beginning of the string
              ( [\#=] )   # a '#' or an '=' (saved to $1)
              \1*         # that character 0 or more times
              $           # end of line
            }x and
            $next =~ m{
              ^           # beginning of string
              $1+         # that character 1 or more times
              $           # end of string
            }x;
        }
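
To see the difference concretely, here is a small sketch comparing the pattern from the question with a backreference pattern like the one above. The helper names loose_capture and strict_capture are made up for this illustration; each returns the captured character, or undef when the line does not match:

```perl
use strict;
use warnings;

# Pattern from the question: the group itself repeats, so every
# iteration is free to pick a different alternative.
sub loose_capture {
    my ($line) = @_;
    return $line =~ /^#(#|=)+$/ ? $1 : undef;
}

# Backreference form: capture one character, then require that
# same character for the rest of the line.
sub strict_capture {
    my ($line) = @_;
    return $line =~ /^#([#=])\1*$/ ? $1 : undef;
}

for my $line ("#=#=#=#=#", "#========", "#########") {
    my $loose  = loose_capture($line);
    my $strict = strict_capture($line);
    printf "%-10s loose: %-9s strict: %s\n",
        $line,
        defined $loose  ? "\$1 = '$loose'"  : "no match",
        defined $strict ? "\$1 = '$strict'" : "no match";
}
```

The mixed line is accepted only by the loose pattern, and the $1 it leaves behind is whichever character happened to come last.
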


    japhy -- Perl and Regex Hacker
Re: Regexp syntax nuance question, storing $1 (code)
by tye (Sage) on May 02, 2001 at 23:34 UTC

    Another alternative is:

        sub zapwrap {
            my ($line, $next_line) = (@_);
            return undef unless $line =~ /^#(#+|=+)$/;
            my $c= quotemeta substr($1,0,1);
            return undef unless $next_line =~ /$c+/;
            return 1;
        }

    though I prefer the one proposed by japhy. Also note that if the repeated character has special meaning in a regular expression, then you'd need to use \Q$1\E in japhy's version.
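
A sketch of why the quoting matters for this particular case. Under /x, a # that reaches the pattern through interpolation is, as far as I know, still treated as the start of a comment, so '#' is one of the characters with special meaning here. The helpers unquoted and quoted are hypothetical names for this illustration:

```perl
use strict;
use warnings;

# Hypothetical helpers for illustration; neither appears in the thread.

# Interpolates the character as-is into an /x pattern. If $char is
# '#', the rest of that pattern line is read as a comment.
sub unquoted {
    my ($char, $next) = @_;
    return $next =~ m{
        ^
        $char+       # intended: that character one or more times
        $
    }x ? 1 : 0;
}

# Same pattern with \Q...\E, so the character is backslash-escaped
# before the /x rules are applied.
sub quoted {
    my ($char, $next) = @_;
    return $next =~ m{
        ^
        \Q$char\E+   # escaped, so a hash stays a literal character
        $
    }x ? 1 : 0;
}

for my $case (['=', '===='], ['#', '####']) {
    my ($char, $next) = @$case;
    printf "char %s, next %s: unquoted=%d quoted=%d\n",
        $char, $next, unquoted($char, $next), quoted($char, $next);
}
```

With '=' both helpers agree; with '#' only the quoted one accepts a line of hashes.
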

            - tye (but my friends call me "Tye")
Re: Regexp syntax nuance question, storing $1 (code)
by cLive ;-) (Prior) on May 03, 2001 at 00:23 UTC
    timtowtdi :)

    Assuming you've chomped the lines before sending...

        sub zapwrap {
            return ("$_[0]$_[1]" =~ /^#(#+|=+)$/) || undef;
        }

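
A short sketch of why the chomp assumption matters: with the newline still attached to the first line, the joined string has a newline in the middle and the anchored pattern cannot match. The names zapwrap_joined and zapwrap_chomped are made up for this illustration:

```perl
use strict;
use warnings;

# Same one-liner as above, renamed so it can sit next to a wrapper.
sub zapwrap_joined {
    return ("$_[0]$_[1]" =~ /^#(#+|=+)$/) || undef;
}

# Hypothetical wrapper: chomp copies of both lines, then join and test.
sub zapwrap_chomped {
    my ($line, $next) = @_;
    chomp($line, $next);
    return zapwrap_joined($line, $next);
}

for my $pair (["#####\n", "###\n"], ["#====\n", "####\n"]) {
    my $raw     = zapwrap_joined(@$pair);
    my $chomped = zapwrap_chomped(@$pair);
    printf "raw: %-5s chomped: %s\n",
        defined $raw     ? $raw     : 'undef',
        defined $chomped ? $chomped : 'undef';
}
```
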
    cLive ;-)
