PerlMonks  

Bareword Regex

by davorg (Chancellor)
on Dec 06, 2001 at 16:57 UTC ( [id://129902]=perlquestion )

davorg has asked for the wisdom of the Perl Monks concerning the following question:

This came up on the London.pm mailing list this morning. Why does the following code do what it does:

    #!/usr/local/bin/perl -w
    use strict;

    my $x = "wibble mmnipm";
    if ($x =~ mmnipm) {
        print "match\n";
    } else {
        print "no match\n";
    }

A couple of people suggested that the mmnipm is being interpreted as m/nip/, but further investigation shows it's being interpreted as m/mmnipm/.

perlop says this about the binding operator:

If the right argument is an expression rather than a search pattern, substitution, or transliteration, it is interpreted as a search pattern at run time. This can be less efficient than an explicit search, because the pattern must be compiled every time the expression is evaluated.

Which explains (almost) what's going on, but the one mystery remaining is why the code doesn't trigger a bareword error under use strict 'subs'. Actually under 5.005_02, it does give an error, but that error seems to have been removed in 5.005_03.

My other question is, why isn't mmnipm interpreted as m/nip/? I'm sure that I've read that if you're using a letter as the regex delimiter then you need to put a space before it (and testing shows that m mnipm is parsed as m/nip/) but I can't find a reference anywhere to this behaviour.
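For anyone who wants to see the two parses side by side, here is a minimal sketch. It assumes the Perl at hand may reject the bareword under strict, so the bareword form is wrapped in no strict 'subs' with warnings silenced. The second subject string "turnip" and the %spaced and %bare hashes are illustrative names, not part of the original code.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# "turnip" contains "nip" but not "mmnipm", so it tells the two parses apart.
my @subjects = ("wibble mmnipm", "turnip");

my (%spaced, %bare);
for my $s (@subjects) {
    # "m", a space, then "m" as the delimiter: the pattern is "nip".
    $spaced{$s} = ($s =~ m mnipm) ? $& : 'no match';

    # No space: the whole run of letters is one bareword, and =~ uses the
    # resulting string 'mmnipm' as the pattern.
    {
        no strict 'subs';   # assumption: needed where strict rejects the bareword
        no warnings;        # silence the "unquoted string" warning
        $bare{$s} = ($s =~ mmnipm) ? $& : 'no match';
    }
}

for my $s (@subjects) {
    printf "%-15s spaced: %-10s bare: %s\n", "'$s'", $spaced{$s}, $bare{$s};
}
```

With "turnip" the spaced form should report nip while the bareword form reports no match, which is consistent with mmnipm being treated as m/mmnipm/ rather than m/nip/.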

--
<http://www.dave.org.uk>

"The first rule of Perl club is you do not talk about Perl club."
-- Chip Salzenberg

Replies are listed 'Best First'.
Re: Bareword Regex
by Masem (Monsignor) on Dec 06, 2001 at 17:50 UTC
    Second question: I'd suspect it's the same reason that we need to include spaces to distinguish between function names and their args in golf: the perl script parser is looking for groupings of /[a-zA-Z1-90_]*/. First, note that perlop says that any symbol used as the regex delimiter can have an optional space between the m and the delimiter, save for '#' (for obvious reasons). So "m/test/" and "m /test/" will work the same. Second, a quick test with "m_nip_" and "m _nip_" shows that these two act very differently, while any other symbol appears to act normally. So if what it sees could work out to a 'word' in the perl sense as given by the regex above, it's grouped into a single 'word' and used in whatever way makes sense. On the other hand, any symbol breaks that word up, and perl can now recognize the 'm' and the rest of the regex separately.
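The grouping idea can be illustrated with a rough sketch. This is not perl's actual tokenizer, only a stand-in that takes the longest leading run of identifier characters from a piece of source text; leading_word is a hypothetical helper name.

```perl
use strict;
use warnings;

# Hypothetical helper: return the longest leading run of "word" characters,
# roughly the grouping described above. Not how perl itself is implemented.
sub leading_word {
    my ($src) = @_;
    return $src =~ /^([A-Za-z0-9_]+)/ ? $1 : '';
}

for my $src ('m_nip_', 'm _nip_', 'm/nip/', 'mmnipm', 'm mnipm') {
    printf "%-8s => first word: %s\n", $src, leading_word($src);
}
```

Only when the first word comes out as a bare "m" is there a match operator left for perl to recognize; "m_nip_" and "mmnipm" are swallowed whole as a single word.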

    -----------------------------------------------------
    Dr. Michael K. Neylon - mneylon-pm@masemware.com || "You've left the lens cap of your mind on again, Pinky" - The Brain
    "I can see my house from here!"
    It's not what you know, but knowing how to find it if you don't know that's important

Re: Bareword Regex
by japhy (Canon) on Dec 06, 2001 at 20:09 UTC
    Dave, if you've ever written a function that begins with the characters "m", "q", "s", "tr", or "y", you'd understand why moo parses as 'moo' and not m//... ;)

    Honestly, though, I doubt you will find the use of alphanumberscores as delimiters documented. The closest I came was erroneous documentation in perlop:

    The lack of processing of \\ creates specific restrictions on the post-processed text. If the delimiter is /, one cannot get the combination \/ into the result of this step. / will finish the regular expression, \/ will be stripped to / on the previous step, and \\/ will be left as is. Because / is equivalent to \/ inside a regular expression, this does not matter unless the delimiter happens to be character special to the RE engine, such as in s*foo*bar*, m[foo], or ?foo?; or an alphanumeric char, as in:
    m m ^ a \s* b mmx;
    In the RE above, which is intentionally obfuscated for illustration, the delimiter is m, the modifier is mx, and after backslash-removal the RE is the same as for m/ ^ a s* b /mx). There's more than one reason you're encouraged to restrict your delimiters to non-alphanumeric, non-whitespace choices.
    The documentation is plain wrong, as far as what it's telling you, but shows the use of "m" as the delimiter.

    _____________________________________________________
    Jeff[japhy]Pinyan: Perl, regex, and perl hacker.
    s++=END;++y(;-P)}y js++=;shajsj<++y(p-q)}?print:??;

Re: Bareword Regex
by virtualsue (Vicar) on Dec 06, 2001 at 18:34 UTC
    If you play with this fragment, it helps to use
    print "match, $&\n"
    to see the actual match.

    While we're on the topic, davorg pointed me to a post by chipmunk which in turn points to a piece of Perl poetry that he wrote using "m mmm" (which as we've now seen, could also have been written as "mmmm").

      Just to pick a couple of nits in the interest of increasing understanding...

      In that poem, the "m mmm" could be replaced with any bareword simply because the poem "won't run" and without strict, throwing around barewords is unlikely to cause compilation errors.

      However, if "m mmm" was used in code to actually match something (being a silly way of writing "m//m"), then you could never replace it with "mmmm".

      First, m//m is pretty silly code itself. The empty pattern means "reuse the most recent successful regex", and testing (since I so doubt it is documented that I didn't even look for it) shows that when using the empty regex this way, any options are ignored (including /g!); note that this might be considered a bug and get fixed some day, so don't rely on that behavior. So it boils down to being the same as m//, m//mg, m//g, or any other similar code.

      But if we wanted the code to compile to the same optree, then we could replace "=~ m mmm" with "=~ mmm" (note only 3 "m"s, not 4) provided we either didn't use strict or used a version of Perl that doesn't give an error in this case. However, in the absence of "=~", replacing "m mmm" with "mmm" will change the code from m//m (which matches against $_ since there is no "=~") to 'mmm' (which is simply a string constant) unless something else gets in the way such as strict or the declaration of a subroutine named "mmm".
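The "reuse the most recent successful regex" behaviour of the empty pattern can be checked with a short sketch. The subject strings and the variable names $reused_hit and $reused_miss are made up for illustration, and it does not exercise the claim about ignored options.

```perl
use strict;
use warnings;

# Prime "the most recent successful regex" with a match on /o+/.
"foo" =~ /o+/ or die "priming match failed";

# An empty pattern now reuses /o+/ instead of matching the empty string.
my $reused_hit  = ("boot" =~ //) ? $& : 'no match';
my $reused_miss = ("bar"  =~ //) ? $& : 'no match';

print "boot: $reused_hit\n";
print "bar:  $reused_miss\n";
```

If // were a genuinely empty pattern it would match "bar" as well; the failed match shows that the earlier /o+/ is what actually runs.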

              - tye (but my friends call me "Tye")
      Maybe the separator could be null?
      I.E:
      "/" - the normal separator ( =~ /asdf/)
      "#" - some people like to use it instead of "/" (=~ #asdf#)
      "" - how about this separator?
(Ovid) Re: Bareword Regex
by Ovid (Cardinal) on Dec 06, 2001 at 23:49 UTC

    You may be interested in a similar discussion that took place here a while ago. It was decided that this was a bug and I submitted a bug report on it.

    Cheers,
    Ovid

    Join the Perlmonks Setiathome Group or just click on the link and check out our stats.

(tye)Re: Bareword Regex
by tye (Sage) on Dec 06, 2001 at 20:23 UTC

    The rest having been covered, I'll opine that the disappearance of the error is simply a bug that needs to be fixed.

            - tye (but my friends call me "Tye")

      Looks like robin has just submitted a patch to p5p to fix it.

      --
      <http://www.dave.org.uk>

      "The first rule of Perl club is you do not talk about Perl club."
      -- Chip Salzenberg
