I've been working on a solution to the evil problem of $& making all your regexes do more work. So I spent the last two days coming up with the following pragma (I wrote a pragma!) that offers control of this dastardly variable. Hopefully, it will make it into Perl 5.8. You can't just download this module, though -- I had to alter a couple of files in the Perl source to jibe with what this pragma does.

NAME

re::ampersand - Perl pragma to alter $& support in regular expressions


SYNOPSIS

    "Perl" =~ /../ and print "<$&>";  # <Pe>
    "Perl" =~ /er/ and print "<$&>";  # <er>
    {
      # disable $& support
      no re::ampersand;
      "Perl" =~ /../ and print "<$&>";  # <>
      "Perl" =~ /er/ and print "<$&>";  # <>
    }
    {
      # disable $& support for simple regexes
      no re::ampersand 'simple';
      "Perl" =~ /../ and print "<$&>";  # <Pe>
      "Perl" =~ /er/ and print "<$&>";  # <>
    }
    {
      # disable $& support for complex regexes
      no re::ampersand 'complex';
      "Perl" =~ /../ and print "<$&>";  # <>
      "Perl" =~ /er/ and print "<$&>";  # <er>
    }


DESCRIPTION

When Perl sees you using $`, $&, or $', it has to prepare these variables after every successful pattern match. This can slow a program down because these variables are "prepared" by copying the string you matched against to an internal location. This copying is also how the $DIGIT variables ($1, $2, ...) are made accessible, but that only occurs on a per-regex basis: if a regex has capturing parentheses, the string will be copied; otherwise it will not be.
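
Independently of this pragma, the matched text can also be recovered without mentioning $& at all, by using the @- and @+ offset arrays (available since Perl 5.6). The following is only a minimal sketch; matched_text is a hypothetical helper name and is not part of re::ampersand.

```perl
use strict;
use warnings;

# Hypothetical helper: return the text a regex matched, without using $&.
# $-[0] is the offset where the match starts, $+[0] the offset where it ends.
sub matched_text {
    my ($string, $regex) = @_;
    return unless $string =~ $regex;    # no match: undef (or empty list)
    return substr($string, $-[0], $+[0] - $-[0]);
}

print "<", matched_text("Perl", qr/../), ">\n";   # <Pe>
print "<", matched_text("Perl", qr/er/), ">\n";   # <er>
```

Because the code never mentions $&, $`, or $', it should not trigger the program-wide copying described above.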

Simple vs. Complex

Some regexes are simple enough to be matched via the Boyer-Moore substring matching algorithm, which is a fast way to find a substring in a string. Regexes that rely only on constant text and anchors can be matched this way. (These regexes cannot have capturing parentheses.) Because of this, they bypass the standard regex engine and end up not preparing $& and its friends -- there is no copying of the string that was matched.
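
To make the distinction concrete, here is a rough sketch of the kind of test involved. The function looks_simple is a hypothetical heuristic of my own for illustration. It is NOT the logic Perl's regex optimizer actually uses, and it is deliberately conservative: it only accepts literal word characters and spaces, optionally anchored.

```perl
use strict;
use warnings;

# Hypothetical heuristic, NOT Perl's real optimizer logic: call a pattern
# "simple" when it is only literal word characters or spaces, optionally
# anchored with ^ at the start and/or $ at the end.
sub looks_simple {
    my ($pattern) = @_;
    return $pattern =~ /\A\^?[\w ]+\$?\z/ ? 1 : 0;
}

for my $pattern ('er', '^Perl$', '..', 'a+bc+', '(er)') {
    printf "%-8s %s\n", $pattern, looks_simple($pattern) ? 'simple' : 'complex';
}
```

Under this approximation /er/ counts as simple and /../ as complex, which matches how the SYNOPSIS treats them.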

However, if Perl has seen you using $&, it decides that the simple regex has to go through the engine so it can prepare $&. This means that there is a two-fold slow-down: first, the simple regex has to go through both the Boyer-Moore algorithm and the rest of the regex engine, and second, it has to copy the string that was being matched against.

Ignoring $&

The re::ampersand pragma allows you to ignore the fact that $& (or its friends) has been used in your program. This produces a speed-up in portions of your code that do not need support for $&. This pragma is lexically scoped, which means it only affects the block (or file) in which it appears.
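
Since re::ampersand needs a patched perl, here is what lexical scoping looks like with a standard pragma instead. This sketch uses the core integer pragma purely as a stand-in; the variable names are only for illustration.

```perl
use strict;
use warnings;

my ($n, $d) = (10, 3);
my $inside;
{
    use integer;              # lexically scoped: ends at the closing brace
    $inside = $n / $d;        # integer division inside the block
}
my $outside = $n / $d;        # ordinary floating-point division again

print "inside:  $inside\n";   # inside:  3
print "outside: $outside\n";  # a non-integer, roughly 3.333
```

The pragma stops applying at the closing brace, and re::ampersand is meant to be scoped the same way.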

Capturing still works

This module does not turn off capturing support -- if a regex has capturing parentheses in it, you will inadvertently get support for $&, because it is based on the copied string that $1, $2, ... are based on.
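
Put the other way around, capturing parentheses give you the matched text whether or not $& is supported. A minimal sketch in plain Perl, with no pragma involved and illustrative variable names:

```perl
use strict;
use warnings;

my $text = "Perl";

# Wrapping the whole pattern in parentheses makes the first capture hold
# the same text that $& would. In list context a match returns its captures.
my ($whole) = $text =~ /(er)/;

print "<$whole>\n";   # <er>
```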


USAGE

Not using this pragma

Your program will run the same way it did before if you do not use this pragma. Default behavior has not been changed.

Turning off $& support

You can turn off support for $& and friends with no re::ampersand, which turns off support for all regexes (unless they have capturing parentheses). To turn off support only for simple regexes, pass it the argument 'simple'. To turn off support only for complex regexes, pass it the argument 'complex'.

Turning on $& support

Turn on support for $& with use re::ampersand, which turns on support for all regexes. To support only simple regexes, pass it the argument 'simple'. To support only complex regexes, pass it the argument 'complex'. Again, any regex with capturing parentheses will always have support for $& because of the mechanism that provides the $DIGIT variables.


EXAMPLES

Support for $& in a block of a program

  #!/usr/bin/perl -w
  no re::ampersand;
  # this regex is not weighed down by $&
  "Perl" =~ /..$/ and print "<$&>\n";  # <>
  {
    use re::ampersand;
    "Perl" =~ /^../ and print "<$&>\n";  # <Pe>
  }
  "Perl" =~ /..(?=.$)/ and print "<$&>\n";  # <>

Turning off support for $& in a block

  #!/usr/bin/perl -w
  # regexes set $&
  "Perl" =~ /(?<=.)./ and print "<$&>\n";  # <e>
  {
    no re::ampersand;
    # matching on a string you'd rather not have copied!
    $huge_string =~ /a+bc+/ and print "<$&>\n";  # <>
  }
  # regexes set $&
  "Perl" =~ /.(?!..)/ and print "<$&>\n";  # <r>


AUTHOR

Jeff japhy Pinyan, japhy@pobox.com.

_____________________________________________________
Jeff[japhy]Pinyan: Perl, regex, and perl hacker.
s++=END;++y(;-P)}y js++=;shajsj<++y(p-q)}?print:??;


In reply to Finally, a $& compromise! by japhy
