igoryonya has asked for the wisdom of the Perl Monks concerning the following question:

Hello, is there a way to define your own regex character class, such as [[:digit:]]?
So, I would, for example, define a regex:

/\b(?:[[:digit:]]{1,3}\.){3}[[:digit:]]{1,3}\b/

and assign it to a:

[[:ipv4:]]

Is it possible to do something like that?

Re: Defining your own regex character class
by haukex (Archbishop) on Dec 18, 2017 at 08:26 UTC

    There is an experimental feature (that I haven't used yet), Extended Bracketed Character Classes, that should allow you to define your own classes (or more specifically, compose a character class out of others, which you can also interpolate into a regex).
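
A minimal sketch of what composing a class with that feature can look like, assuming Perl 5.18 or later. The names $consonant and $word are made up for illustration. The feature warned as experimental before Perl 5.36, hence the conditional "no warnings":

```perl
use strict;
use warnings;
# Regex sets warned as experimental before Perl 5.36; silence that only there.
no if "$]" < 5.036, warnings => 'experimental::regex_sets';

# Set difference: lowercase ASCII letters minus the vowels.
# This is still a character class: it matches exactly one character.
my $consonant = qr/(?[ [a-z] - [aeiou] ])/;

# The compiled class can be interpolated and quantified like any other atom.
my $word = qr/\A $consonant+ \z/x;

print "single consonant\n" if 'x' =~ /\A$consonant\z/;
print "all consonants\n"   if 'rhythm' =~ $word;
```

Note that this only composes sets of single characters; it cannot turn a multi-character pattern such as an IPv4 address into a class.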

    However, /\b(?:[[:digit:]]{1,3}\.){3}[[:digit:]]{1,3}\b/ is not a character class (emphasis mine):

    A character class is a way of denoting a set of characters in such a way that one character of the set is matched. It's important to remember that: matching a character class consumes exactly one character in the source string.

    What you've got is a whole regex, and I think that LanX is right that simply compiling that into a variable with qr and interpolating that into another regex is the best way to go. Later in that thread you asked if you could isolate the variable, and the usual ${} can be applied here too, as well as the /x modifier:

    my $x = qr/foo/;
    my $y = qr/${x}bar/;     # (?^:(?^:foo)bar)
    my $z = qr/ $x bar /x;   # (?^x: (?^:foo) bar )
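
Applied to the regex from the original question, the same technique might look like the sketch below. The variable names and the "host=...:port" input format are assumptions made up for illustration:

```perl
use strict;
use warnings;

# Build the pattern from smaller compiled pieces.
my $octet = qr/[[:digit:]]{1,3}/;
my $ipv4  = qr/\b(?:$octet\.){3}$octet\b/;

# ${ipv4} isolates the variable name from whatever follows it.
my $line_re = qr/^host=(${ipv4}):(\d+)$/;

if ('host=10.1.2.3:8080' =~ $line_re) {
    print "address $1, port $2\n";
}
```

Like the original regex, this accepts octets above 255; it only shows how the pieces compose.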

    Regarding the example you've shown, note there is $RE{net}{IPv4} from Regexp::Common::net.

    Minor edits for clarification.

Re: Defining your own regex character class
by AnomalousMonk (Archbishop) on Dec 18, 2017 at 17:38 UTC

    Further to LanX's reply and haukex's reply: Note that in addition to being used to compose more complex regexes, a qr// object can be quantified as discussed here (with one small exception discussed below) in the same way as other regex atoms.

    The quantifier exception is for the case of a counting quantifier on a regex object that looks "too much" like a hash element. The problem is rare (albeit potentially completely silent if it is present!) and easily fixed:

    c:\@Work\Perl\monks>perl -wMstrict -le
    "my %rx = ( 2 => 'Oops...' );
     my $rx = qr{ \b foo \b }xms;
     ;;
     my $n = 2;
     my $ry = qr{ $rx{2} X $rx{$n} Y (?:$rx){$n} }xms;
     print $ry;
    "
    (?msx-i: (?msx-i: \b foo \b ){2} X Oops... Y (?:(?msx-i: \b foo \b )){2} )
    (Update: Changed this code example to make it shorter, hopefully clearer.)

    (BTW: Note also that $RE{net}{IPv4} from Regexp::Common::net is by design not delimited, so there can be a match in certain undesired or surprising cases:

    c:\@Work\Perl\monks>perl -wMstrict -le
    "use Regexp::Common qw(net);
     ;;
     my $ipv4_A = qr{ $RE{net}{IPv4} }xms;
     my $ipv4_B = qr{ \b $RE{net}{IPv4} \b }xms;
     ;;
     print 'match A' if '99999.9.9.99999' =~ $ipv4_A;
     print 'match B' if '99999.9.9.99999' =~ $ipv4_B;
    "
    match A
    Caveat Programmor. :)

    Update: Here's a fun (for some definition of "fun") little problem. A decimal (i.e., base-10) IPv4 address regex could be neatly defined as follows:

    my $octet = qr{ \d+ }xms;
    my $ipv4 = qr{ \b $octet (?: [.] $octet){3} \b }xms;
    Unfortunately, this matches an IP address with octets like 256 or 99999. How would you define $octet as a pure (i.e., no (?{ code }) or (??{ code }) constructs) regex so that only decimal octets in the range 0 .. 255 were matched? (Please, no experienced regex wranglers need reply!)
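
For anyone taking up the puzzle, a small checker can confirm a candidate without giving the answer away. This is only a sketch: octet_failures is a hypothetical helper, and the candidate list (0 .. 300 plus two large values, no leading zeros) is an assumption about what counts as a fair test.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the candidates on which $octet_re disagrees
# with the rule "decimal integer in the range 0 .. 255".
sub octet_failures {
    my ($octet_re) = @_;
    my @failures;
    for my $candidate (0 .. 300, 999, 99999) {
        my $should_match = $candidate <= 255 ? 1 : 0;
        # Anchor the candidate so the whole string must be one octet.
        my $did_match = "$candidate" =~ /\A $octet_re \z/x ? 1 : 0;
        push @failures, $candidate if $should_match != $did_match;
    }
    return @failures;
}

# The naive definition from above accepts every out-of-range value.
my $naive = qr{ \d+ }xms;
my @bad   = octet_failures($naive);
print 'naive $octet gets ', scalar(@bad), " candidates wrong, starting with $bad[0]\n";
```

A correct $octet should make octet_failures return an empty list.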


    Give a man a fish:  <%-{-{-{-<

      I was going to suggest better definitions of $octet (one using pure regexes and one using a code assertion), but I'll refrain from that after having read your last paragraph. ;-)

      And, BTW, to the OP: what you're looking for is not called a character class (but this has been pointed out already).

        ... definitions of $octet ... your last paragraph.

        Yeah, I had that in mind as something for a regex novice to play around with, especially in light of the topic of the OP (partial hint, hint).


Re: Defining your own regex character class
by LanX (Saint) on Dec 18, 2017 at 05:47 UTC
      I guess, nothing is wrong, I was just thinking of a way for isolating a variable name from the rest of the regex, i.e. variable name is not terminated.
      Is this possible: [[:$ipv4:]], or should I do (?:$ipv4)?
Re: Defining your own regex character class
by Anonymous Monk on Dec 18, 2017 at 14:19 UTC

    I don't know about custom POSIX syntax, but you can define your own Unicode properties. See perlunicode for the gory details.
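
A minimal sketch of a user-defined property, along the lines perlunicode describes. IsOctalDigit is a made-up name for illustration; the name must begin with "In" or "Is", and the sub returns lines of hexadecimal code point ranges:

```perl
use strict;
use warnings;

# Hypothetical user-defined property covering the characters '0' .. '7'.
sub IsOctalDigit {
    return "30\t37\n";    # U+0030 through U+0037
}

# Use it like any built-in property, via \p{...}.
my $octal = qr/\A \p{IsOctalDigit}+ \z/x;

print "755 looks octal\n"  if '755' =~ $octal;
print "789 does not\n" unless '789' =~ $octal;
```

As with any character class, each \p{IsOctalDigit} still matches exactly one character.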

    If your Perl is at least 5.10, you can define your own matches of any sort (not just character classes) using the (?(DEFINE)...) construction:

    $x =~ m/( (?&DIGIT)+ ) (?(DEFINE) (?<DIGIT>[0-9]) ) /smx

    Note that the above example is the coldest of cold code, i.e. untested. The (?(DEFINE)...) construct is documented in perlre. That document also has a section on creating custom RE engines, but the second sentence of the section is "This is not for the faint of heart."
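
A sketch of the same construct applied to the pattern from the original question, assuming Perl 5.10 or later. The subpattern name OCTET is made up, and it deliberately keeps the original (too permissive) one-to-three-digit definition:

```perl
use strict;
use warnings;

my $ipv4 = qr{
    \b (?&OCTET) (?: [.] (?&OCTET) ){3} \b

    # Nothing inside DEFINE matches directly; it only declares named
    # subpatterns that (?&NAME) can call.
    (?(DEFINE)
        (?<OCTET> [0-9]{1,3} )
    )
}x;

if ('host 10.0.0.1 is up' =~ /($ipv4)/) {
    print "found $1\n";
}
```

This is the closest thing in the thread to a named, reusable "[[:ipv4:]]", though it is a named subpattern and not a character class.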