http://qs1969.pair.com?node_id=1233391

sbrothy has asked for the wisdom of the Perl Monks concerning the following question:

I want a regex which is able to match [a-zA-Z0-9_- and parentheses]. I've had some luck with [:alpha:] but not enough. And when I add the parentheses it goes all kablooey. I've tried escaping them and not, but to no avail.

Something along the lines of:

if($line =~ /^([A-Za-z0-9_-()]+)\d+/) {
What am I doing wrong?

Discipulus added code tags

Re: regex question Underscores, lines and parentheses.
by hippo (Bishop) on May 06, 2019 at 12:28 UTC
    I want a regex which is able to match a-zA-Z0-9_- and parentheses.

    SSCCE:

    use strict;
    use warnings;
    use Test::More;

    my @good = ( 'a', '321', '(321)', 'a(321-)_HGF' );
    my @bad  = ( '"@$', '~', '[{' );
    # '-' comes last in the class, so it is a literal hyphen, not a range
    my $re   = qr/^[a-zA-Z0-9_()-]+$/;

    plan tests => @good + @bad;

    for my $str (@good) {
        like ($str, $re, "$str matched");
    }
    for my $str (@bad) {
        unlike ($str, $re, "$str not matched");
    }

    See also How to ask better questions using Test::More and sample data
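    The key detail in that regex is where the hyphen sits. Here is a minimal standalone sketch (the %variant hash and the sample strings are only illustrative) comparing three ways of getting a literal - into the class: last, as above; first; or backslash-escaped in the middle. The parentheses need no escaping inside a class in any of them.

```perl
use strict;
use warnings;

# Three placements of a literal '-' inside the same character class.
# Parentheses are plain characters inside [...] and need no backslash.
my %variant = (
    last    => qr/^[a-zA-Z0-9_()-]+$/,    # '-' last: literal
    first   => qr/^[-a-zA-Z0-9_()]+$/,    # '-' first: literal
    escaped => qr/^[a-zA-Z0-9_\-()]+$/,   # '\-' anywhere: literal
);

for my $name (sort keys %variant) {
    # Keep only the sample strings this variant accepts
    my @hits = grep { $_ =~ $variant{$name} } ('a(321-)_HGF', '-', '"@$', '[{');
    print "$name: @hits\n";
}
```

    All three variants should accept and reject the same strings; putting - last is simply the form that needs no backslash.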

      This seems to be such an elegant little script for dealing with character classes, and with the hyphen in particular. I replicated it, as I save such scripts for a rainy day when I wonder how to do exactly this. I changed the data slightly to satisfy myself that the hyphen matched.

      $ ./1.hippo_regex.pl
      1..7
      ok 1 - a matched
      ok 2 - - matched
      ok 3 - (321) matched
      ok 4 - a(321-)_HGF matched
      ok 5 - "@$ not matched
      ok 6 - ~ not matched
      ok 7 - [{ not matched

      Source:

      #!/usr/bin/perl -w
      use 5.011;
      use Test::More;

      my @good = ( 'a', '-', '(321)', 'a(321-)_HGF' );
      my @bad  = ( '"@$', '~', '[{' );
      my $re   = qr/^[a-zA-Z0-9_()-]+$/;

      plan tests => @good + @bad;

      for my $str (@good) {
          like ($str, $re, "$str matched");
      }
      for my $str (@bad) {
          unlike ($str, $re, "$str not matched");
      }
Re: regex question Underscores, lines and parentheses.
by Eily (Monsignor) on May 06, 2019 at 12:16 UTC

    _-( inside a character class ([ between square brackets ]) means "any character between _ and (". That might have included a lot of characters you didn't want, but in this case, if you look at an ASCII table, you'll see that ( (code 40) actually comes before _ (code 95), so this is not a valid range at all and Perl refuses to compile the pattern ("Invalid [] range").

    It will also be easier for us to help you if you show us some sample input data.
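    To see the failure for yourself, a minimal sketch: it builds the original class from a string so the pattern is compiled at run time inside eval, instead of the whole script dying at compile time. The names $class and $re are just for illustration.

```perl
use strict;
use warnings;

# The class from the question, kept in a string so that the regex is
# compiled at run time and the error can be caught by eval.
my $class = '[A-Za-z0-9_-()]';
my $re    = eval { qr/^($class+)\d+/ };

print defined $re ? "compiled\n" : "failed: $@";

# '(' sorts before '_', so "_-(" is a backwards range.
printf "ord('(') = %d, ord('_') = %d\n", ord('('), ord('_');
```

    The printed error should mention an invalid [] range, and the ord values show why: ( comes before _.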

Re: regex question Underscores, lines and parentheses.
by AnomalousMonk (Archbishop) on May 06, 2019 at 16:50 UTC
Re: regex question Underscores, lines and parentheses.
by LanX (Saint) on May 07, 2019 at 12:47 UTC
    In short:

    ] and - are special in character classes.

    As rules of thumb for avoiding ambiguity°:

    If needed ...

    • use ] as first character
    • use - as last character
    update

    D:\>perl -dE0
    Loading DB routines from perl5db.pl version 1.49_05
    Editor support available.
    Enter h or 'h h' for help, or 'perldoc perldebug' for more help.

      DB<1> say "$_:\t", $_ =~ /[]abc-]/ for qw/] [ - a/
    ]:      1
    [:
    -:      1
    a:      1

      DB<2>

    °) Otherwise ] will close the class and - will denote a range.
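    The same rules as a small standalone script, a sketch using only core Perl. It also shows the backslash-escaped alternative, and what the footnote means in practice: a ] that is neither first nor escaped closes the class early.

```perl
use strict;
use warnings;

# ']' first and '-' last: both literal, class is { ] a b c - }
my $first   = qr/^[]abc-]$/;
# Backslash-escaping works anywhere in the class and gives the same set
my $escaped = qr/^[abc\]\-]$/;
# ']' neither first nor escaped: it closes [abc] early, and the second
# ']' is an ordinary literal character after the class
my $closed  = qr/^[abc]]$/;

for my $str (']', '[', '-', 'a', 'a]') {
    printf "%-3s first=%d escaped=%d closed=%d\n", $str,
        ($str =~ $first   ? 1 : 0),
        ($str =~ $escaped ? 1 : 0),
        ($str =~ $closed  ? 1 : 0);
}
```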

    Cheers Rolf
    (addicted to the Perl Programming Language :)

Re: regex question Underscores, lines and parentheses.
by soonix (Canon) on May 06, 2019 at 11:59 UTC
    Your post is badly formatted, but at first glance your regex looks halfway OK, so perhaps the problem is somewhere else.

    Do you want to match any parenthesis, bracket or brace, or do you want to match only balanced pairs?

    In the latter case, you might be interested in Regexp::Common.
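    If you do want balanced matching but would rather not install a module, a recursive pattern is one core-Perl alternative. To be clear, this is a different technique, not Regexp::Common's interface. A minimal sketch, assuming the same allowed characters as in the question; (?1) recurses into capture group 1 and needs Perl 5.10 or later.

```perl
use strict;
use warnings;

# Whole string of allowed characters in which every '(' has a matching ')'.
my $balanced = qr/
    ^
    (                           # group 1: one or more items
        (?:
            [A-Za-z0-9_-]++     # run of plain characters, possessive
          | \( (?1)? \)         # '(' + optional balanced content + ')'
        )+
    )
    $
/x;

for my $str ('a(b)c', 'a(b(c)-d)_e', '(()', 'a)b(') {
    printf "%-12s %s\n", $str, ($str =~ $balanced ? 'balanced' : 'not balanced');
}
```

    The possessive ++ on the plain-character run is only there to avoid pointless backtracking on strings that fail; it does not change which strings match.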
Re: regex question Underscores, lines and parentheses.
by bliako (Monsignor) on May 06, 2019 at 12:01 UTC

    Try enclosing your ranges in square brackets, which roughly means "one of these characters": [A-Za-z0-9_()]
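    Regarding the ::alpha:: attempt in the question: POSIX classes such as [:alnum:] are only valid inside a bracketed class, so they end up with doubled brackets and combine with the other characters like this. A sketch, assuming ASCII-only matching is wanted (hence the /a modifier, available since Perl 5.14).

```perl
use strict;
use warnings;

# [:alnum:] must sit inside [...]; /a limits it to ASCII [A-Za-z0-9].
# '-' is last, so it is a literal hyphen.
my $re = qr/^[[:alnum:]_()-]+$/a;

for my $str ('a(321-)_HGF', 'x-y', '[{', 'a b') {
    printf "%-12s %s\n", $str, ($str =~ $re ? 'matched' : 'not matched');
}
```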