pekkhum has asked for the wisdom of the Perl Monks concerning the following question:

I am trying to create a sub that will take a filename (e.g. page-gen.pl) and make it fit in a regex. Backslashing characters is easy, but I can't find a way to get rid of the natural case sensitivity of a regex. What I have so far should show the method I am trying:
sub Regch{
    $_[0]=~s/(\.|\-|\(|\)|\[|\]|\{|\}|\?|\/|\\|\^|\*)/\\$1/g;
    $_[0]=~s/([a-zA-Z])/[$1]/g;
    return $_[0];
}
It would be nice to have suggestions on how to do this, or any Perl modules that would do it better. Thanks.

Re: Capitalization and Regex
by Abigail-II (Bishop) on Oct 26, 2003 at 22:50 UTC
    $_[0]=~s/(\.|\-|\(|\)|\[|\]|\{|\}|\?|\/|\\|\^|\*)/\\$1/g;
    Yikes! So many backslashes, so many |, when you don't need them:
    $_ [0] =~ s!([][.(){}?/\\^*-])!\\$1!g;
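    A minimal sketch comparing the character-class substitution above with Perl's built-in quotemeta. The filename comes from the question; the variable names are made up for illustration. The two agree for this name, but quotemeta escapes every non-word character, so they will not agree for every input.

```perl
use strict;
use warnings;

# Sample filename, taken from the question.
my $name = 'page-gen.pl';

# Escape with a single character class, as in the substitution above.
(my $by_class = $name) =~ s!([][.(){}?/\\^*-])!\\$1!g;

# Escape with the built-in quotemeta for comparison.
my $by_quotemeta = quotemeta $name;

print "$by_class\n";      # page\-gen\.pl
print "$by_quotemeta\n";  # page\-gen\.pl
```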

    Abigail

      ::Prints the thread:: So many things to look up! O_O This has been most profitable.
      Okay, based on what was here I tried to find a better way to assemble my data. Here is an example of the way it would look:
      #!perl -w
      @res='index.cgi';
      @ext=qw(css tmp class edf);
      @regex='';
      foreach(@ext){
          $reg=$reg."|^.*\\.$_\$";
      }
      foreach(@res){
          $reg=$reg."|^\Q$_\E\$";
      }
      if($regex){
          $reg="|$regex$reg";
      }
      $thing=~/^\.+$$reg/i;
      I've cut out the source of the actual data, and inserted sample info for simplicity. How is my improvement?
        Um, it's still not clear what you're actually trying to accomplish. There are a handful of odd properties in this snippet of code, including some things that look like mistakes or misconceptions. As many others here would tell you, things would actually go better for you with "use strict;"

        First, I would assume that your declarations at the top were intended to look like this:

        @res = qw/index.cgi/;          # could be more than one file...
        @ext = qw/css tmp class edf/;
        $regex = '';                   # note: "$", not "@"
        Next, since you don't initialize "$reg" to anything prior to the first "for" loop, it comes out starting with "|", which is "grammatical", but probably won't do what you want -- try this:
        perl -e '$_ = "hi"; print "ok\n" if ( /|x/ );'
        It will always print "ok", no matter what string is assigned to $_, because a regex that starts with "|" will always match anything (even an empty string). (update: I realize that $reg gets appended to some other literal when it's finally used for matching something; having "|" at the start will still not do what you probably expect/want it to do.)

        BTW, the more legible idiom for concatenating values onto a string inside a loop is:

        $string = "initial value";
        for ( @addons ) {
            $string .= " $_";  # or whatever
        }
        Next, it's not clear why you have the "if ($regex)" condition, since this variable is apparently empty (false) when you reach this point.

        Finally, the last line is a complete mystery. What does "$thing" contain, and what are you really hoping to match? It looks like you are now trying to use $reg as a reference to a scalar (because of the two dollar-signs); again, this is "grammatical", but $reg is not a reference to a scalar, so you end up with an empty string at that point in the regex.

        Take a few steps back from the minute details, and try to approach it from "the big picture". What is the situation that you are starting with, and what do you want it to be when your code does its job properly?

        If you're just trying to build a regex that will match a given set of file name extensions (e.g. qw/css tmp class edf/), all you need to do is:

        my $ext_regex = '\.(' . join('|', qw/css tmp class edf/) . ')$';
        for ( @filenames ) {
            print "this one matches: $_\n" if ( /$ext_regex/i );
        }
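        If the goal is to match either a listed extension or an exact filename such as index.cgi, the same join idea extends to both lists. This is only a sketch of one possible reading of the earlier snippet: @ext and @res come from the thread, while @files and the other variable names are hypothetical.

```perl
use strict;
use warnings;

# Sample data from the thread; @files is a hypothetical test list.
my @ext   = qw(css tmp class edf);
my @res   = qw(index.cgi);
my @files = qw(style.CSS Index.cgi index_cgi notes.txt Foo.class);

# Build each alternation with join, so no alternative is ever empty.
my $ext_alt = join '|', map { quotemeta($_) } @ext;
my $res_alt = join '|', map { quotemeta($_) } @res;

# Either "anything, a dot, a listed extension" or an exact reserved name.
my $re = qr/^(?:.*\.(?:$ext_alt)|(?:$res_alt))$/i;

my @matched = grep { $_ =~ $re } @files;
print "$_\n" for @matched;   # style.CSS, Index.cgi, Foo.class
```

        Because the names pass through quotemeta, the dot in index.cgi matches only a literal dot, which is why index_cgi is rejected.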
Re: Capitalization and Regex
by etcshadow (Priest) on Oct 26, 2003 at 22:26 UTC
    Indirect answer to your question: don't do this. You don't need to make a sub to translate a name into something that you can put in a regexp. Just put it in the regexp with \Q and \E (see quotemeta in the perlfunc docs). Also, as several folks stated, use the /i modifier on your regexp.

    So... do not try to do:

    my $munged_name = Regch($name);
    $thing =~ /$munged_name/;
    Rather... try this:
    $thing =~ /\Q$name\E/i;

    How much easier is that? Also, you learned an important thing about perl. :-D
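    To see what \Q...\E buys you, here is a small sketch that runs the same filename through a quoted and an unquoted pattern. The filename is from the question; the strings in @things are invented test data.

```perl
use strict;
use warnings;

my $name = 'page-gen.pl';   # filename from the question

# Hypothetical strings to test against.
my @things = ('PAGE-GEN.PL', 'old/Page-Gen.pl.bak', 'page-genXpl');

# With \Q...\E the dot is literal; without it, "." matches any character.
my @quoted   = grep { /\Q$name\E/i } @things;
my @unquoted = grep { /$name/i }     @things;

print "quoted:   @quoted\n";    # PAGE-GEN.PL old/Page-Gen.pl.bak
print "unquoted: @unquoted\n";  # all three, including page-genXpl
```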

    -----------------------------------

    However, if you want a direct and exact answer to your question, you could do this:

    sub Regch { return "(?i:\Q$_[0]\E)"; }
    Read the docs on perlre to learn more about why.
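    A short sketch of how that fragment behaves when dropped into a larger, case-sensitive pattern. Note that this version of Regch returns a new string rather than modifying its argument; the "backup of " prefix is a made-up literal for the demonstration.

```perl
use strict;
use warnings;

# Returns a self-contained, case-insensitive pattern fragment.
sub Regch { return "(?i:\Q$_[0]\E)"; }

my $frag = Regch('page-gen.pl');
print "$frag\n";    # (?i:page\-gen\.pl)

# Only the fragment ignores case; the hypothetical prefix does not.
my $re = qr/^backup of $frag\z/;

print "1: ", ('backup of PAGE-GEN.PL' =~ $re ? "match" : "no match"), "\n";  # match
print "2: ", ('BACKUP OF page-gen.pl' =~ $re ? "match" : "no match"), "\n";  # no match
```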

    ------------
    :Wq
    Not an editor command: Wq
Re: Capitalization and Regex
by davido (Cardinal) on Oct 26, 2003 at 22:06 UTC
    I'm a little fuzzy on what you're asking, but it sounds like you want to match case-insensitively, or perhaps, convert from mixed-case to all one-case.

    I think you might want to look at the /i regular expression modifier, explained in perlretut and perlre, or the uc and lc functions, explained in perlfunc.

    You can change the capitalization in portions of strings with the \l and \u metacharacters (for one char at a time) or the \L, \U, and \E metacharacters (for altering the case of chunks of characters). These are also described in detail in perlretut.
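    For reference, a quick sketch of those escapes inside double-quoted strings, next to lc and uc. The sample word and variable names are arbitrary.

```perl
use strict;
use warnings;

my $word = 'pErL';

my $first_up = "\u$word";           # PErL  (first character only)
my $all_low  = "\L$word\E monks";   # perl monks
my $all_up   = "\U$word\E monks";   # PERL monks
my $capital  = "\u\L$word\E";       # Perl  (lowercase all, then uppercase first)

print "$first_up\n$all_low\n$all_up\n$capital\n";
print lc($word), ' ', uc($word), "\n";   # perl PERL
```

    These change the case of the string itself; they do not make a pattern match case-insensitively.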

    Update: Thanks Anonymous Monk for finding a more accurate way to express my thought. I've corrected the verbiage.


    Dave


    "If I had my life to live over again, I'd be a plumber." -- Albert Einstein
      If you need only part of the RE to work case-insensitively, look at the \l and \u metacharacters (for one char at a time) or the \L, \U, and \E metacharacters (for altering the case of chunks of characters). These are also described in detail in perlretut.
      I don't think those help in making portions of REs case insensitive; they change portions of strings to upper or lower case. For case-insensitive portions of an RE, use /some(?i:InSensiTIve)portion/
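      A small sketch of that pattern against a few invented inputs, showing that only the grouped part ignores case:

```perl
use strict;
use warnings;

my $re = qr/some(?i:InSensiTIve)portion/;

# Hypothetical inputs: the first two vary only the grouped part.
my @inputs = qw(
    someINSENSITIVEportion
    someinsensitiveportion
    SOMEinsensitiveportion
    someInSensiTIvePORTION
);

my @matched = grep { $_ =~ $re } @inputs;
print "$_\n" for @matched;   # only the first two inputs
```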
      Thank you, everyone, for the help. It's back to the Man pages with me. ^_^
Re: Capitalization and Regex
by The Mad Hatter (Priest) on Oct 26, 2003 at 22:10 UTC
    Use the /i flag like you use the /g. Ex:
    sub Regch{
        $_[0]=~s/(\.|\-|\(|\)|\[|\]|\{|\}|\?|\/|\\|\^|\*)/\\$1/gi;
        $_[0]=~s/([a-z])/[$1]/gi;
        return $_[0];
    }
    Update: Removed character class redundancy. That will teach me to at least look at the regex before posting... ; )
      $_[0]=~s/([a-zA-Z])/[$1]/gi;

      Of course with the /i modifier, [a-zA-Z] becomes redundant, and so, can be expressed as simply [a-z], assuming we're dealing with ASCII.
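      One caveat worth spelling out: the /i on the substitution only affects which letters get wrapped. A class like [p] still matches just "p", so the generated pattern stays case-sensitive. If the bracket approach is really wanted, each class needs both cases. A rough sketch, assuming ASCII names; bracket_case is a hypothetical helper, not something from the thread:

```perl
use strict;
use warnings;

# Hypothetical helper: escape metacharacters, then turn each letter
# into a two-character class holding its lower and upper case forms.
sub bracket_case {
    my ($name) = @_;
    my $pat = quotemeta $name;
    $pat =~ s/([a-z])/'[' . lc($1) . uc($1) . ']'/gei;
    return $pat;
}

my $pat = bracket_case('a-b.pl');
print "$pat\n";   # [aA]\-[bB]\.[pP][lL]

# No /i needed on the final match.
print "ok\n" if 'A-B.PL' =~ /^$pat\z/;
```

      In practice /\Q$name\E/i or (?i:...) does the same job with far less machinery.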


      Dave