exodist has asked for the wisdom of the Perl Monks concerning the following question:

The short question: I need to use variables in a regex. These variables contain filenames, and the filenames contain every character under the sun, including regex special characters. I need to know how to use a variable within a regex when it might contain characters that interfere with the regex.

Here is the code sample where my program fails, and an example of the error:
$Title =~ s,^$Artist\s*-\s*,,ig; # Remove artist name from string
/stuff/Audio/Techno/Trance[]Control
Unmatched [ in regex; marked by <-- HERE in m/^trance[ <-- HERE ]control\s*-\s*/ at ./builddb.pl line 121.
--In this case the artist name is: Trance[]Control

I have tried writing my own clean functions that escape characters, but I always miss something and only find out that I messed up long afterwards, once it is too late to do anything about it.

The situation description: Basically I am running scripts that I write as I need them. These scripts are fixing some major problems in a set of over 30,000 files, and they deal with both file name info and meta info to try to make sense of it all. Using Perl I have made TONS of progress. However, these files can have any character in their names, including (, ), [, and ].
-Exodist

Replies are listed 'Best First'.
Re: Make a variable safe for a regex.
by jettero (Monsignor) on Feb 25, 2007 at 21:56 UTC

    You want \Q and \E as indicated on perlre. They're perhaps a little obscure and arcane, but powerful and awesome.

    $Title =~ s,^\Q$Artist\E\s*-\s*,,ig;

    -Paul
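
    A minimal sketch of the \Q...\E fix suggested above, using the artist name from the question. The title string and the variable $clean are made up for illustration and are not from the original script.

```perl
use strict;
use warnings;

# Hypothetical sample data; the artist name contains regex metacharacters.
my $Artist = 'Trance[]Control';
my $Title  = 'Trance[]Control - Some Track';

# \Q...\E makes the interpolated value match literally,
# so the "[]" is no longer parsed as a character class.
(my $clean = $Title) =~ s/^\Q$Artist\E\s*-\s*//i;

print "$clean\n";
```

    The /g flag from the original substitution is dropped here because a pattern anchored with ^ can match at most once per string.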

      \Q and \E worked, thanks a lot!
      Is there a way to do this for UTF-8 strings too? \Q will break multi-byte characters.

        Can you supply a test case for that? I've tried it (briefly) and it seems to work for me:

        C:\>perl -le "my $str = qq(foo\x{100}bar); my $re = qr/^\Q$str\E/; warn $str; warn $re; warn $str =~ /$re/;"
        foo─Çbar at -e line 1.
        (?-xism:^foo─Çbar) at -e line 1.
        1 at -e line 1.

        Maybe it fails for some other cases of unicode chars, and if so I'd consider that a bug.
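
        One possible explanation, offered as an assumption rather than a diagnosis of the poster's case, is that the strings are still raw UTF-8 bytes rather than decoded character strings. The sketch below decodes the bytes with the core Encode module before using \Q...\E; the sample name and title are hypothetical.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical input: raw UTF-8 bytes for "café[live]", e.g. as read from a directory listing.
my $bytes = "caf\xC3\xA9[live]";

# Decode bytes into Perl character strings before matching.
my $name = decode('UTF-8', $bytes);
my $text = decode('UTF-8', "caf\xC3\xA9[live] - Track");

# \Q...\E quotes the metacharacters; the accented character still matches itself.
my $matched = $text =~ /^\Q$name\E\s*-\s*/ ? 1 : 0;

print "matched: $matched\n";
```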

Re: Make a variable safe for a regex.
by kyle (Abbot) on Feb 25, 2007 at 21:56 UTC

    Quote everything that could be a metacharacter before using the variable: s/(\W)/\\$1/g

    There's also quotemeta, but I've been using the above pattern for years.
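
    A short sketch of the quotemeta route: quote the name once, compile it with qr//, and reuse the compiled pattern across many titles. The list of titles is hypothetical sample data.

```perl
use strict;
use warnings;

my $Artist = 'Trance[]Control';

# quotemeta backslash-escapes every non-word character in the string.
my $quoted = quotemeta $Artist;

# Compile the prefix pattern once and reuse it for every title.
my $re = qr/^$quoted\s*-\s*/i;

# Hypothetical sample titles.
my @titles = ('Trance[]Control - One', 'trance[]control-Two', 'Other - Three');

# Copy each title before substituting so @titles stays untouched.
my @clean = map { (my $t = $_) =~ s/$re//; $t } @titles;

print "$_\n" for @clean;
```

    Precompiling with qr// is only a convenience when the same name is applied to many strings; writing \Q$Artist\E inline gives the same quoting.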

      Well, they appear to do the same, at least in the one-byte character range:
      my $str = join '', map chr, 0 .. 255;
      my $qm = quotemeta( $str);
      ( my $kyle = $str) =~ s/(\W)/\\$1/g;
      my $quoted_qm = join '', $qm =~ /\\(.)/g;
      my $quoted_kyle = join '', $kyle =~ /\\(.)/g;
      printf "quotemeta: %d, kyle: %d (%s)\n",
          length $quoted_qm, length $quoted_kyle,
          $quoted_qm eq $quoted_kyle ? "same" : "different";
      That prints
      quotemeta: 192, kyle: 192 (same)
      Anno

        \Q and quotemeta() are the same function.

        ⠤⠤ ⠙⠊⠕⠞⠁⠇⠑⠧⠊

Re: Make a variable safe for a regex.
by jesuashok (Curate) on Feb 26, 2007 at 03:34 UTC
    #!/usr/bin/perl
    use strict;

    my $Artist = 'Trance[]Control';
    while ( <DATA> ) {
        chomp;
        my $line = $_;
        $line =~ s,^\b\Q$Artist\E\b\s*-\s*,,ig;
        print ":$line:\n";
    }
    __DATA__
    Trance[]Control - jesuashok
Re: Make a variable safe for a regex.
by meenugaur (Initiate) on Feb 26, 2007 at 21:08 UTC
    Try using the function quotemeta on the variables to escape special characters.