Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

I'm using File::Find to get a list of files from a base directory. I then want to find files that are in certain key directories that are kept in scalar variables. But the regular expression will not work correctly as the backslashes seem to cause funnies.

For example, the following two code examples would be expected to give the same output:-

example 1.
    my $match = "\\well";
    my $text = "\\well don't you know it";
    $text =~ s/$match//;
    print "$text\n";

example 2.
    $text = "\\well don't you know it";
    $text =~ s/\\well//;
    print "$text\n";

You would expect to see: don't you know it

But example 1 gives: \ don't you know it

It appears as if the regex is compiled to take the \\ to mean \ but compares against the string which keeps the original \\.

This really messes things up if you want to specify a directory structure (e.g. c:\\windows) for dealing with files but then want a regex of the same value; it would have to be defined as c:\\\\windows.

Does anybody know what is going on here, and is there a way around it?

Humbly

Tim

Re: using a scalar in a regex with backslashes
by broquaint (Abbot) on Oct 09, 2002 at 16:42 UTC
    You could use either the \Q escape or quotemeta, e.g.
    my $match = "\\well ";    # the trailing space is part of the match
    my $text = "\\well don't you know it";
    $text =~ s/\Q$match//;
    print "$text\n";

    __output__
    don't you know it
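    For comparison, here is a minimal sketch (the variable names are only illustrative) showing that interpolating the result of quotemeta behaves the same as writing \Q in the pattern: both escape the backslash (and the space) so they match literally.

```perl
use strict;
use warnings;

my $match = "\\well ";    # the string holds one backslash: \well followed by a space

# Variant 1: \Q escapes regex metacharacters in the interpolated text
my $text_q = "\\well don't you know it";
$text_q =~ s/\Q$match//;

# Variant 2: quotemeta does the same escaping up front
my $quoted = quotemeta $match;    # now \\well\  (backslash and space escaped)
my $text_m = "\\well don't you know it";
$text_m =~ s/$quoted//;

print "$text_q\n";    # prints: don't you know it
print "$text_m\n";    # prints: don't you know it
```

    \Q is handy inline; quotemeta is useful when you want to build the escaped pattern once and reuse it.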

    HTH

    _________
    broquaint
    .

Re: using a scalar in a regex with backslashes
by bart (Canon) on Oct 09, 2002 at 18:50 UTC
    This problem comes up so often that it should be considered a FAQ. If you want to use a variable as a regex, remember that what matters is what's in the string, not what you typed to produce that string. The string should contain exactly what you would write literally in the regex. So, what do you expect to see if you do:
    my $match = "\\well"; print $match;
    ? Well: it prints
    \well
    So doing
    my $match = "\\well"; $text =~ s/$match//;
    is, more or less (apart from the fact that the regex can be altered at runtime), equivalent to writing:
    $text =~ s/\well//;
    and /\w/ matches any word character (a letter, digit or underscore), which is the effect you see: the pattern matches a word character followed by "ell". For your string this deletes the word "well" and leaves the backslash behind.
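    A quick sketch to confirm this: if the pattern really is \w followed by "ell", then strings like "tell" should match too, while a backslash followed by "ell" should not (a backslash is not a word character). The candidate strings are just made-up test values.

```perl
use strict;
use warnings;

my $match = "\\well";    # one backslash in the string, so the regex sees \well

print "$match\n";        # prints \well

# The pattern is \w followed by "ell", so any word character before "ell" matches,
# but a real backslash does not:
for my $candidate ("well", "tell", "\\ell") {
    my $hit = $candidate =~ /$match/ ? "matches" : "no match";
    print "$candidate: $hit\n";
}

# In the original string, \w matches the "w", so "well" is removed
# and the backslash survives:
my $text = "\\well don't you know it";
$text =~ s/$match//;
print "$text\n";         # prints \ don't you know it
```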

    Conclusion: if you want to match a literal backslash, the regex/string should contain two backslashes, so the source code to produce the string should contain 4:

    my $match = "\\\\well"; $text =~ s/$match//;
    One way around this is to use qr//:
    my $match = qr/\\well/; $text =~ s/$match//;
    In this case, what you type is what you get, including whichever modifiers are (or are not) present.
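    A small sketch of both approaches side by side. Both leave " don't you know it" (with its leading space), since only "\well" is removed. The last lines illustrate the point about modifiers: a /i given to qr// travels with the compiled pattern.

```perl
use strict;
use warnings;

my $text1 = "\\well don't you know it";    # one literal backslash
my $text2 = $text1;

# Four backslashes in the source give two in the string, which the regex
# engine reads as one escaped (literal) backslash.
my $match_str = "\\\\well";
$text1 =~ s/$match_str//;

# qr// keeps the pattern as typed, so two backslashes are enough.
my $match_qr = qr/\\well/;
$text2 =~ s/$match_qr//;

print "[$text1]\n";    # prints [ don't you know it]
print "[$text2]\n";    # prints [ don't you know it]

# Modifiers given to qr// are part of the compiled pattern:
my $ci = qr/\\WELL/i;
print "case-insensitive match\n" if "\\well" =~ $ci;
```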

    p.s. If you load the pattern(s) for a regex from a text file, you won't have this problem, as what is in the file is what will be in the pattern.
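    A sketch of that idea, assuming a pattern file with one regex per line. To keep it self-contained, the "file" here is an in-memory filehandle over a single-quoted here-doc; in a single-quoted here-doc backslashes have no special meaning, so the text is exactly what a file on disk would contain. In a real script you would open an actual file instead.

```perl
use strict;
use warnings;

# Stand-in for the contents of a pattern file: the line holds \\well literally.
my $file_contents = <<'PATTERNS';
\\well
PATTERNS

open my $fh, '<', \$file_contents or die "Cannot open in-memory file: $!";
chomp(my @patterns = <$fh>);
close $fh;

my $text = "\\well don't you know it";
for my $pat (@patterns) {
    # $pat holds two backslashes followed by "well", just as in the file,
    # so the regex matches one literal backslash.
    $text =~ s/$pat//;
}
print "[$text]\n";    # prints [ don't you know it]
```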

Re: using a scalar in a regex with backslashes
by BrowserUk (Patriarch) on Oct 09, 2002 at 17:11 UTC

    You can save yourself a bit of typing effort by using forward slashes (c:/path/to/file.txt) instead of doubled backslashes (c:\\path\\to\\file.txt). This works most of the time in Perl under Win32.
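    Along the same lines, a minimal sketch that keeps key directories written with forward slashes and normalizes any backslashes in the paths being tested before matching. The directory and file names are hypothetical, and this doesn't assume anything about which separator File::Find hands back on Win32.

```perl
use strict;
use warnings;

# Hypothetical key directory, written with forward slashes: no doubling needed.
my $key_dir = 'c:/windows';

# Hypothetical paths, some using backslashes, some forward slashes.
my @paths = ('c:\\windows\\notepad.exe', 'c:/windows/system32/cmd.exe', 'c:/temp/x.txt');

# Copy each path, turn every backslash into a forward slash, then test the copy.
my @hits = grep {
    (my $norm = $_) =~ tr{\\}{/};
    $norm =~ m{^\Q$key_dir\E/}i;
} @paths;

print "$_\n" for @hits;    # the two paths under c:/windows, in their original form
```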

    The only time I've encountered problems with it is passing these strings to CMD.EXE through backticks or system and similar mechanisms. There may be others I haven't encountered.


    Cor! Like yer ring! ... HALO dammit! ... 'Ave it yer way! Hal-lo, Mister la-de-da. ... Like yer ring!
Re: using a scalar in a regex with backslashes
by sauoq (Abbot) on Oct 09, 2002 at 18:59 UTC
    It appears as if the regex is compiled to take the \\ to mean \ but compares against the string which keeps the original \\.

    No. The backwhack acts as an escape character in strings and the details can get confusing.

    The original text in your case only has one literal backwhack, not two. Print it to see for yourself.

    In the first example, your $match variable has one literal backwhack, but once that string is used as a pattern it looks like "\w", which matches a word character in a regex.

    In the second example, the first backwhack in the pattern escapes the second resulting in one actual backwhack. That one matches the one in your original text.
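    A short sketch of the "print it and see" check: the string literal "\\well" produces a five-character string with a single backslash, and the pattern \\well matches exactly that one backslash.

```perl
use strict;
use warnings;

my $text = "\\well don't you know it";
print "$text\n";                 # prints \well don't you know it (one backslash)
print length("\\well"), "\n";    # prints 5: one backslash plus "well"

# In a pattern, \\ is an escaped backslash, i.e. one literal backslash,
# which lines up with the single backslash stored in $text.
(my $copy = $text) =~ s/\\well//;
print "[$copy]\n";               # prints [ don't you know it]
```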

    To get around these issues, learn how escaping works in strings and patterns. Perl's quotemeta() function can help a great deal in many situations and, in patterns, the special escapes \Q and \E are useful.
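    To tie this back to the original question, here is a sketch using \Q...\E so that only the directory name held in the scalar is quoted while the rest of the pattern stays a regex. The key directory and file list are hypothetical stand-ins for names collected with File::Find; File::Find itself is not called here.

```perl
use strict;
use warnings;

# Hypothetical key directory in a scalar; the string itself is c:\windows.
my $key_dir = "c:\\windows";

# Hypothetical list of file names, e.g. as gathered with File::Find.
my @files = (
    "c:\\windows\\win.ini",
    "c:\\windows\\system32\\drivers\\etc\\hosts",
    "c:\\temp\\win.ini",
);

# \Q...\E quotes only the directory. After \E, \\ is one literal backslash and
# [^\\]+$ means "no further backslashes", so only files directly inside the
# key directory are kept.
my @direct = grep { /^\Q$key_dir\E\\[^\\]+$/i } @files;

print "$_\n" for @direct;    # prints c:\windows\win.ini
```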

    -sauoq
    "My two cents aren't worth a dime.";