jxia has asked for the wisdom of the Perl Monks concerning the following question:

I am trying to remove the first '/' in all file names in a string. But my first substitution only removes the first occurrence when I use '.*' in my pattern. The second one works when I use '\w*'. I just couldn't figure out what the difference is.

Following is my code:

$_ = 'A /gif/cart.gif is a /gif/cart.gif';
s#/(gif/.*\.gif)#$1#g;     # Only replaced first one
print "NEW1: $_\n";

$_ = 'A /gif/cart.gif is a /gif/cart.gif';
s#/(gif/\w*\.gif)#$1#g;    # This one works
print "NEW2: $_\n";

Replies are listed 'Best First'.
Re: pattern matching question
by hardburn (Abbot) on Jun 20, 2003 at 15:48 UTC

    Your problem is that .* is greedy (i.e., it matches everything it possibly can). See also Death to Dot Star!.
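
A minimal sketch of the greedy behaviour, using the string from the question. The variable names ($text, $greedy, $count) are only for illustration. In scalar context s///g returns the number of substitutions made, which shows that the greedy pattern matched once and swallowed the rest of the string:

```perl
use strict;
use warnings;

my $text   = 'A /gif/cart.gif is a /gif/cart.gif';
my $greedy = $text;

# s///g returns the number of substitutions it performed.
my $count = ($greedy =~ s{/(gif/.*\.gif)}{$1}g);

print "substitutions: $count\n";   # substitutions: 1
print "result: $greedy\n";         # result: A gif/cart.gif is a /gif/cart.gif
```

The single match runs from the first '/' to the last ".gif", so there is nothing left for /g to find afterwards.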

    ----
    I wanted to explore how Perl's closures can be manipulated, and ended up creating an object system by accident.
    -- Schemer

    Note: All code is untested, unless otherwise stated

Re: pattern matching question
by KPeter0314 (Deacon) on Jun 20, 2003 at 16:02 UTC
    The biggest thing is the difference in the definition between . and \w.
    • . matches anything except \n
    • \w matches a word character [a-zA-Z_0-9]

    Quite a big difference in my book.
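
To see the two character classes side by side, here is a small sketch on a made-up fragment ($sample is just an illustrative name). Both patterns are anchored at the start, so the only difference is which characters each class accepts:

```perl
use strict;
use warnings;

my $sample = 'cart.gif is a';

my ($dot)  = $sample =~ /^(.*)/;    # . accepts every character here
my ($word) = $sample =~ /^(\w*)/;   # \w stops at the first non-word character

print "dot:  '$dot'\n";    # dot:  'cart.gif is a'
print "word: '$word'\n";   # word: 'cart'
```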

    -Kurt

    Doh! Update: the second item was mistyped as \n and is fixed to \w. Note to self, slow down and read what you type.

Re: pattern matching question
by DigitalKitty (Parson) on Jun 20, 2003 at 17:19 UTC
    Hi jxia.

    The '*' quantifier matches as many characters as it can (zero or more). The '.' matches any character except a newline. You can disable greediness by adding a '?' immediately after the quantifier (for example, '.*?'). Suggestion: Don't use '#' as the delimiter in a pattern match or substitution. It looks like a comment and can frustrate debugging efforts.
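
A sketch of both suggestions applied to the original code, assuming braces as the replacement delimiter (any paired delimiter would do). The '?' follows the '*' directly, which makes the quantifier stop at the first ".gif" it can reach:

```perl
use strict;
use warnings;

my $text = 'A /gif/cart.gif is a /gif/cart.gif';

# Copy $text, then run the substitution on the copy.
# .*? is the non-greedy form of .* and takes as few characters as possible.
(my $fixed = $text) =~ s{/(gif/.*?\.gif)}{$1}g;

print "NEW: $fixed\n";   # NEW: A gif/cart.gif is a gif/cart.gif
```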

    Hope this helps,
    -Katie.
Re: pattern matching question
by monsieur_champs (Curate) on Jun 20, 2003 at 18:38 UTC

    Dear fellow

    Maybe you would like to use s,\B/\b,,og in place of the polluted s#/(gif/\w*\.gif)#$1#g. It's more generic and will run a little faster.
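
A rough sketch of what the \B/\b form does, written with brace delimiters and without the /o flag (which only matters when the pattern interpolates variables). It removes any '/' that has a non-word character or the start of the string before it and a word character after it. The URL is a hypothetical input added here to show that "more generic" also means it can touch slashes you did not intend:

```perl
use strict;
use warnings;

my $text = 'A /gif/cart.gif is a /gif/cart.gif';
(my $stripped = $text) =~ s{\B/\b}{}g;
print "$stripped\n";       # A gif/cart.gif is a gif/cart.gif

# Hypothetical input: the second '/' of '//' also sits between a
# non-word character and a word character, so it is removed too.
my $url = 'see http://example.com/gif/cart.gif';
(my $url_stripped = $url) =~ s{\B/\b}{}g;
print "$url_stripped\n";   # see http:/example.com/gif/cart.gif
```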

    The main difference between using .* and \w* in your code is that .* always tries to "eat" as many characters as it can, including spaces and everything else in its way(1). In your case, the part of the pattern before the final "\.gif" will match "gif/cart.gif is a /gif/cart" (yes, all of that).

    On the other hand, \w* can't match that many characters: it stops at the first non-word character, which here is the "." right after "cart" (a space, "\n", "\t" or "\f" would stop it as well). That way, the part before "\.gif" matches only "gif/cart", and the /g flag can go on to find the second file name.
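
One way to check this is to collect the captures with m//g in list context, which returns every $1 the pattern finds. This is only a sketch; the array names are illustrative:

```perl
use strict;
use warnings;

my $text = 'A /gif/cart.gif is a /gif/cart.gif';

# In list context, m//g returns all captured groups from all matches.
my @greedy = $text =~ m{/(gif/.*\.gif)}g;
my @word   = $text =~ m{/(gif/\w*\.gif)}g;

print scalar(@greedy), " capture(s) with .*:\n";
print "  '$_'\n" for @greedy;
print scalar(@word), " capture(s) with \\w*:\n";
print "  '$_'\n" for @word;
```

With .* there is a single capture spanning both file names; with \w* there are two captures, one per file name.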

    Hope that helps.

    =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
    Just Another Perl Monk

    Note #1: Actually, not exactly. But "what you don't know won't hurt you".