http://qs1969.pair.com?node_id=376399

MistaMuShu has asked for the wisdom of the Perl Monks concerning the following question:

Hi Monks,
Being completely new to Perl and regexps, I thought the best way to learn is to get some practice. I thought I was getting the hang of it with simpler searches so I tried to use it for a problem at work. Here's where I got stuck:
There are a series of folders and each folder is named like so: "6 digit number" "optional ." "optional 2 digit number" "optional letter"

i.e. ######, ######.##, or ######.##a

Inside each folder is a file with the same name as the folder plus "_vml_1.htm", e.g. 274813.99a_vml_1.htm. Inside this file I want to find <foldername>_vml_1.emz and replace it with <foldername>_gif_1.gif

Here's a little test program
...
$folder = shift(@dir); #dir is a listing of folders
$folder =~ /(\d{6}.?\d{2}?\w?)/;
open FILE, ">./$folder/$1_vml.htm";
...
I realize I must be doing something really stupid when I try to match the foldername. But I can't figure out why the expression after \d{6} does not work.
Please help shed some light on this so I don't have nightmares ;-)
Thanks in advance!
Jerry

Replies are listed 'Best First'.
Re: Regular Expression Question
by Roy Johnson (Monsignor) on Jul 21, 2004 at 23:41 UTC
    Note that a question mark after a quantifier ({2}?) does not mean optional, but rather non-greedy. You need to put parens around the sub-expression to make it optional:
    /^(\d{6}\.?(?:\d{2})?\w?)$/;
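
A minimal sketch of the difference, using the folder names from the question as sample input. The sub names are hypothetical, chosen only for illustration. The first pattern keeps {2}? as written in the original post (with the dot escaped); the second wraps the two digits in a group so that the ? applies to the group.

```perl
use strict;
use warnings;

# {2}? is a non-greedy {2}: it still demands exactly two digits.
my $lazy_fixed = qr/^(\d{6}\.?\d{2}?\w?)$/;
# (?:\d{2})? makes the pair of digits optional as a unit.
my $optional   = qr/^(\d{6}\.?(?:\d{2})?\w?)$/;

# Hypothetical helpers, not from the thread.
sub matches_lazy     { return $_[0] =~ $lazy_fixed ? 1 : 0 }
sub matches_optional { return $_[0] =~ $optional   ? 1 : 0 }

for my $name (qw(274813 274813.99 274813.99a)) {
    printf "%-12s lazy=%d optional=%d\n",
        $name, matches_lazy($name), matches_optional($name);
}
```

The bare six-digit name is the one that separates the two patterns: only the grouped version accepts it.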
    If the optionals aren't independent, you may need to nest them:
    /^(\d{6}             #leading digits
       (?:\.?            #if dot, then
         (?:(?:\d{2})?   #if two digits, then
           \w?)))$/x;    #maybe a character
    Note the use of anchors, to prevent matching part of the name.
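
One way to write the dependent version so that each optional piece is only allowed when the piece before it is present. This is a sketch under an assumed reading of the comments above (dot, then maybe two digits, then maybe a character), and is_folder_name is a hypothetical helper name. Each inner group carries its own ?, so skipping the dot skips everything after it.

```perl
use strict;
use warnings;

# Assumed rule: a dot may follow the six digits, two digits may follow
# the dot, and one word character may follow the two digits.
my $nested = qr/
    ^( \d{6}            # six leading digits
       (?: \.           # a literal dot, optionally followed by
           (?: \d{2}    #   two digits, optionally followed by
               \w?      #   one word character
           )?
       )?
    )$/x;

# Hypothetical helper, not from the thread.
sub is_folder_name { return $_[0] =~ $nested ? 1 : 0 }

for my $name (qw(274813 274813.99 274813.99a 274813a 27481399 274813.99a_files)) {
    printf "%-18s %s\n", $name, is_folder_name($name) ? 'ok' : 'rejected';
}
```

The last three names are rejected: the first two because the letter or digits appear without a dot, the third because the $ anchor refuses the trailing "_files".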

    We're not really tightening our belts, it just feels that way because we're getting fatter.
      Sorry, but when you say "use of anchors", you mean the ^ and $ signs, correct?

      Yeah, I guess I oversimplified the expression I wanted to search for, but this has been most helpful. Thanks.

Re: Regular Expression Question
by swkronenfeld (Hermit) on Jul 21, 2004 at 22:58 UTC
    The . (dot) in your regular expression matches any single character (except a newline, by default), not just a literal dot. You need to escape it, i.e.

    $folder =~ /(\d{6}\.?\d{2}?\w?)/;
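
A quick way to see the effect of the escape, sketched with two deliberately simplified patterns (six digits, one separator, two digits). The sub names are made up for this example.

```perl
use strict;
use warnings;

# Hypothetical helpers: the only difference is whether the dot is escaped.
sub loose_match  { return $_[0] =~ /^\d{6}.\d{2}$/  ? 1 : 0 }   # . is any character
sub strict_match { return $_[0] =~ /^\d{6}\.\d{2}$/ ? 1 : 0 }   # \. is a literal dot

for my $name ('274813.99', '274813x99') {
    printf "%s loose=%d strict=%d\n",
        $name, loose_match($name), strict_match($name);
}
```

Only the unescaped version accepts 274813x99.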


    Also, this regular expression matches names of the form ######## (8 digits without a dot) and of the form ######a (6 digits and the optional letter), as well as a few other forms that it sounds like you don't want to capture. Is the optional 2 digit number and letter dependent on whether there is a dot? If so, you are going to need a slightly more complex regular expression.

    One last question, why do you even have the regular expression in there? Can't you use $folder instead of $1 without loss of generality? You aren't doing a conditional based on whether or not the regexp matches, so I assume all folders are meant to be replaced in...
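
If $folder can be used directly, the replacement described in the question might look roughly like this. It is only a sketch: swap_emz_for_gif is a hypothetical helper, the sample string is invented, and it assumes $folder holds exactly the name that prefixes the file names. \Q...\E is used so the dot in the folder name is treated literally.

```perl
use strict;
use warnings;

# Hypothetical helper: rewrites the .emz reference inside one file's text.
sub swap_emz_for_gif {
    my ($folder, $html) = @_;
    # \Q...\E quotes regex metacharacters in the interpolated folder name.
    $html =~ s/\Q$folder\E_vml_1\.emz/${folder}_gif_1.gif/g;
    return $html;
}

# Invented sample content for illustration.
my $sample = '<img src="274813.99a_vml_1.emz">';
print swap_emz_for_gif('274813.99a', $sample), "\n";
```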
      Can't you use $folder instead of $1 without loss of generality?

      Well, initially I was thinking that since the actual folder names are ######.##a_files (think html folders) that I would find the first part of the name and save it in $1

      Now that I look at it again with your replies, I see that it'd be a lot easier if I just negated "_files" and did /(^_files)/

      That was silly of me to not escape the dot :( And I'll have to read a bit more on the lookahead find because ?: still does not make perfect sense to me... Thanks!
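
For what it's worth, (?:...) is a non-capturing group rather than a lookahead (a lookahead is written (?=...)), and a ^ inside plain parentheses is still the start-of-string anchor; it only negates inside a character class such as [^abc]. A small sketch of pulling the base name out of a folder called ######.##a_files, assuming that naming scheme; base_name is a hypothetical helper.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the part before "_files", or undef if the
# folder name does not fit the assumed pattern.
sub base_name {
    my ($folder) = @_;
    # (...) captures into $1; (?:...) only groups, so the ? after it
    # applies to both digits at once without creating $2.
    return $folder =~ /^(\d{6}\.?(?:\d{2})?\w?)_files$/ ? $1 : undef;
}

for my $folder (qw(274813_files 274813.99_files 274813.99a_files notes_files)) {
    my $base = base_name($folder);
    printf "%-18s %s\n", $folder, defined $base ? $base : '(no match)';
}
```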