Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hi,

Sorry for the beginner's question, but I'm a beginner :)

I'm trying to match filenames (read out of a directory by readdir). I want to match anything starting with test, having one or more digits after it, then zero or more characters, but not ending in a ~.

What I have so far is:

if ( /^test\d+/.*/ ) {print "$_\n";)
Which covers everything except forcing it not to match if there is a ~ at the end.

I seek enlightenment.

20040501 Edit by jdporter: Changed title from 'regex question'

Re: matching file names using regex
by eric256 (Parson) on Apr 29, 2004 at 18:15 UTC

    How about having 2 checks?

    if (/^test\d+.*/ && !/~$/)
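
    A minimal sketch of the two-check idea. The helper name wanted_name and the sample list (standing in for whatever readdir returns) are assumptions for illustration, not part of the original post:

```perl
use strict;
use warnings;

# Hypothetical helper: true when a name starts with "test" followed by
# at least one digit and does not end in "~". Two simple checks.
sub wanted_name {
    my ($name) = @_;
    return ($name =~ /^test\d+/ && $name !~ /~$/) ? 1 : 0;
}

# Made-up names standing in for the result of readdir.
my @names = ('test1', 'test1~', 'test42.txt', 'test42.txt~', 'tester', 'mytest1');

print "$_\n" for grep { wanted_name($_) } @names;
```

    With this sample list, only test1 and test42.txt are printed.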

    ___________
    Eric Hodges
      The benefit of two checks is that each rule can be tuned separately. You might realize there are a number of weird endings that you don't like, such as m{ [#@$.~] $ }x.

      The negative aspect of using two regexes to check legality is that it's a bit harder to supply a rule in a configuration file, such as those used by spam filters, etc. If your code can't handle two supplied regexes, then you'll have to figure out one.

      A negative look-behind is useful.

      print if m{ ^        # begins with
                  test     # 'test'
                  \d+      # a number
                  .*?      # nothing or anything else
                  (?<! ~ ) # but no '~' at the tail
                  $        # to the end
                }ix;

      --
      [ e d @ h a l l e y . c c ]

Re: matching file names using regex
by Enlil (Parson) on Apr 29, 2004 at 18:42 UTC
    Another way:
    /^(?!.*~$)test\d+/
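
    A quick way to try this pattern against a few names. The wrapper matches_lookahead and the sample names are made up for illustration:

```perl
use strict;
use warnings;

# Pattern from the post above: the negative look-ahead runs at the start
# of the string and fails if the whole name ends in "~".
my $lookahead_re = qr/^(?!.*~$)test\d+/;

# Hypothetical wrapper so the pattern can be called like a function.
sub matches_lookahead { return $_[0] =~ $lookahead_re ? 1 : 0 }

for my $name ('test1', 'test1~', 'test12abc', 'test12abc~', 'testfile') {
    printf "%-12s %s\n", $name, matches_lookahead($name) ? 'match' : 'no match';
}
```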

    A couple of notes about a few of the previously presented regexes. In this node, {1,} is functionally equivalent to just +, as both mean 1 or more times. Not that it is wrong, just equivalent (worth mentioning as the OP is a beginner).

    In Re: matching file names using regex, one of the tests (/^test\d+.*/) has a .* at the end which, unless you mean to capture something with it, serves no purpose: .* means match as much of anything as you can, and as much can be as little as 0, which is guaranteed to match, so it is unnecessary. Again, not necessarily wrong. To be honest I would probably choose the two-test solution, as it is clearer as to what one is doing.

    -enlil

      Your solution is good and it may even be efficient thanks to anchoring, but stylistically, I guess I just don't like $ or \z anywhere but the end of the regex. That's just a matter of taste, as your other comments also illustrate.

      --
      [ e d @ h a l l e y . c c ]

Re: matching file names using regex
by pizza_milkshake (Monk) on Apr 29, 2004 at 17:59 UTC
    /^test\d{1,}(.*[^~])?$/
    the hard part is that, based on your description, the pattern should match "test1" and not match "test1~". so a plain [^~] at the end doesn't work, because that requires another character after the 1 digit, which should be optional. so the whole thing is optional.
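
    The difference between the plain and the optional form can be checked side by side. A small sketch; the names $plain_re, $optional_re and the two wrappers are made up for illustration:

```perl
use strict;
use warnings;

my $plain_re    = qr/^test\d{1,}.*[^~]$/;     # a final non-"~" char is mandatory
my $optional_re = qr/^test\d{1,}(.*[^~])?$/;  # pattern from the post above

# Hypothetical wrappers returning 1 or 0.
sub matches_plain    { return $_[0] =~ $plain_re    ? 1 : 0 }
sub matches_optional { return $_[0] =~ $optional_re ? 1 : 0 }

for my $name ('test1', 'test1~', 'test12', 'test1.bak', 'test1.bak~') {
    printf "%-12s plain=%d optional=%d\n",
        $name, matches_plain($name), matches_optional($name);
}
```

    The two forms should disagree only on "test1", where the plain form has no character left for [^~].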

    question: how does /^test\d+/.*/ work? i've seen regex matches with 3 delims before but never quite got it

    perl -e'$_="nwdd\x7F^n\x7Flm{{llql0}qs\x14";s/./chr(ord$&^30)/ge;print'

Re: matching file names using regex
by Abigail-II (Bishop) on Apr 29, 2004 at 21:09 UTC
Re: matching file names using regex
by Eimi Metamorphoumai (Deacon) on Apr 29, 2004 at 20:44 UTC
    I agree with the others that it's cleanest to do it with two separate checks,
    if (/^test\d/ && !/~$/){ ...
    (Note that the + and the .* don't add anything. Anything that matches the above will also match your regexp, and vice versa (except for your broken / in the middle)). There is a way to do it all in one regexp without lookaheads or lookbehinds, though.
    if (/^test\d(?:$|.*[^~]$)/){
    Which says that after your first digit, there are either no characters at all, or as many as you like provided the last one isn't a ~.
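
    One way to convince yourself that the single regexp and the two-check version agree is to run both over the same names. A rough sketch; two_checks, one_regex and the sample names are hypothetical:

```perl
use strict;
use warnings;

# The two forms from the post above, wrapped in hypothetical helpers.
sub two_checks { my ($n) = @_; return ($n =~ /^test\d/ && $n !~ /~$/) ? 1 : 0 }
sub one_regex  { my ($n) = @_; return $n =~ /^test\d(?:$|.*[^~]$)/ ? 1 : 0 }

my @names = ('test1', 'test1~', 'test12', 'test12~',
             'test1.c', 'test1.c~', 'test', 'testx1');

for my $n (@names) {
    my ($two, $one) = (two_checks($n), one_regex($n));
    printf "%-10s two_checks=%d one_regex=%d %s\n",
        $n, $two, $one, $two == $one ? 'agree' : 'DIFFER';
}
```
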
Re: matching file names using regex
by hv (Prior) on Apr 30, 2004 at 02:03 UTC

    Other responses have explained how you can use lookbehind to satisfy the check for "not ending in a ~". A more minor point, but worth thinking about, is the fact that since the "one or more digits" can be followed by arbitrary characters, you can replace "one or more digits" with "one digit" in the spec and still retain a functionally equivalent set of constraints:

    /^test\d.*(?<!~)\z/

    This insight is implicit in a couple of the responses to date, but was not mentioned explicitly in either of them.

    In terms of efficiency (which may not be greatly relevant to your particular problem), the main potential problem is that when the filename does end in '~' the regexp engine will do a lot of needless backtracking trying to find some other way to match. That can be avoided with a cut operator:

    /^test\d(?>.*)(?<!~)\z/
    but is only likely to gain anything if the filenames can be of arbitrary length: on any O/S that limits the length of filenames to something reasonably small it is unlikely to be noticeable.
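
    A small sketch comparing the two patterns for correctness only; it does not measure backtracking. The wrapper names and the sample names, including the artificially long one, are made up for illustration:

```perl
use strict;
use warnings;

my $backtracking_re = qr/^test\d.*(?<!~)\z/;      # .* may give characters back
my $atomic_re       = qr/^test\d(?>.*)(?<!~)\z/;  # .* keeps whatever it matched

# Hypothetical wrappers returning 1 or 0.
sub matches_backtracking { return $_[0] =~ $backtracking_re ? 1 : 0 }
sub matches_atomic       { return $_[0] =~ $atomic_re       ? 1 : 0 }

my @names = ('test1', 'test1~', 'test123.log', 'test123.log~',
             'test9' . ('x' x 200) . '~');

for my $n (@names) {
    # Shorten long names for display only.
    my $shown = length($n) > 20 ? substr($n, 0, 17) . '...' : $n;
    printf "%-20s backtracking=%d atomic=%d\n",
        $shown, matches_backtracking($n), matches_atomic($n);
}
```

    Both patterns should accept and reject the same names; the atomic version only changes how much work a failing match does.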

    Hugo

Re: matching file names using regex
by Wassercrats (Initiate) on Apr 30, 2004 at 01:29 UTC
    I want to match anything starting with test, having one or more digits after it, zero or more characters, but not ending in a ~.

    I guess you're saying that there should be one or more digits immediately following "test", which is what you have in your example. Abigail-II's /^test(?=\d).*[^~]$/ would do that. Writing it as /^test\d+.*[^~]$/ looks slightly shorter, but it is not equivalent: it needs at least one more character after the first digit, so it rejects a bare "test1".

    If there needs to be one or more digits anywhere after "test" then use /^test(?=.*\d).*[^~]$/
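
    The three patterns mentioned in this reply behave differently on short names, which is worth checking before picking one. A hedged sketch; the wrapper names and sample names are made up for illustration:

```perl
use strict;
use warnings;

my $immediate_re = qr/^test(?=\d).*[^~]$/;    # digit right after "test"
my $shorter_re   = qr/^test\d+.*[^~]$/;       # the \d+ form
my $anywhere_re  = qr/^test(?=.*\d).*[^~]$/;  # digit anywhere after "test"

# Hypothetical wrappers returning 1 or 0.
sub digit_immediately { return $_[0] =~ $immediate_re ? 1 : 0 }
sub digit_shorter     { return $_[0] =~ $shorter_re   ? 1 : 0 }
sub digit_anywhere    { return $_[0] =~ $anywhere_re  ? 1 : 0 }

for my $name ('test1', 'test12', 'test1~', 'testabc9', 'testabc9~') {
    printf "%-10s immediate=%d shorter=%d anywhere=%d\n", $name,
        digit_immediately($name), digit_shorter($name), digit_anywhere($name);
}
```

    On "test1" the look-ahead form matches while the \d+ form does not, because \d+ consumes the only digit and leaves nothing for [^~].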