in reply to old Perl regex problem

I wonder why none of the people who responded to this post actually read it carefully. They all came up with solutions that use two regexes. What's so hard to understand about:
I'm actually reading the pattern from a configuration file. Everything outside of the slashes is not changeable.

You have a misunderstanding about the meaning of [ ] inside a regular expression. [ ] is a character class, and it matches exactly one character. Inside it, you list either the characters that are allowed to match, or (with a leading ^) the characters that aren't allowed to match. [^(H\-000)(IF\-000)] means the same as [^(H\-0)IF], and means "match a single character, which could be anything except a (, an H, a dash, a 0, a ), an I or an F."
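To see the character-class point concretely, here is a small sketch (assuming a perl recent enough to have qr//) that runs every printable ASCII character through both bracket expressions and collects the ones they reject. The two classes should agree on every character, and only seven characters end up excluded.

```perl
use strict;
use warnings;

# The bracketed expression from the original post: ONE character class,
# not a list of alternative strings.
my $as_written = qr/^[^(H\-000)(IF\-000)]$/;
# The same class with the duplicate members dropped.
my $deduped = qr/^[^(H\-0)IF]$/;

my $mismatches = 0;
my @excluded;
# Try every printable ASCII character against both classes.
for my $code (32 .. 126) {
    my $ch       = chr $code;
    my $in_orig  = $ch =~ $as_written ? 1 : 0;
    my $in_dedup = $ch =~ $deduped    ? 1 : 0;
    $mismatches++ if $in_orig != $in_dedup;
    push @excluded, $ch unless $in_orig;
}

print "mismatches: $mismatches\n";
print "excluded: @excluded\n";
```

It should print "mismatches: 0" and "excluded: ( ) - 0 F H I".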

If I understand your requirements, you are looking for all files that start with PH, and do not end with either "H-000" or "I-000". The following regex ought to work:

/^PH(?:.{0,4}|.*(?![HI]-000).{5})$/
It works with 5.004_02.
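A rough way to check the pattern against the requirement as stated above. The file names are made up for illustration, and qr// needs perl 5.005 or later, so this sketch won't run unchanged on 5.004_02.

```perl
use strict;
use warnings;

# Abigail's pattern: starts with PH and does not end in "H-000" or "I-000".
my $pattern = qr/^PH(?:.{0,4}|.*(?![HI]-000).{5})$/;

# Hypothetical names (1 = should match, 0 = should not).
my %expect = (
    'PHabcdeH-000'  => 0,  # ends in H-000
    'PHabcdeI-000'  => 0,  # ends in I-000
    'PHabcdeF-000'  => 1,  # any other character before -000
    'PHabcdeIF-000' => 1,  # last five characters are F-000, so it passes
    'PHab'          => 1,  # short name, taken by the .{0,4} branch
    'XYabcdeF-000'  => 0,  # does not start with PH
);

my @wrong;
for my $name (sort keys %expect) {
    my $got = $name =~ $pattern ? 1 : 0;
    push @wrong, $name if $got != $expect{$name};
    printf "%-14s %s\n", $name, $got ? 'match' : 'no match';
}
print @wrong ? "unexpected: @wrong\n" : "all as expected\n";
```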

Abigail

Re: Re: old Perl regex problem
by GhodMode (Pilgrim) on Aug 02, 2002 at 14:47 UTC

         Thank you. I understand better now. I had to look up the "?:" and "?!". I'm going to play with it a little.
         I've also thought of something like /^PH.+[^HI]F?\-000/
         Some examples of the file names are

    PH0024080209401400001PH-000 PH0026072913114200001IF-000 PH0029072911352700001AF-000
         I wonder why it worked on 5.6.1? I'll have to double-check that.

    Invulnerable. Unlimited XP. Unlimited Votes. I must be...
            GhodMode
      /^PH.+[^HI]F?\-000/ will accept a file called PHfooIF-000. After all, the .+ can match the 'fooI', the [^HI] can match the 'F', and the F? matches nothing.
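Adding capture groups (only for inspection; they don't change what matches) shows how the engine backtracks until the pattern accepts PHfooIF-000. This is a small illustrative sketch, not code from the thread.

```perl
use strict;
use warnings;

# GhodMode's candidate, with capture groups added only to expose the match.
my $name  = 'PHfooIF-000';
my @parts = $name =~ /^PH(.+)([^HI])(F?)\-000/;

# Expected: .+ takes "fooI", [^HI] takes the "F", F? matches the empty string.
print join(' | ', map { "'$_'" } @parts), "\n";
```

It should print 'fooI' | 'F' | ''.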

      Abigail

Re: Re: old Perl regex problem
by sauoq (Abbot) on Aug 02, 2002 at 19:06 UTC
    Close, but that still matches PH-000 due to the .{0,4} clause. I think that
    /^PH(?:.{0,3}|.*(?![HI]-000).{5})$/
    does it.
    -sauoq
    "My two cents aren't worth a dime.";
    
      but that still matches PH-000
      That's because I wrote the regex before he revealed that the PH prefix and the H-000 suffix could overlap. However, later he suggests that the file names are all pretty long, so PH-000 can't happen anyway.
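A quick comparison of the two variants. PH-000 is the overlap case from sauoq's post; PH1234 is a hypothetical name added here to show a side effect of the change: with .{0,3}, a name with exactly four characters after PH matches neither branch, so it is rejected too. Given that the real file names are long, that probably doesn't matter in practice.

```perl
use strict;
use warnings;

my $with_four  = qr/^PH(?:.{0,4}|.*(?![HI]-000).{5})$/;  # Abigail's original
my $with_three = qr/^PH(?:.{0,3}|.*(?![HI]-000).{5})$/;  # sauoq's variant

# "PH-000": its H-000 tail reuses the H of the PH prefix.
# "PH1234": hypothetical name with exactly four characters after PH.
for my $name ('PH-000', 'PH1234') {
    printf "%-7s  .{0,4}: %-8s  .{0,3}: %s\n",
        $name,
        ($name =~ $with_four  ? 'match' : 'no match'),
        ($name =~ $with_three ? 'match' : 'no match');
}
```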

      Abigail

Re: Re: old Perl regex problem
by GhodMode (Pilgrim) on Aug 02, 2002 at 16:17 UTC

         I pasted your regex into my script exactly and it didn't work either :(. I found one that did, though.
         First, I want to make sure I understand yours. Please tell me if the following is correct...
    /^PH(?:.{0,4}|.*(?![HI]-000).{5})$/

    1. starts with PH
    2. (?:) these parentheses aren't memory parentheses
    3. followed by either 0 to 4 of any character, or zero or more of any character
    4. (?!) return true if "H-000" or "I-000" would not match next
    5. followed by 5 of any character
    6. followed by the end of the line
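For reference, the same pattern can be written with the /x modifier so each piece carries a comment keyed to the numbered list. Note that the | splits the whole group: the choice is between .{0,4} on its own and the complete .*(?![HI]-000).{5} sequence, and both must be followed by $. This is a sketch; the sample names are the ones from earlier in the thread plus two made-up ones.

```perl
use strict;
use warnings;

# Abigail's pattern spelled out with /x; the step numbers follow the list above.
my $annotated = qr{
    ^PH                # 1. starts with PH
    (?:                # 2. non-capturing group with two alternatives:
        .{0,4}         # 3a. either 0 to 4 of any character
      |                #     or
        .*             # 3b. zero or more of any character,
        (?![HI]-000)   # 4.  at a point not followed by "H-000" or "I-000",
        .{5}           # 5.  then exactly 5 of any character
    )
    $                  # 6. end of the string (or just before a final newline)
}x;

# The compact original, for comparison.
my $compact = qr/^PH(?:.{0,4}|.*(?![HI]-000).{5})$/;

my $differences = 0;
for my $s (qw(PH0024080209401400001PH-000 PH0026072913114200001IF-000
              PH0029072911352700001AF-000 PHab PHabcdeI-000)) {
    my $x = $s =~ $annotated ? 1 : 0;
    my $c = $s =~ $compact   ? 1 : 0;
    $differences++ if $x != $c;
    printf "%-28s %s\n", $s, $x ? 'match' : 'no match';
}
print "differences: $differences\n";
```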

         At first, I thought it failed because of the .{5} part. So, I changed it to {4,5} because the possibilities are "?H-000" (only 4 characters after the H) or "I?-000". That still didn't work. I was confused about why you used the (?:), the quantifier, and the | after the first wildcard dot, so I removed them. It still didn't work.

         After all that I re-started. I counted the characters between the PH and the part I wanted to check (\d{19}) and tried /^PH\d{19}[^I][^H]-000$/. It's a little easier because all of these files have a fixed format. That did work, but I wasn't sure I needed to quantify the characters between the PH and the end of the file name. So, I ended up with /^PH.*[^I][^H]-000$/ which works fine.
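A small sketch that runs both of these patterns over the three file names quoted earlier in the thread. With either pattern, only the AF-000 name matches, so the two agree on these three.

```perl
use strict;
use warnings;

# The two patterns described above.
my $fixed   = qr/^PH\d{19}[^I][^H]-000$/;
my $general = qr/^PH.*[^I][^H]-000$/;

# File names quoted earlier in the thread.
my @names = qw(PH0024080209401400001PH-000
               PH0026072913114200001IF-000
               PH0029072911352700001AF-000);

for my $name (@names) {
    printf "%s  fixed: %-8s  general: %s\n", $name,
        ($name =~ $fixed   ? 'match' : 'no match'),
        ($name =~ $general ? 'match' : 'no match');
}
```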

         I'm not sure why I couldn't get yours to work. I haven't used regex extensions or assertions before, so I want to understand them better.

    Many thanks for your input.

      That will fail on a file name ending in "IQ-000", as you demand that the sixth character from the end isn't an I.

      You seem to be changing your requirements over time. This makes it hard to be helpful. As things stand now, I suggest:

      /^PH.*(?:[^I][^H]|I[^HF])-000$/
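A sketch comparing the two patterns on names built from the thread's fixed format. The 19-digit block is copied from one of the example file names; the two-letter endings are hypothetical, chosen to hit each case. Only the IQ-000 name separates them.

```perl
use strict;
use warnings;

my $ghodmode = qr/^PH.*[^I][^H]-000$/;             # sixth-from-last char must not be I
my $abigail  = qr/^PH.*(?:[^I][^H]|I[^HF])-000$/;  # rejects only ...H-000 and ...IF-000

# 19 digits taken from one of the example names; the endings are made up.
my $digits = '0029072911352700001';
my %got;
for my $tail (qw(AF IQ IF PH IH)) {
    my $name = "PH$digits$tail-000";
    $got{$tail} = [ ($name =~ $ghodmode ? 1 : 0), ($name =~ $abigail ? 1 : 0) ];
    printf "%s  ghodmode: %-8s  abigail: %s\n", $name,
        ($got{$tail}[0] ? 'match' : 'no match'),
        ($got{$tail}[1] ? 'match' : 'no match');
}
```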
      Abigail