in reply to A tidier regex ?

Would you consider this 'better'

$s =~ m[CDC(?:_[A-Z0-9]+){2}(?:,\s+DDC[A-Z0-9]+){2}] and print 'Matched';

The question is, do you need to be quite so specific?

That is, if you reduced the regex to say: m[CDC\w+(?:,\s+\w+){2}], is there the possibility that it could falsely match something else that will appear somewhere in the file?

It's obviously not so thorough, but it may be good enough given your knowledge of what will be in the file.
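A quick way to see the trade-off is to run both patterns over a couple of sample lines. This is a minimal sketch: the first sample is a hypothetical example of the intended format (the real data wasn't posted), and 'CDC_1, ABC, ABC' is the kind of false positive the looser pattern lets through.

```perl
use strict;
use warnings;

# The two patterns from this post, compiled with qr.
my $strict = qr[CDC(?:_[A-Z0-9]+){2}(?:,\s+DDC[A-Z0-9]+){2}];
my $loose  = qr[CDC\w+(?:,\s+\w+){2}];

# Hypothetical sample lines; the real data format is an assumption.
my @samples = (
    'CDC_AB_12, DDCSMR01, DDCRMR02',   # intended shape (assumed)
    'CDC_1, ABC, ABC',                 # unwanted line the loose pattern accepts
);

for my $s (@samples) {
    printf "%-32s strict:%-3s loose:%s\n", $s,
        ($s =~ $strict ? 'yes' : 'no'),
        ($s =~ $loose  ? 'yes' : 'no');
}
```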


Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
RIP an inspiration; A true Folk's Guy

Re^2: A tidier regex ?
by dasgar (Priest) on Sep 14, 2010 at 13:24 UTC

    BrowserUk, I was thinking along the same lines on the regex, but you beat me to it and you've got better regexes than what I was coming up with.

    Assuming that the OP is concerned about the formatting (the two underscores in the first 'word' and the fixed starting strings of the next two 'words'), I think a slight modification of your second regex could meet that need. Something like:

    m[CDC_\w+?_\w+(?:,\s+DDC(?:S|R)MR\w+){2}]

    Of course, I haven't tested that, so I wouldn't be surprised if someone was able to point out problem(s) in that regex.
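A small harness like the following is one way to check it. The sample strings are hypothetical. Because \w also matches '_', the pattern also accepts a first word with three parts, such as 'CDC_A_B_C', which may or may not matter for your data.

```perl
use strict;
use warnings;

# The untested pattern above, exactly as posted.
my $re = qr[CDC_\w+?_\w+(?:,\s+DDC(?:S|R)MR\w+){2}];

# Hypothetical inputs; the real data format is an assumption.
my @inputs = (
    'CDC_AB_CD, DDCSMR01, DDCRMR02',   # intended shape
    'CDC_AB_CD, DDCXMR01, DDCRMR02',   # bad prefix: should fail
    'CDC_A_B_C, DDCSMR01, DDCRMR02',   # extra part: \w also eats '_'
);

for my $s (@inputs) {
    print "$s => ", ($s =~ $re ? 'match' : 'no match'), "\n";
}
```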

      I don't think I explained the issue very well. The user will input a single word, which could be either:

      CDC_*_* or

      DDC(SMR or RMR)*

      So, if my limited knowledge of regexes is correct, the above doesn't match. I tested it, hence my reply clarifying the issue. Sorry for any confusion caused.
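If the input really is one word of either shape, a sketch along these lines might do. It assumes each '*' stands for one or more uppercase letters or digits, and it anchors with \A and \z so that nothing else may appear in the input. The helper name valid_word is hypothetical, not something from the thread.

```perl
use strict;
use warnings;

# Hypothetical validator for a single user-typed word.
# Assumes each '*' in the description means one or more of [A-Z0-9].
my $word_re = qr{
    \A                              # start of input
    (?:
        CDC_[A-Z0-9]+_[A-Z0-9]+     # CDC_*_*
      | DDC[SR]MR[A-Z0-9]+          # DDCSMR* or DDCRMR*
    )
    \z                              # end of input: nothing else allowed
}x;

sub valid_word {
    my ($input) = @_;
    chomp $input;                   # drop the newline left by <STDIN>
    return $input =~ $word_re ? 1 : 0;
}

for my $w ('CDC_AB_12', 'DDCSMR01', 'DDCRMR02', 'DDCXMR01', 'CDC_AB') {
    printf "%-10s %s\n", $w, valid_word($w) ? 'ok' : 'rejected';
}
```

The alternation keeps the two allowed shapes side by side, so adding a third shape later is a one-line change.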
Re^2: A tidier regex ?
by Anonymous Monk on Sep 14, 2010 at 13:32 UTC
    Since this is checking user input from a prompt (rather than input from a file), the suggestions you have made would be sufficient. However, regarding your suggestion, would this cover it?
    m[CDC\w+(?:,\s+\w+){2}]
    How is the DDC... part catered for? My regex skills are at beginner level, hence my question. Thanks.
      How is the DDC... catered for?

      It's allowed for by the \w+, which matches [A-Za-z0-9_], but it is not verified. So, for example, it would also match 'CDC_1, ABC, ABC', if there was any possibility of that appearing in your data.

      And that is where you will have to apply your knowledge of your data to decide just how specific you have to be to ensure you only match that data you want to match.

      You might for instance know that there will be lines similar to CDC_..., ABC..., XYZ... that you mustn't match, in which case, you need to be more specific. Maybe m[CDC\w+(?:,\s+DDC\w+){2}] would satisfy.
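A minimal sketch of the difference, using hypothetical sample lines (the real data wasn't posted):

```perl
use strict;
use warnings;

my $loose  = qr[CDC\w+(?:,\s+\w+){2}];       # any word after each comma
my $middle = qr[CDC\w+(?:,\s+DDC\w+){2}];    # each later word must start with DDC

# Hypothetical lines: one we want, one we must not match.
my $wanted   = 'CDC_AB_12, DDCSMR01, DDCRMR02';
my $unwanted = 'CDC_1, ABC, XYZ';

for my $s ($wanted, $unwanted) {
    printf "%-32s loose:%-3s DDC-prefixed:%s\n", $s,
        ($s =~ $loose  ? 'yes' : 'no'),
        ($s =~ $middle ? 'yes' : 'no');
}
```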

      But if the data is coming from users' typing (and users are apt to transpose and omit things), then maybe you should stick with a fully specified regex. Say:

      m[CDC(?:_[A-Z0-9]+){2}(?:,\s+DDC[SR]MR[A-Z0-9]+){2}]
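For example, with a hypothetical input where 'SMR' has been typed as 'MSR', the DDC-prefixed pattern still matches while the fully specified one rejects it:

```perl
use strict;
use warnings;

my $middle = qr[CDC\w+(?:,\s+DDC\w+){2}];
my $full   = qr[CDC(?:_[A-Z0-9]+){2}(?:,\s+DDC[SR]MR[A-Z0-9]+){2}];

# Hypothetical user input: a correct line and one with a transposition.
my $good = 'CDC_AB_12, DDCSMR01, DDCRMR02';
my $typo = 'CDC_AB_12, DDCMSR01, DDCRMR02';   # 'SMR' typed as 'MSR'

for my $s ($good, $typo) {
    printf "%-32s DDC-prefixed:%-3s full:%s\n", $s,
        ($s =~ $middle ? 'yes' : 'no'),
        ($s =~ $full   ? 'yes' : 'no');
}
```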

      Only you can know your full requirements.