dwhite20899 has asked for the wisdom of the Perl Monks concerning the following question:

I did a SuperSearch, and didn't turn up anything directly related to my question, so...

I've been using this code and regex as the heart of an is_MD5 subroutine:

if ($hash =~ /[0-9a-f]{32}/i) { return(1); }

Now I'd like to generalize it for other hex hash strings, and I think I've negated it correctly like so:

if ($hash =~ /[^0-9a-fA-F]/ ) { return(0); }

The new code continues into some other checks as long as the entire hash string is good hex. It works with all my test cases, but I don't have a large enough set to find any performance difference. Am I doing something stupid that could bite me down the road?
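One edge case worth checking with the negated test: an empty string contains no non-hex characters, so it passes. A minimal sketch of the generalized check follows. The helper names `is_hex` and `is_hex_of_length` are hypothetical, not taken from the original subroutine.

```perl
use strict;
use warnings;

# Hypothetical helper: true when the string is non-empty and
# consists only of hex digits.
sub is_hex {
    my ($hash) = @_;
    # The negated class alone would accept '', so test for that first.
    return 0 unless defined $hash && length $hash;
    # Any single non-hex character (including a trailing newline) fails.
    return 0 if $hash =~ /[^0-9a-fA-F]/;
    return 1;
}

# Hypothetical helper: hex string of an expected length,
# e.g. 32 for MD5 or 40 for SHA-1.
sub is_hex_of_length {
    my ($hash, $len) = @_;
    return ( is_hex($hash) && length($hash) == $len ) ? 1 : 0;
}

for my $candidate ( 'd41d8cd98f00b204e9800998ecf8427e', 'foo123', '' ) {
    printf "%-34s hex=%d md5-length=%d\n",
        "'$candidate'", is_hex($candidate), is_hex_of_length( $candidate, 32 );
}
```

Because the negated class rejects every non-hex character, a trailing newline fails here without any anchoring.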

Update: Kyle, thanks, it was couched in other checks, but the more atomic the better, IMO. Davidrw, THANKS! I've seen that but forgot about it.

Re: hex-only regex
by davidrw (Prior) on Jan 17, 2007 at 16:55 UTC
    TMTOWTDI -- Regexp::Common::number
    use Regexp::Common qw /number/;
    while (<DATA>) {
        chomp;
        print /\A$RE{num}{hex}\z/ ? 'YES' : 'NO ';
        print " $_\n";
    }
    __DATA__
    12345
    abc1234
    foo123
    ab123CD
Re: hex-only regex
by kyle (Abbot) on Jan 17, 2007 at 16:16 UTC

    Your first regex might be better if you force it to match the whole string:

    if ($hash =~ /^[0-9a-f]{32}$/i) { return(1); }

    This way, it doesn't match a good hash that also has some other junk with it. My guess is that you may be checking the string's length separately, in which case this isn't really necessary, but it might be nice to consolidate those checks.
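To illustrate the point about junk around a good hash, here is a small sketch comparing the unanchored and anchored patterns. The sub names and sample strings are made up for the demonstration. The two hex strings are the well-known MD5 and SHA-1 digests of the empty string.

```perl
use strict;
use warnings;

my $md5  = 'd41d8cd98f00b204e9800998ecf8427e';            # 32 hex digits
my $sha1 = 'da39a3ee5e6b4b0d3255bfef95601890afd80709';    # 40 hex digits
my $junk = "xx-${md5}-yy";                                # good hash plus junk

# Hypothetical wrappers around the two patterns under discussion.
sub unanchored { return $_[0] =~ /[0-9a-f]{32}/i   ? 1 : 0 }
sub anchored   { return $_[0] =~ /^[0-9a-f]{32}$/i ? 1 : 0 }

for my $s ( $md5, $sha1, $junk ) {
    printf "%-45s unanchored=%d anchored=%d\n",
        $s, unanchored($s), anchored($s);
}
```

The unanchored pattern accepts all three strings, because each contains a run of 32 hex digits somewhere. The anchored one accepts only the 32-character string.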

    I don't see anything wrong with your other pattern.

      Don't fall for the "dollar mistake" here. Remember that $ doesn't match end of string: it matches "end of string or just before a newline at end of string".

      This means that you could have "123423453456456756786789789a89ab\n" in your string, and it'd still match. While this may not make any difference for your application, in some cases this could be a missed crucial check for a security validation, allowing a messy character where it doesn't belong (such as in a filename).

      Beware the dollar. Use \z instead: /^[0-9a-f]{32}\z/i.
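The difference between `$` and `\z` can be checked directly. This is a minimal sketch using the sample string from above. The sub names `ends_with_dollar` and `ends_with_z` are hypothetical.

```perl
use strict;
use warnings;

my $hash = '123423453456456756786789789a89ab';    # 32 hex digits

# Hypothetical wrappers: same pattern, different end anchors.
sub ends_with_dollar { return $_[0] =~ /^[0-9a-f]{32}$/i  ? 1 : 0 }
sub ends_with_z      { return $_[0] =~ /^[0-9a-f]{32}\z/i ? 1 : 0 }

for my $case ( [ 'clean', $hash ], [ 'trailing newline', "$hash\n" ] ) {
    my ( $label, $string ) = @$case;
    printf "%-17s \$ => %d   \\z => %d\n",
        $label, ends_with_dollar($string), ends_with_z($string);
}
```

With the trailing newline, the `$` version still matches while the `\z` version does not.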

        Fascinating. I need to read perlre more often.

        Does m/regular$/s also work reliably or is \z the only safe way to do it? Until today, I've always used the former to change the meaning of $. UPDATE: Nope. It's not reliable at all. In fact, I'd go so far as to say it doesn't even work.
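For anyone who wants to confirm this, the sketch below compares the variants. As perlre documents, `/s` only changes what `.` matches, so it leaves `$` alone, while `/m` lets `$` match before any newline. The `%pattern` table is just scaffolding for the demonstration.

```perl
use strict;
use warnings;

# Patterns under test, keyed by a display label.
my %pattern = (
    'plain $' => qr/regular$/,
    '$ + /s'  => qr/regular$/s,
    '$ + /m'  => qr/regular$/m,
    '\Z'      => qr/regular\Z/,
    '\z'      => qr/regular\z/,
);

# Hypothetical helper: 1 if the labelled pattern matches the string.
sub matches {
    my ( $label, $string ) = @_;
    return $string =~ $pattern{$label} ? 1 : 0;
}

for my $string ( "regular", "regular\n", "regular\nmore" ) {
    ( my $shown = $string ) =~ s/\n/\\n/g;    # make newlines visible
    print "\"$shown\":";
    print "  $_ => ", matches( $_, $string ) for sort keys %pattern;
    print "\n";
}
```

Only `\z` rejects the string with a trailing newline. Adding `/s` gives the same results as the plain `$`.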

        -Paul

Re: hex-only regex
by Joost (Canon) on Jan 17, 2007 at 15:38 UTC
Re: hex-only regex
by fenLisesi (Priest) on Jan 18, 2007 at 11:24 UTC