Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Ok, I'm not stumped; I'm dumbfounded. Why does $str =~ m/([^?-~])/ match "&" when $str = 'asdf&asdf'? This is a reduced version of a regex I found in HTML::Entities.

thanks mik

Re: [^?-~]
by thelenm (Vicar) on Aug 09, 2002 at 23:33 UTC
    The regular expression /[^?-~]/ matches a character that is not in the range "?" (ASCII value 63) through "~" (ASCII value 126). The character "&" is ASCII value 38, which is outside the range, so the regex matches it. Note that "a", "s", "d", and "f" are all inside the range, so they are not matched.

    -- Mike

    --
    just,my${.02}
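
To see the arithmetic for yourself, here is a minimal sketch (assuming an ASCII-based perl; the variable names are only illustrative) that prints the code point of each distinct character in the string and then runs the regex from the question:

```perl
use strict;
use warnings;

my $str = 'asdf&asdf';

# Print each distinct character, its native code point, and whether
# that code point lies between ord('?') and ord('~').
my %seen;
for my $ch (grep { !$seen{$_}++ } split //, $str) {
    my $where = (ord($ch) >= ord('?') && ord($ch) <= ord('~'))
        ? 'inside'
        : 'outside';
    printf "%s = %3d (%s the range)\n", $ch, ord($ch), $where;
}

# The negated class captures the first character outside the range.
my ($hit) = $str =~ m/([^?-~])/;
print "matched: $hit\n" if defined $hit;
```

Only "&" should be reported as outside the range, and it is the character the capture picks up.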

      Ooohhh, very cool. "-" isn't a character; it's a range specifier. Now it makes sense. Thanks.

        Inside a character class, '-' is a literal character only when it appears at the beginning (or right after the negation operator), at the end, or when it is escaped with a backslash.
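
A small sketch of the positions where a hyphen is taken literally in a bracketed class: leading, right after "^", trailing, and backslash-escaped. The labels and the @classes array are just for illustration:

```perl
use strict;
use warnings;

# Each class here contains a literal hyphen rather than a range.
my @classes = (
    [ 'leading'  => qr/[-?~]/  ],   # hyphen first
    [ 'negated'  => qr/[^-?~]/ ],   # hyphen right after ^
    [ 'trailing' => qr/[?~-]/  ],   # hyphen last
    [ 'escaped'  => qr/[?\-~]/ ],   # hyphen escaped mid-class
);

for my $entry (@classes) {
    my ($label, $re) = @$entry;
    printf "%-8s  '-' matches: %-3s  'a' matches: %s\n",
        $label,
        ('-' =~ $re ? 'yes' : 'no'),
        ('a' =~ $re ? 'yes' : 'no');
}

# For contrast, the unescaped form is a range that includes 'a' (on ASCII).
print "range [?-~] matches 'a'\n" if 'a' =~ /[?-~]/;
```

In the three non-negated classes the hyphen matches and "a" does not, which shows that no range was formed; the negated class behaves the other way around.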

Re: [^?-~]
by sauoq (Abbot) on Aug 09, 2002 at 23:37 UTC
    This is a character class. The "-" specifies a range of characters: here, the range from "?" to "~", that is, from ASCII 0x3f to ASCII 0x7e. The "^" specifies the complement of that range, i.e. any character that is not in the range. As '&' is ASCII 0x26 and not in the range, it matches.

    Update: Yeah, what he said!
    *sigh* I guess I was a little slow. :-)

    -sauoq
    "My two cents aren't worth a dime.";
    
Re: [^?-~]
by MrNobo1024 (Hermit) on Aug 09, 2002 at 23:38 UTC
    The character class [^?-~] matches all characters that are not between ? and ~. & has a lower value than ? so it matches.

    --MrNobo1024
    s]]HrLfbfe|EbBibmv]e|s}w}ciZx^RYhL}e^print

Re: Matching [^?-~]
by Abigail-II (Bishop) on Aug 12, 2002 at 12:01 UTC
    Others have already explained how it works, but I would like to point out that this is not portable. It only works this way on machines that use ASCII (or a superset of ASCII, like ISO-8859-1) as their native character set. But Perl also runs on EBCDIC platforms, which use a different character set with a different ordering of characters than ASCII.

    If HTML::Entities uses such a regex, then that's a bad thing.

    Abigail
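
To make the portability concern concrete, here is a hedged sketch. The ord('A') test is the idiom perlport gives for telling ASCII from EBCDIC. Writing the endpoints as \N{U+...} code points is one possible way to keep a range independent of the native character set on perls that support it; that is an assumption to check against perlrecharclass for your version, not something HTML::Entities is known to do:

```perl
use strict;
use warnings;

# perlport's idiom for identifying the native character set.
my $is_ascii  = ord('A') == 65;
my $is_ebcdic = ord('A') == 193;
printf "ASCII-based: %s, EBCDIC: %s\n",
    ($is_ascii ? 'yes' : 'no'), ($is_ebcdic ? 'yes' : 'no');

# Native code points of the literal endpoints; these differ by platform.
printf "native ord('?') = %d, ord('~') = %d\n", ord('?'), ord('~');

# Endpoints given as Unicode code points instead of native literals.
my $outside = qr/[^\N{U+3F}-\N{U+7E}]/;
my ($hit) = 'asdf&asdf' =~ /($outside)/;
print "matched: $hit\n" if defined $hit;
```

On an ASCII-based perl this behaves the same as the literal [^?-~] form.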