in reply to Regex for weird characters

The easiest way is not to try to look for all the weird characters, but instead to look for only the permitted characters and invert the test. If, for example, you only permit letters, numbers and commas, you would do something like ...
if($text !~ /^[a-z0-9,]*$/i) { die "bad characters\n"; }
Doing it this way ensures that you only permit what you want. This is good practice, because you won't accidentally let bad stuff through. If you try to think of all the weird characters that someone can type, you're *bound* to forget some of them. For example, you're probably thinking of banning é, è, ñ and ç. But did you remember æ, œ, ß, ð, ø, å, ł and þ?
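
As a rough sketch of the allow-list approach, the check can be wrapped in a small helper. The name is_permitted and the sample inputs are made up for illustration. This version anchors with \A and \z instead of ^ and $, because $ also matches just before a trailing newline; whether that matters depends on where your input comes from.

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 only if every character in $text is an
# ASCII letter, a digit or a comma (the empty string is accepted too).
sub is_permitted {
    my ($text) = @_;
    return $text =~ /\A[a-z0-9,]*\z/i ? 1 : 0;
}

# Sample inputs: plain ASCII, one with e-acute (\x{e9}), one with a space.
for my $input ("abc,123", "caf\x{e9}", "a b") {
    print is_permitted($input) ? "ok" : "bad characters", "\n";
}
```

The accented character is rejected without ever being listed anywhere, which is the whole point of testing for what is permitted.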

Re^2: Regex for weird characters
by Skeeve (Parson) on Sep 27, 2004 at 15:06 UTC
    The idea is okay, but why match the whole string if it's sufficient to fail as soon as one character fails? So instead of
    $text !~ /^[a-z0-9,]*$/i
    this should give the same result:
    $text =~ /[^a-z0-9,]/i
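
A quick way to compare the two forms is to run both over a few sample strings (the samples here are invented for illustration). In this sketch they agree everywhere except on a string ending in a newline: $ matches before a final newline, so the whole-string form lets it through, while the negated class flags the newline as a bad character.

```perl
use strict;
use warnings;

# Invented sample inputs: clean, one bad character, empty, trailing newline.
my @samples = ("abc,123", "abc;123", "", "abc\n");

my @rows;
for my $text (@samples) {
    my $whole  = $text !~ /^[a-z0-9,]*$/i ? 1 : 0;  # whole-string form, inverted
    my $single = $text =~ /[^a-z0-9,]/i   ? 1 : 0;  # first-bad-character form
    push @rows, [$whole, $single];
}

# Each line shows "rejected by whole-string form", "rejected by negated class".
printf "%d %d\n", @$_ for @rows;
```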

    $\=~s;s*.*;q^|D9JYJ^^qq^\//\\\///^;ex;print

      You could shorten Skeeve's regex further using character classes. Actually, it's the same length because I added spaces ;)

      $text =~ /[^\w\d\s,]/;

      Update: switched /w for \w (++ to graff for pointing out my typo)

      "Cogito cogito ergo cogito sum - I think that I think, therefore I think that I am." Ambrose Bierce

        You absolutely cannot use Perl's shortcut character classes. \w matches different things depending on your locale. If you have a German locale, for instance, it will match ß.
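
The same pitfall can be sketched without depending on which locales are installed. This example assumes Perl 5.14 or later and uses the /u (Unicode rules) and /a (ASCII rules) modifiers rather than a German locale, so it is an analogue of the locale case and not a reproduction of it.

```perl
use strict;
use warnings;

# "Strasse" spelled with a sharp s, written as \x{DF} so the example does
# not depend on the encoding of the source file.
my $text = "Stra\x{DF}e";

# Under Unicode rules, \w treats the sharp s as a word character.
my $unicode  = $text =~ /[^\w,]/u       ? "rejected" : "accepted";

# An explicit ASCII range does not.
my $explicit = $text =~ /[^A-Za-z0-9,]/ ? "rejected" : "accepted";

# The /a modifier restricts \w to ASCII word characters.
my $ascii    = $text =~ /[^\w,]/a       ? "rejected" : "accepted";

print "unicode \\w: $unicode\n";
print "explicit:   $explicit\n";
print "ascii \\w:   $ascii\n";
```

Note that even under /a, \w still includes the underscore, so it is not a drop-in replacement for a-z0-9.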
Re^2: Regex for weird characters
by sulfericacid (Deacon) on Sep 27, 2004 at 15:23 UTC
    This builds on what DrHyde showed, with a bunch of the normal characters you're probably looking for added. It might need more characters depending on what else you want to allow, but here you go.
    if($text !~ m/^[a-z0-9,!,@,#,$,%,^,&,*,9,\,,\.\?,\~,:,\+,\-, ,\"]*$/i)


    "Age is nothing more than an inaccurate number bestowed upon us at birth as just another means for others to judge and classify us"

    sulfericacid
      Why do you have so many commas in your character class?
        Aren't you supposed to comma-separate dissimilar items like that? I don't use regexes all that often, but I did test the script and it worked.


        "Age is nothing more than an inaccurate number bestowed upon us at birth as just another means for others to judge and classify us"

        sulfericacid
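
On the comma question: inside a bracketed character class, every character stands for itself and nothing acts as a separator, so a comma in the class simply means a literal comma is allowed. A small sketch with made-up patterns shows why the comma-separated version still appears to work:

```perl
use strict;
use warnings;

# Two illustrative patterns: one written with commas between the
# characters, one without.
my $with_commas    = qr/\A[a,b,c]+\z/;
my $without_commas = qr/\A[abc]+\z/;

my %result;
for my $s ("abc", "a,b") {
    $result{$s} = [
        ($s =~ $with_commas    ? 1 : 0),
        ($s =~ $without_commas ? 1 : 0),
    ];
    printf "%-4s with commas: %d  without commas: %d\n", $s, @{ $result{$s} };
}
```

Both patterns accept "abc", but only the comma-separated one accepts "a,b". Repeating the comma has no extra effect; listing it once is enough if commas should be permitted.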