http://qs1969.pair.com?node_id=213392

Angel has asked for the wisdom of the Perl Monks concerning the following question:

I have been trying to learn regexes, and I am having some trouble:

I thought this would return true if any characters other than the ones in the brackets were present:

=~ /^[A-Za-z0-9]/

but when I try something like æ (holding down Alt and typing 145), it does not trip the function. If I am trying to get a true value (for error checking), am I doing this the right way?

Angel

update (broquaint): added <code> tags

Re: Regex help
by strider corinth (Friar) on Nov 16, 2002 at 13:59 UTC
    You've got it almost right, from what I can see. Your post looks a little strange (you'll notice your brackets are missing) because PerlMonks uses brackets to make linking easier in posts (anything in brackets links to the node named whatever's within the brackets). In the future, you'll probably want to put <code> tags around any code you post.

    From what I can see, your regexp looks like this: /^[A-Za-z0-9]/ There's only one small error: the ^ is outside of the brackets. A caret outside of brackets means 'match this regexp at the beginning of a string'. If you move it inside, as in /[^A-Za-z0-9]/, it negates the character class instead, and you should find that it does what you want.

    Incidentally, there are two shortcuts you might be interested in. You can use the escape sequence "\w" to mean "match any alphanumeric character or '_'", and "\W" to mean "match any character that \w does not". So if it's ok for you to add "_" to the class of characters that are ok to find, the whole thing can be shortened to /\W/.

    A good place to look for more information on this is the perlre man page. Happy Perling!
    --
    Love justice; desire mercy.
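
To make the difference concrete, here is a minimal sketch comparing the original anchored pattern with the negated class. The helper names starts_with_alnum and has_non_alnum are made up for this illustration. The æ is written as the Latin-1 escape "\xE6" on the assumption that the input arrives as a single byte; UTF-8 input would be two bytes, both of which also fall outside A-Za-z0-9.

```perl
use strict;
use warnings;

# Hypothetical helper names, used only for this illustration.

# Original pattern: true when the FIRST character is an ASCII letter or digit.
sub starts_with_alnum { return $_[0] =~ /^[A-Za-z0-9]/ ? 1 : 0 }

# Corrected pattern: true when ANY character is outside A-Z, a-z, 0-9.
sub has_non_alnum { return $_[0] =~ /[^A-Za-z0-9]/ ? 1 : 0 }

# Each sample is [ printable label, actual string ]. The label is kept
# separate so the output does not depend on the terminal's encoding.
my @samples = (
    [ 'abc123',  "abc123"  ],
    [ 'abc\xE6', "abc\xE6" ],
    [ '\xE6',    "\xE6"    ],
    [ 'a_b',     "a_b"     ],
);

for my $sample (@samples) {
    my ($label, $in) = @$sample;
    printf "%-8s starts_with_alnum=%d has_non_alnum=%d\n",
        $label, starts_with_alnum($in), has_non_alnum($in);
}
```

For "\xE6" on its own, the anchored pattern is false and the negated class is true, which is why the original check never tripped. Note that "a_b" is flagged by the negated class, since the underscore is not in A-Za-z0-9.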
Re: Regex help
by mikeirw (Pilgrim) on Nov 16, 2002 at 13:54 UTC

    What you want is:

    =~ /[^A-Za-z0-9]/

    Which could also be written as:

    =~ /\W/

    ...which would also treat the underscore as an allowed character, in addition to the alphanumeric characters. See perlre for more info.

Re: Regex help
by jreades (Friar) on Nov 16, 2002 at 13:58 UTC

    If you want anything that isn't a digit or a letter (or an underscore), then you might consider using: \W

    while ($in =~ /\W/) { ... }

    Or:

    while ($in !~ /\w/) { ... }

    Otherwise:

    while ($in =~ /[^a-zA-Z0-9]/) { ... }

    Or:

    while ($in !~ /[a-zA-Z0-9]/) { ... }

    There's not much to choose between =~ and !~, but I generally like the emphasis of saying "as long as it's not one of these things..."

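      Careful with the logic!

      $in =~ /\W/ # is NOT the same as $in !~ /\w/

      The first is true when $in contains at least one non-word character; the second is true only when $in contains no word characters at all.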
      --
      http://fruiture.de
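
A minimal sketch of why the two tests differ, assuming plain ASCII sample strings. The helper names any_non_word and no_word_at_all are hypothetical and exist only for this illustration.

```perl
use strict;
use warnings;

# True when the string contains at least one non-word character.
sub any_non_word { return $_[0] =~ /\W/ ? 1 : 0 }

# True only when the string contains no word character at all.
sub no_word_at_all { return $_[0] !~ /\w/ ? 1 : 0 }

# "abc!" is the interesting case: it has a bad character, but it also
# has word characters, so the two tests disagree.
for my $in ("abc", "abc!", "!!!") {
    printf "%-5s =~ /\\W/ gives %d, !~ /\\w/ gives %d\n",
        $in, any_non_word($in), no_word_at_all($in);
}
```

For error checking, the =~ /\W/ form (or =~ /[^A-Za-z0-9]/) is the one that catches a single stray character in otherwise valid input. One caveat, as far as I know: on decoded Unicode strings \w also matches letters such as æ, so /\W/ would not flag them, whereas the explicit class [^A-Za-z0-9] has no such ambiguity.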
Re: Regex help
by Anonymous Monk on Nov 16, 2002 at 22:02 UTC
    =~ /[^A-Za-z0-9]/ The caret should be inside the brackets, not outside. When it's outside the brackets, perl tries to match one of A-Za-z0-9 at the beginning of the string. When it's inside the brackets, it tells perl to find ANY character besides the ones in the brackets.

    Fixed square brackets - dvergin 2002-11-16

Re: Regex help
by Anonymous Monk on Nov 18, 2002 at 09:08 UTC
    The caret "^" must be inside the square brackets:

    /[^A-Za-z0-9]/ .... for single character
    /[^A-Za-z0-9]+/ .... for multiple characters

    --- PS:
    your example matched a single character at the beginning of the line
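
A short sketch of how the single-character and "+" forms compare, using a made-up sample string. As a plain true/false test they should behave the same; the difference only appears when the matched text is captured or replaced.

```perl
use strict;
use warnings;

# Hypothetical sample input; single quotes keep $ and % literal.
my $in = 'ab#$%cd';

# As a yes/no test the two patterns agree: both succeed as soon as
# one disallowed character is found.
my $single = $in =~ /[^A-Za-z0-9]/  ? 1 : 0;
my $plus   = $in =~ /[^A-Za-z0-9]+/ ? 1 : 0;

# Capturing shows the difference: one character versus the whole run.
my ($first_char) = $in =~ /([^A-Za-z0-9])/;
my ($first_run)  = $in =~ /([^A-Za-z0-9]+)/;

print "single=$single plus=$plus\n";
print "first_char=$first_char first_run=$first_run\n";
```

So for error checking alone, the shorter /[^A-Za-z0-9]/ is enough.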

Re: Regex help
by Ananda (Pilgrim) on Nov 18, 2002 at 11:04 UTC

    Hello Angel,

    From the query you have posted, I observe the following.

    The function is not tripping because the ^ (caret) is outside the brackets ([]). It needs to be inside.

    Using æ for testing is perfectly OK; the regex behaves as expected once the caret is placed within the brackets.

    my $in = "æ";

    # True if any character is outside A-Z, a-z, 0-9 and underscore.
    if ($in =~ /[^A-Za-z0-9_]/) { print "contains non alpha numeric characters!!"; }
    else { print 'its all OK (_)'; }

    This should clarify your doubts. Cheers!!!

    Anandatirtha