Beefy Boxes and Bandwidth Generously Provided by pair Networks
"be consistent"
 
PerlMonks  

Regexp for alphabetical order match within the string

by prostoalex (Scribe)
on Oct 30, 2003 at 18:46 UTC ( [id://303353]=perlquestion: print w/replies, xml ) Need Help??

prostoalex has asked for the wisdom of the Perl Monks concerning the following question:

I am looking for a regexp capable of telling me whether the letters in a given string (we'll assume no spaces) are in alphabetical order. So that "abcxz" will match, but "abcda" won't.

So far the best example I could come up with is:

/a*b*...y*z*/
Thanks.

Edit, BazB: fix typo in title.

Replies are listed 'Best First'.
Re: Regexp for alphabetical order match within the string
by hardburn (Abbot) on Oct 30, 2003 at 18:56 UTC

    Do you really need it to be a regex?

    # $str defined elsewhere print "Is alphabetical\n" if( $str eq join '', sort split //, $str );

    ----
    I wanted to explore how Perl's closures can be manipulated, and ended up creating an object system by accident.
    -- Schemer

    : () { :|:& };:

    Note: All code is untested, unless otherwise stated

      You can avoid the sort for an O(N) solution:

      sub is_alphabetical { my @c = split //, shift; ord($c[$_]) >= ord($c[$_ - 1]) or return 0 for 1 .. $#c; return 1; }

      Update: Yes, my use of ord() is unnecessary. Thanks tlhf++.

      Edit: Added a missing my.

      -sauoq
      "My two cents aren't worth a dime.";
      
        Elegent solution, tho the ords are needless.
        sub is_alphabetical { my @c = split //, shift; $c[$_] ge $c[$_-1] or return 0 for 1..$#c; return 1; }

        tlhf
        (Everyone forgets about ge and le ^_^)

        ++, but the original post obviously allows non-alphabetical chars in the string, and their orders are not considered. A small improvement to meet the original requirement:

        $str = "aBcD12ef7812g"; print is_alphabetical($str); sub is_alphabetical { (my $str = lc(shift)) =~ s/([^a-z])//g; my @c = split //, $str; $c[$_] ge $c[$_ - 1] or return 0 for 1 .. $#c; return 1; }
      But this won't take case into consideration, would it? "Ba" is not really alphabetical in my book.
        For case-insensitivity just throw an lc before the shift.
Re: Regexp for alphabetical order match within the string
by sgifford (Prior) on Oct 30, 2003 at 19:53 UTC
    How about ! /(.).*(??{"[^$1-z]"})/ix?
    #!/usr/bin/perl -w use strict; if ($ARGV[0] !~ /(.).*(??{"[^$1-z]"})/ix) { print "alpha\n"; } else { print "non-alpha\n"; } __END__ [sgifford@sglaptop sgifford]$ perl /tmp/t4 abcxz alpha [sgifford@sglaptop sgifford]$ perl /tmp/t4 abcda non-alpha [sgifford@sglaptop sgifford]$ perl /tmp/t4 aaaaaaab alpha [sgifford@sglaptop sgifford]$ perl /tmp/t4 aaaaaaabcccccccz alpha [sgifford@sglaptop sgifford]$ perl /tmp/t4 aaaaaaabcccccdccz non-alpha [sgifford@sglaptop sgifford]$ perl /tmp/t4 aaaaaaaaaaaaa alpha [sgifford@sglaptop sgifford]$ perl /tmp/t4 abcdefghijklmnopqrstuvwxyz alpha
      Very clever, -10. (I got that on an assignment once.)

      Surprisingly, your expression is not case-insensitive, despite the /i switch. That is because you can end up with character classes like [B-z]. One possible remedy: /(.)(??{"[^lc($1)-z]"})/. (Also note that you don't need the .*).

      The other thing about your regex is that it is about 30 times slower than the other recommendations. Don't even try to benchmark it for more than a few thousand iterations. But it is cool.

      The other solutions are all comparable to each other for performance. My benchmarking code attached.

        Just for grins, this tests about 50% quicker than the substr version in your benchmark.

        sub is_sorted{ my( $str, $p, $x ) = ( lc shift, 0 ); chop$x lt $x and return 0 while $x = substr $str, $p++, 2; 1; };

        Examine what is said, not who speaks.
        "Efficiency is intelligent laziness." -David Dunham
        "Think for yourself!" - Abigail
        Hooray!

      Urg. Could you explain that? I'm not following the ??{"[^$1-z]"}) bit.
        OK, here's what perlre(1) says about (??{}):
               ""(??{ code })""
                         WARNING: This extended regular expression fea-
                         ture is considered highly experimental, and may
                         be changed or deleted without notice.  A simpli-
                         fied version of the syntax may be introduced for
                         commonly used idioms.
        
                         This is a "postponed" regular subexpression.
                         The "code" is evaluated at run time, at the
                         moment this subexpression may match.  The result
                         of evaluation is considered as a regular expres-
                         sion and matched as if it were inserted instead
                         of this construct.
        

        So the RE first matches any character $1, followed by zero or more of any character. Then it evaluates the code in the (??{}). This code evaluates to a character class for a character outside of the range [$1-z]---a character earlier in the alphabet.

        If you change the code to print what it's trying, it's clearer what's going on:

        if ($ARGV[0] !~ /(.).*(??{print "Searching for [^$1-z] starting at ",p +os($ARGV[0]),"\n"; "[^$1-z]"})/ix)
        produces
        [sgifford@sglaptop sgifford]$ perl /tmp/t4 abcd Searching for [^a-z] starting at 4 Searching for [^a-z] starting at 3 Searching for [^a-z] starting at 2 Searching for [^a-z] starting at 1 Searching for [^b-z] starting at 4 Searching for [^b-z] starting at 3 Searching for [^b-z] starting at 2 Searching for [^c-z] starting at 4 Searching for [^c-z] starting at 3 Searching for [^d-z] starting at 4 alpha

        Other solutions are more efficient, but the OP asked for an RE. :-)

Re: Regexp for alphabetical order match within the string
by runrig (Abbot) on Oct 30, 2003 at 19:01 UTC
    You might want to at least make that easier to generate, and anchor the regex (otherwise you'll always get a match):
    my $str = join '', map "$_*", "a".."z"; my $re = qr/^$str$/; print "Matches\n" if "abc" =~ $re; print "Doesn't match\n" unless "zbc" =~ $re;
      And perhaps make the regex case-insensitive.

      CountZero

      "If you have four groups working on a compiler, you'll get a 4-pass compiler." - Conway's Law

Re: Regexp for alphabetical order match within the string
by duff (Parson) on Oct 30, 2003 at 20:07 UTC

    Just to be different, here's a solution that doesn't use REs or split():

    sub isalphabetical { for my $i (0..(length($_[0])-2)) { return 0 if lc(substr($_[0],$i,1)) gt lc(substr($_[0],$i+1,1)); } return 1; }
      my (off the top of my head) comment is:
      don't use regex...convert each character into ASCI or your "true" alphabetically mapped sequence number (eg. aA->1 etc.) and loop through the string, keeping the highest number seen in a variable. As soon as you hit a lower value, exit the loop and print "not alphabetical". This should beat regexp for longer/complex strings, and you can make your own map of exactly what "alphabetical" is (in Chinese if you like). The solution ends up more portable, flexible (not tied to regex syntax), and runs faster. Sorry, but am about to disembark and go home, so i'll post some code later.
Re: Regexp for alphabetical order match within the string
by Abigail-II (Bishop) on Oct 31, 2003 at 10:21 UTC
    You sound as if you think that
    /a*b*c*...y*z*/
    isn't an acceptable solution. I think that once you have placed anchors, it's a very fine solution. The regex might be long, it's simple. There are no alternatives. There's no backreferencing. There are no delayed regexes. There will be no backtracking for matching strings, and only minimal backtracking in case of failures. Remember that the less alternatives you give the regexp engine, the faster it (usually) be.

    Abigail

      I would suggest that you enclose the whole thing in the non-backtracking setup (?>pattern). Since if it doesn't match at first it won't match at all, you can speed up failure by enclosing the whole thing in (?> and ).


      /^(?>a*b*c*...y*z*)$/

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: perlquestion [id://303353]
Approved by blokhead
Front-paged by PERLscienceman
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others examining the Monastery: (4)
As of 2024-04-20 10:26 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found