Re: Regexp for alphabetical order match within the string
by hardburn (Abbot) on Oct 30, 2003 at 18:56 UTC
|
# $str defined elsewhere
print "Is alphabetical\n"
if( $str eq join '', sort split //, $str );
---- I wanted to explore how Perl's closures can be manipulated, and ended up creating an object system by accident.
-- Schemer
: () { :|:& };:
Note: All code is untested, unless otherwise stated
| [reply] [d/l] [select] |
|
sub is_alphabetical {
my @c = split //, shift;
ord($c[$_]) >= ord($c[$_ - 1]) or return 0 for 1 .. $#c;
return 1;
}
Update: Yes, my use of ord() is unnecessary. Thanks tlhf++.
Edit: Added a missing my.
-sauoq
"My two cents aren't worth a dime.";
| [reply] [d/l] [select] |
|
Elegant solution, though the ord() calls are needless.
sub is_alphabetical {
my @c = split //, shift;
$c[$_] ge $c[$_-1] or return 0 for 1..$#c;
return 1;
}
tlhf
(Everyone forgets about ge and le ^_^)
| [reply] [d/l] [select] |
|
$str = "aBcD12ef7812g";
print is_alphabetical($str);
sub is_alphabetical {
(my $str = lc(shift)) =~ s/([^a-z])//g;
my @c = split //, $str;
$c[$_] ge $c[$_ - 1] or return 0 for 1 .. $#c;
return 1;
}
| [reply] [d/l] |
|
But this won't take case into consideration, will it? "Ba" is not really alphabetical in my book.
| [reply] |
|
For case-insensitivity just throw an lc before the shift.
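A minimal sketch of that suggestion, assuming the is_alphabetical sub posted earlier in the thread. The only change is the lc applied to the argument before it is split; the sample strings are made up for illustration.

```perl
use strict;
use warnings;

# Case-insensitive variant: lowercase the argument before splitting it.
sub is_alphabetical {
    my @c = split //, lc shift;
    # Fail as soon as a character sorts before its predecessor.
    $c[$_] ge $c[$_ - 1] or return 0 for 1 .. $#c;
    return 1;
}

print is_alphabetical("aBcD") ? "alpha\n" : "non-alpha\n";
print is_alphabetical("Ba")   ? "alpha\n" : "non-alpha\n";
```

With the lc in place, "Ba" is compared as "ba" and is rejected.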
| [reply] [d/l] [select] |
Re: Regexp for alphabetical order match within the string
by sgifford (Prior) on Oct 30, 2003 at 19:53 UTC
|
How about ! /(.).*(??{"[^$1-z]"})/ix?
#!/usr/bin/perl -w
use strict;
if ($ARGV[0] !~ /(.).*(??{"[^$1-z]"})/ix)
{
print "alpha\n";
}
else
{
print "non-alpha\n";
}
__END__
[sgifford@sglaptop sgifford]$ perl /tmp/t4 abcxz
alpha
[sgifford@sglaptop sgifford]$ perl /tmp/t4 abcda
non-alpha
[sgifford@sglaptop sgifford]$ perl /tmp/t4 aaaaaaab
alpha
[sgifford@sglaptop sgifford]$ perl /tmp/t4 aaaaaaabcccccccz
alpha
[sgifford@sglaptop sgifford]$ perl /tmp/t4 aaaaaaabcccccdccz
non-alpha
[sgifford@sglaptop sgifford]$ perl /tmp/t4 aaaaaaaaaaaaa
alpha
[sgifford@sglaptop sgifford]$ perl /tmp/t4 abcdefghijklmnopqrstuvwxyz
alpha
| [reply] [d/l] [select] |
|
Very clever, -10. (I got that on an assignment once.)
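Surprisingly, your expression is not case-insensitive, despite the /i switch. That is because you can end up with character classes like [B-z], which spans every ASCII character from "B" through "z", lowercase "a" included, so "Ba" slips through. One possible remedy: /(.)(??{ "[^" . lc($1) . "-z]" })/i. The lc call has to sit outside the quotes, because function calls are not interpolated in a double-quoted string. (Also note that you don't need the .*, since checking adjacent pairs is enough.)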
The other thing about your regex is that it is about 30 times slower than the other recommendations. Don't even try to benchmark it for more than a few thousand iterations. But it is cool.
The other solutions are all comparable to each other for performance. My benchmarking code attached.
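The attached benchmark is not reproduced in this thread. The following is only a sketch of how such a comparison could be set up with the core Benchmark module; the sub names by_sort and by_loop and the sample strings are assumptions made for illustration, not the original benchmarking code.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Sort-and-compare approach, lowercasing the input first.
sub by_sort {
    my $s = lc shift;
    return $s eq join('', sort(split(//, $s))) ? 1 : 0;
}

# Pairwise comparison approach, lowercasing the input first.
sub by_loop {
    my @c = split //, lc shift;
    $c[$_] ge $c[$_ - 1] or return 0 for 1 .. $#c;
    return 1;
}

# Hypothetical test data: one sorted and one unsorted string.
my @samples = ('aabcdeefghijklmnopqrstuvwxyz', 'abcdefghijklmzyxwvutsrqpon');

# A negative count asks Benchmark to run each sub for at least that many CPU seconds.
cmpthese(-1, {
    sort_join => sub { by_sort($_) for @samples },
    pairwise  => sub { by_loop($_) for @samples },
});
```

The numbers will vary with the Perl version and the input strings, so treat any ratio as indicative only.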
| [reply] [d/l] [select] |
|
sub is_sorted{
my( $str, $p, $x ) = ( lc shift, 0 );
chop$x lt $x and return 0
while $x = substr $str, $p++, 2;
1;
};
Examine what is said, not who speaks.
"Efficiency is intelligent laziness." -David Dunham
"Think for yourself!" - Abigail
Hooray!
| [reply] [d/l] |
|
Urg. Could you explain that? I'm not following the (??{"[^$1-z]"}) bit.
| [reply] [d/l] |
|
OK, here's what perlre(1) says about (??{}):
""(??{ code })""
WARNING: This extended regular expression feature is considered highly experimental, and may be changed or deleted without notice. A simplified version of the syntax may be introduced for commonly used idioms.
This is a "postponed" regular subexpression. The "code" is evaluated at run time, at the moment this subexpression may match. The result of evaluation is considered as a regular expression and matched as if it were inserted instead of this construct.
So the RE first matches any character and captures it as $1, followed by zero or more of any character. Then it evaluates the code in the (??{}). This code evaluates to a character class matching any character outside the range [$1-z], that is, a character earlier in the alphabet. If such a character is found, the string is out of order, which is why the test uses !~.
If you change the code to print what it's trying, it's clearer what's going on:
if ($ARGV[0] !~ /(.).*(??{print "Searching for [^$1-z] starting at ",pos($ARGV[0]),"\n"; "[^$1-z]"})/ix)
produces
[sgifford@sglaptop sgifford]$ perl /tmp/t4 abcd
Searching for [^a-z] starting at 4
Searching for [^a-z] starting at 3
Searching for [^a-z] starting at 2
Searching for [^a-z] starting at 1
Searching for [^b-z] starting at 4
Searching for [^b-z] starting at 3
Searching for [^b-z] starting at 2
Searching for [^c-z] starting at 4
Searching for [^c-z] starting at 3
Searching for [^d-z] starting at 4
alpha
Other solutions are more efficient, but the OP asked for an RE. :-)
| [reply] [d/l] [select] |
|
Re: Regexp for alphabetical order match within the string
by runrig (Abbot) on Oct 30, 2003 at 19:01 UTC
|
You might want to at least make that easier to generate, and anchor the regex (otherwise you'll always get a match):
my $str = join '', map "$_*", "a".."z";
my $re = qr/^$str$/;
print "Matches\n" if "abc" =~ $re;
print "Doesn't match\n" unless "zbc" =~ $re;
| [reply] [d/l] |
|
And perhaps make the regex case-insensitive.
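A small sketch of what that could look like, reusing the generated pattern from the parent post and adding only the /i flag. The test words are invented for illustration.

```perl
use strict;
use warnings;

# Build a*b*c*...z* and anchor it; /i lets "aBc" match as well.
my $str = join '', map "$_*", 'a' .. 'z';
my $re  = qr/^$str$/i;

for my $word (qw(abc aBcZ zbc Ba)) {
    printf "%-5s %s\n", $word, $word =~ $re ? "matches" : "doesn't match";
}
```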
CountZero "If you have four groups working on a compiler, you'll get a 4-pass compiler." - Conway's Law
| [reply] |
Re: Regexp for alphabetical order match within the string
by duff (Parson) on Oct 30, 2003 at 20:07 UTC
|
Just to be different, here's a solution that doesn't use REs or split():
sub isalphabetical {
for my $i (0..(length($_[0])-2)) {
return 0 if lc(substr($_[0],$i,1)) gt lc(substr($_[0],$i+1,1));
}
return 1;
}
| [reply] [d/l] |
|
My (off the top of my head) comment is:
Don't use a regex. Convert each character into its ASCII code or your "true" alphabetically mapped sequence number (e.g. aA->1, etc.) and loop through the string, keeping the highest number seen in a variable. As soon as you hit a lower value, exit the loop and print "not alphabetical". This should beat a regexp for longer or more complex strings, and you can make your own map of exactly what "alphabetical" is (in Chinese if you like). The solution ends up more portable, more flexible (not tied to regex syntax), and faster. Sorry, but I'm about to disembark and go home, so I'll post some code later.
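No follow-up code appears in this thread, so here is one way the idea might be sketched. The %rank table, the sub name is_alphabetical_by_rank, and the choice to skip characters that have no rank are all assumptions made for illustration.

```perl
use strict;
use warnings;

# Hypothetical rank table: upper- and lowercase letters share one rank
# (aA -> 1, bB -> 2, ...). Any other ordering could be plugged in here.
my %rank;
my $n = 0;
for my $letter ('a' .. 'z') {
    $n++;
    $rank{$letter}    = $n;
    $rank{uc $letter} = $n;
}

sub is_alphabetical_by_rank {
    my ($str) = @_;
    my $highest = 0;
    for my $char (split //, $str) {
        # Assumption: characters missing from the table are skipped.
        my $r = $rank{$char};
        next unless defined $r;
        return 0 if $r < $highest;   # dropped below the highest rank seen so far
        $highest = $r;
    }
    return 1;
}

print is_alphabetical_by_rank("aBcD12ef") ? "alphabetical\n" : "not alphabetical\n";
print is_alphabetical_by_rank("Ba")       ? "alphabetical\n" : "not alphabetical\n";
```

Because the ordering lives in the table rather than in the comparison, swapping in a different alphabet only means filling %rank differently.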
| [reply] |
Re: Regexp for alphabetical order match within the string
by Abigail-II (Bishop) on Oct 31, 2003 at 10:21 UTC
|
You sound as if you think that
/a*b*c*...y*z*/
isn't an acceptable solution. I think that once you have placed anchors, it's a very fine solution. The regex might be long, but it's simple. There are no alternatives. There's no backreferencing. There are no delayed regexes. There will be no backtracking for matching strings, and only minimal backtracking in case of failures. Remember that the fewer alternatives you give the regexp engine, the faster it will (usually) be.
Abigail
| [reply] [d/l] |
|
/^(?>a*b*c*...y*z*)$/
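A sketch of the same pattern generated rather than typed out, assuming the join/map construction shown earlier in the thread. The (?>...) group is atomic, so once the letter runs have been consumed the engine does not go back and retry shorter runs when the final $ fails.

```perl
use strict;
use warnings;

# Generate a*b*c*...z* and wrap it in an atomic group between the anchors.
my $body = join '', map "$_*", 'a' .. 'z';
my $re   = qr/^(?>$body)$/;

print "abcxz: ", ("abcxz" =~ $re ? "alpha" : "non-alpha"), "\n";
print "abcda: ", ("abcda" =~ $re ? "alpha" : "non-alpha"), "\n";
```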
| [reply] [d/l] |