Re: This regexp made simpler
by AnomalousMonk (Archbishop) on Apr 25, 2010 at 11:02 UTC
|
>perl -wMstrict -le
"for (@ARGV) {
if(/^A (?: Z | (\s.*?)) Z$/x) {
my $grabbed = $1 // '';
print qq{matched '$_' grabbed '$grabbed'};
}
}
" AZ AZZ AXZ "A SOMETHING Z" ASOMETHINGZ
matched 'AZZ' grabbed ''
matched 'A SOMETHING Z' grabbed ' SOMETHING '
I wonder why it is necessary to match something like 'AZZ' and yet grab an undefined value from it, which must later be rationalized to an empty string. (Additionally, the first regex of the OP does not match 'AZ', which seems to be required by the OP.)
Wouldn't it make more sense only to grab stuff from strings that match? E.g., "if there is anything between A and Z, it must begin with a space and be followed by zero or more non-Z characters". (Has the advantage of matching 'AZ', no defined test needed.)
>perl -wMstrict -le
"for (@ARGV) {
if(/^A ((?: \s [^Z]*)?) Z$/x) {
print qq{matched '$_' grabbed '$1'};
}
}
" AZ AZZ AXZ "A ZZ" "A SOMETHING Z" ASOMETHINGZ "A Z" "A  Z"
matched 'AZ' grabbed ''
matched 'A SOMETHING Z' grabbed ' SOMETHING '
matched 'A Z' grabbed ' '
matched 'A  Z' grabbed '  '
Updates:
- However, the 'Z' still needs to be repeated in the regex! Oh, well...
- Added "A Z" and "A  Z" test cases to my solution.
| [reply] [d/l] [select] |
|
|
I wonder why it is necessary to match something like 'AZZ' and yet grab an undefined value from it.
Good point. This made me rethink my problem. In my case, the grabbed part is not really kept in a variable (I wrote it that way in the hope of making the whole posting simpler), but used within a substitution (to be precise, an insertion): I need to change a text AXZ into AXIZ, where the X is optional. In other words, I have to insert I in front of the Z, so in the substitution I use
s/..../A$1IZ/
, and if I know that $1 is always defined, I don't have to care about interpolating an undefined value. In hindsight, I now see that I should have written
s/^(A(?:\s.*?)?)(Z)/$1I$2/
. :-(
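A minimal sketch of that insertion, under the assumption that the goal is to turn 'AZ' into 'AIZ' and 'A X Z' into 'A XIZ' while leaving everything else untouched. The helper name insert_i is hypothetical, and [^Z]* is used here instead of .*? so that the optional part cannot run across a Z:

```perl
use strict;
use warnings;

# Hypothetical helper: insert 'I' before the final 'Z' of a string shaped
# like 'A', optional " stuff", 'Z'. Other strings are returned unchanged.
sub insert_i {
    my ($text) = @_;
    # The capture group itself is not optional, only its contents are,
    # so $1 is always defined when the match succeeds.
    $text =~ s/^A((?:\s[^Z]*)?)Z$/A${1}IZ/;
    return $text;
}

print insert_i($_), "\n" for 'AZ', 'A SOMETHING Z', 'ASOMETHINGZ', 'AZZ';
```

Because the quantifier sits inside the capturing parentheses, no defined test is needed before interpolating $1 in the replacement.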
--
Ronald Fischer <ynnor@mm.st>
| [reply] [d/l] [select] |
Re: This regexp made simpler
by BrowserUk (Patriarch) on Apr 25, 2010 at 11:08 UTC
|
printf( "\n$_: " ),
m[A( [^Z]+)Z] and print "'$1'"
for 'AZ', 'A SOMETHING Z', 'ASOMETHINGZ', 'A  Z';;
AZ:
A SOMETHING Z: ' SOMETHING '
ASOMETHINGZ:
A  Z: '  '
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
| [reply] [d/l] |
|
|
But that doesn't match 'AZ', which the OP seems to require, and also doesn't match 'A Z' (single space between first and final characters), which also seems to be required.
| [reply] |
|
|
A simple variation fixes that:
/A( [^Z]*)?Z/
It surprises me how many monks in this thread seem to think that expressing the "no Z between ..." condition with .*? is a good idea.
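To illustrate the point about .*? (a sketch only, with the anchors from the OP added and a hypothetical helper named grab): once the pattern is anchored, the lazy dot is free to swallow a Z in the middle, while the negated class rejects such strings outright.

```perl
use strict;
use warnings;
use 5.010;    # for the // (defined-or) operator

# Two anchored variants of the pattern under discussion.
my %regex = (
    lazy_dot => qr/^A( .*?)?Z$/,
    negated  => qr/^A( [^Z]*)?Z$/,
);

# Hypothetical helper: returns the captured text ('' if the optional group
# did not take part) or undef when the string does not match at all.
sub grab {
    my ($name, $string) = @_;
    return $string =~ $regex{$name} ? ($1 // '') : undef;
}

for my $string ('A SOMETHING Z', 'A ZZ', 'A Z Z') {
    for my $name (sort keys %regex) {
        my $got = grab($name, $string);
        printf "%-16s %-9s %s\n", "'$string'", $name,
            defined $got ? "grabbed '$got'" : 'no match';
    }
}
```

For 'A ZZ' the lazy variant grabs ' Z', whereas the negated-class variant does not match.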
| [reply] [d/l] [select] |
printf( "\n$_: " ),
m[A( [^Z]*|)Z] and print "'$1'"
for 'AZ', 'A SOMETHING Z', 'ASOMETHINGZ', 'A Z', 'A  Z';;
AZ: ''
A SOMETHING Z: ' SOMETHING '
ASOMETHINGZ:
A Z: ' '
A  Z: '  '
| [reply] [d/l] |
Re: This regexp made simpler
by FunkyMonk (Bishop) on Apr 25, 2010 at 10:42 UTC
|
Not extensively tested, but does
use 5.010;    # enables say and the // (defined-or) operator
my @strings = ('AZ', 'A SOMETHING Z', 'ASOMETHINGZ', 'A Z', 'A ZZ', 'AA ZZ', 'AAZZ');
for (@strings) {
    if (/A( .*?)?Z/) {
        my $grabbed = $1 // '';    # $1 is undef when the optional group is skipped
        say "'$_' grabbed '$grabbed'";
    }
    else { say "'$_' did not match" }
}
__END__
'AZ' grabbed ''
'A SOMETHING Z' grabbed ' SOMETHING '
'ASOMETHINGZ' did not match
'A Z' grabbed ' '
'A ZZ' grabbed ' '
'AA ZZ' grabbed ' '
'AAZZ' grabbed ''
do what you want?
Update
What should 'A ZZ', 'AAZZ' and 'AA ZZ' match? (added these as test cases)
| [reply] [d/l] |
|
|
Contrary to my interpretation of the requirements of the OP, both your regex and the updated regex of rovf's OP allow a 'Z' between the first 'A' and the final 'Z', and also still need to have an undefined $1 rationalized to an empty string.
| [reply] [d/l] |
|
|
Updating my post to accommodate the anchors and your update:
use 5.010;    # enables say and the // (defined-or) operator
my @strings = ('AZ', 'A SOMETHING Z', 'ASOMETHINGZ', 'A Z', 'A ZZ', 'AA ZZ', 'AAZZ', 'A  Z');
for (@strings) {
    if (/^A( [^Z]*)?Z$/) {
        my $grabbed = $1 // '';    # $1 is undef when the optional group is skipped
        say "'$_' grabbed '$grabbed'";
    }
    else { say "'$_' did not match" }
}
__END__
'AZ' grabbed ''
'A SOMETHING Z' grabbed ' SOMETHING '
'ASOMETHINGZ' did not match
'A Z' grabbed ' '
'A ZZ' did not match
'AA ZZ' did not match
'AAZZ' did not match
'A  Z' grabbed '  '
| [reply] [d/l] |
|
|
I should have taken more notice of the anchors in your OP :(
| [reply] |
Re: This regexp made simpler
by rubasov (Friar) on Apr 25, 2010 at 12:48 UTC
|
More variations on the theme (if I got it right - the Z is repeated though). The second one does not use captures at all.
while (<DATA>) {
    print;
    #s/^A(|\s[^Z]*)Z$/A$1IZ/;       # first variant: capture and re-insert
    s/^A(?:|\s[^Z]*)\K(?=Z$)/I/;    # second variant: \K (Perl 5.10+), no captures
    print;
}
__DATA__
AZ
AZZ
A SOMETHING Z
ASOMETHINGZ
A Z
A ZZ
AAZZ
AA ZZ
| [reply] [d/l] |
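For readers who want to see what the capture-free variant does without running the DATA loop, here is a small sketch. The wrapper name insert_before_z is hypothetical; the substitution itself is the second one from the post above.

```perl
use strict;
use warnings;
use 5.010;    # \K needs Perl 5.10 or later

# Hypothetical wrapper around the capture-free substitution.
sub insert_before_z {
    my ($line) = @_;
    # \K drops everything matched so far from the text to be replaced, and
    # the lookahead checks for the final Z without consuming it, so only
    # the empty spot just before that Z is replaced by 'I'.
    $line =~ s/^A(?:|\s[^Z]*)\K(?=Z$)/I/;
    return $line;
}

printf "%-15s becomes %s\n", "'$_'", "'" . insert_before_z($_) . "'"
    for 'AZ', 'AZZ', 'A SOMETHING Z', 'ASOMETHINGZ', 'A Z', 'A ZZ';
```

Only 'AZ', 'A SOMETHING Z' and 'A Z' are changed (to 'AIZ', 'A SOMETHING IZ' and 'A IZ'); the other strings come back as they went in.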
Re: This regexp made simpler
by Marshall (Canon) on Apr 27, 2010 at 21:26 UTC
|
Another way to go, using rubasov's data set plus a completely illegal line (ZA). The code below is wordier than the other solutions, but I think what it does and how it does it is clear. If, for example, RESULT=" " should be disallowed, there is a clear place to make that modification.
#!/usr/bin/perl -w
use strict;
while (<DATA>)
{
    chomp;
    my $result = is_match($_);
    defined($result) ? print "$_:\tRESULT=\"$result\"\n"
                     : print "$_:\tRESULT=NO MATCH\n";
}
sub is_match
{
    my $term = shift;
    my $inner = ($term =~ m/^A(.*)Z$/)[0];
    return undef if (!defined($inner));
    return $inner if $inner eq "";
    return $inner if $inner =~ m/^\s/;
    return undef;
}
=prints:
AZ: RESULT=""
AZZ: RESULT=NO MATCH
A SOMETHING Z: RESULT=" SOMETHING "
ASOMETHINGZ: RESULT=NO MATCH
A Z: RESULT=" "
A ZZ: RESULT=" Z"
AAZZ: RESULT=NO MATCH
AA ZZ: RESULT=NO MATCH
ZA: RESULT=NO MATCH
=cut
__DATA__
AZ
AZZ
A SOMETHING Z
ASOMETHINGZ
A Z
A ZZ
AAZZ
AA ZZ
ZA
Update: I looked at the OP's spec again, and it appears that this tweak to is_match() would be better:
sub is_match
{
    my $term = shift;
    my $inner = ($term =~ m/^A(.*)Z$/)[0];
    return undef if (!defined($inner));
    return undef if $inner eq "";
    return $inner if $inner =~ m/^\s+\S/;
    return undef;
}
prints:.....
AZ: RESULT=NO MATCH
AZZ: RESULT=NO MATCH
A SOMETHING Z: RESULT=" SOMETHING "
ASOMETHINGZ: RESULT=NO MATCH
A Z: RESULT=NO MATCH
A ZZ: RESULT=" Z"
AAZZ: RESULT=NO MATCH
AA ZZ: RESULT=NO MATCH
ZA: RESULT=NO MATCH
| [reply] [d/l] [select] |