Re: regular expression-xerox
by diotalevi (Canon) on May 03, 2004 at 18:01 UTC
|
You seem to have already figured out that [ABCD] is semi-equivalent to (A|B|C|D); it is more properly equivalent to (?:A|B|C|D). The difference is that (?: ... ) is strictly for grouping and alternation, while ( ... ) also captures its contents into a numbered variable such as $1, $2, or $3. What do you need help with?
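A minimal sketch of the difference described above. The sample string and variable names are made up for illustration; $#+ is Perl's count of groups in the last successful match.

```perl
use strict;
use warnings;

my $text = 'xBy';

# Capturing group: the matched alternative is stored in $1.
my $captured;
if ($text =~ /x(A|B|C|D)y/) {
    $captured = $1;
}

# $#+ is the number of capture groups in the last successful match.
my $groups_capturing    = $text =~ /x(A|B|C|D)y/   ? $#+ : undef;
my $groups_noncapturing = $text =~ /x(?:A|B|C|D)y/ ? $#+ : undef;

print "captured: $captured\n";
print "groups with ( ... ):   $groups_capturing\n";
print "groups with (?: ... ): $groups_noncapturing\n";
```

Both patterns match the same strings as /x[ABCD]y/; only the capturing form fills $1.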
|
Because I need to translate it into another regular expression format.
Re: regular expression-xerox
by Fletch (Bishop) on May 03, 2004 at 18:03 UTC
|
Of course, to do this you're going to need to parse the Perl regexp to begin with. Take a look at YAPE::Regex. That'll get you a parse tree you can walk and munge into your other format.
Re: regular expression-xerox
by kvale (Monsignor) on May 03, 2004 at 18:01 UTC
|
Character classes like [ABCD] can be converted
to alternation as follows:
my $class = 'ABCD';
my $xerox = join '|', split //, $class; # create alternation
$xerox = '(?:' . $xerox . ')'; # non-capturing grouping
|
Not quite. You haven't considered:
- Characters that have a special meaning, like -, ^, and ] (and that meaning is position dependent!)
- Characters that inside a character class don't have a special meaning, but have one outside the class, like +, ?, * and others.
- POSIX character class syntax.
Abigail
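One way to guard against the cases listed above, as a rough sketch only: class_to_alternation is a hypothetical helper, not something from this thread. It refuses anything it cannot handle (negation, ranges, POSIX classes, backslash escapes) and uses quotemeta to escape characters that are literal inside a class but special outside it. The escaping follows Perl's rules; another regex format may need different escapes.

```perl
use strict;
use warnings;

# Hypothetical helper: convert the body of a simple character class
# (the text between [ and ]) into a non-capturing alternation.
sub class_to_alternation {
    my ($body) = @_;

    # Refuse constructs this sketch does not handle.
    die "negated class not supported\n"     if $body =~ /^\^/;
    die "POSIX class not supported\n"       if $body =~ /\[:.*?:\]/;
    die "range not supported\n"             if $body =~ /.-./s;
    die "backslash escapes not supported\n" if $body =~ /\\/;

    # quotemeta escapes characters such as + ? * that are literal
    # inside a class but have a special meaning outside it.
    my @alts = map { quotemeta($_) } split //, $body;
    return '(?:' . join('|', @alts) . ')';
}

print class_to_alternation('ABCD'), "\n";
print class_to_alternation('+?*'), "\n";
```

The first call prints (?:A|B|C|D) and the second prints (?:\+|\?|\*).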
|
My solution correctly answers the particular requirement the OP stated: convert the character class 'ABCD' to a form that uses alternation. If one extrapolates that requirement to all alphanumerics, then my type of solution still works.
If one extrapolates to metacharacters like those in 1. and 2., or to predefined POSIX classes or Unicode characters, as in 3., then obviously the parser and translator must be extended to handle these situations.
But for the simple requirements stated by the OP, a simple solution is best.
|
May I ask what ? means in the regular expression?
I cannot use ? in the language I am translating to, since it already has a meaning there: it matches any character.
One other question: can't this be done with a substitution routine? I need to globally change each occurrence of [] to | in the regular expression.
One note: my character classes include only capital letters, as in the simple example I gave.
Thanks to everyone offering help :)
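For classes that hold only capital letters, a global substitution is enough. The sketch below is built on assumptions: the sample pattern is made up, and the target format is assumed to accept plain ( ... ) for grouping and | for alternation. In (?: ... ), the ?: directly after the opening parenthesis only marks the group as non-capturing, so it can be dropped when ? means something else in the target format.

```perl
use strict;
use warnings;

# Assumption: every class contains only capital letters, e.g. [ABCD].
my $pattern = 'X[ABCD]Y[EF]Z';

# For each [LETTERS], split the letters apart and rejoin them with |.
# The /e flag evaluates the replacement as Perl code; /g repeats the
# substitution for every class in the pattern.
(my $translated = $pattern) =~
    s{\[([A-Z]+)\]}{ '(' . join('|', split //, $1) . ')' }ge;

print "$translated\n";
```

This prints X(A|B|C|D)Y(E|F)Z. Anything other than a run of capital letters between the brackets is left untouched.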
Re: regular expression-xerox
by fletcher_the_dog (Friar) on May 03, 2004 at 19:03 UTC
|
If your character classes have any character ranges in them, then you are going to have to account for that. For example, if you have "A-D" then you will want to convert that to "(?:A|B|C|D)" and not "(?:A|-|D)".
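A sketch of range expansion, assuming the class body contains only ASCII letters, digits, and ranges between them. expand_class is a hypothetical helper name; anything else in the body makes it die rather than guess.

```perl
use strict;
use warnings;

# Hypothetical helper: expand ranges such as A-D inside a class body,
# then join all the characters into a non-capturing alternation.
sub expand_class {
    my ($body) = @_;
    my @chars;

    # Walk the body left to right, taking either a range or one character.
    while ($body =~ /\G(?:([A-Za-z0-9])-([A-Za-z0-9])|([A-Za-z0-9]))/gc) {
        if (defined $3) {
            push @chars, $3;
        }
        else {
            die "invalid range $1-$2\n" if ord($1) > ord($2);
            push @chars, map { chr($_) } ord($1) .. ord($2);
        }
    }

    # Anything left unconsumed is outside what this sketch supports.
    die "unsupported character in class\n"
        if (pos($body) // 0) < length $body;

    return '(?:' . join('|', @chars) . ')';
}

print expand_class('A-D'), "\n";
print expand_class('A-CX'), "\n";
```

The first call prints (?:A|B|C|D) and the second prints (?:A|B|C|X).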