regular expression-xerox

oz has asked for the wisdom of the Perl Monks concerning the following question:

I am trying to translate Perl regular expressions into Xerox regular expressions (http://www.xrce.xerox.com/competencies/content-analysis/fsCompiler/fssyntax.html).
I need to replace every character class of the form `[ABCD]` with `A|B|C|D`.

I would appreciate any help.
thanks in advance

Edit by BazB. add formatting, code tags and linkify URL


Re: regular expression-xerox by diotalevi (Canon) on May 03, 2004 at 18:01 UTC
You seem to have already figured out that `[ABCD]` is semi-equivalent to `(A|B|C|D)`. What do you need help with? It is more properly equivalent to `(?:A|B|C|D)`. The difference is that `(?: ... )` is strictly for grouping and alternation, while `( ... )` also captures its contents into a variable that can be accessed with a number like `$1`, `$2`, `$3`, etc.
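The difference between the two groupings can be seen in a short sketch. The sample string `xCy` and the variable names are illustrative assumptions, not part of the original post.

```perl
use strict;
use warnings;

my $text = 'xCy';   # illustrative input

# Capturing group: the matched letter is saved in $1.
my $captured;
if ($text =~ /x(A|B|C|D)y/) {
    $captured = $1;
}

# Non-capturing group: same match, but no numbered variable is set by it.
my $noncap_ok = ($text =~ /x(?:A|B|C|D)y/) ? 1 : 0;

# The original character class accepts the same strings.
my $class_ok = ($text =~ /x[ABCD]y/) ? 1 : 0;

print "captured: $captured\n";
print "non-capturing match: $noncap_ok, class match: $class_ok\n";
```

All three patterns accept the same strings; only the first one records which letter matched.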
Re: Re: regular expression-xerox by oz (Novice) on May 04, 2004 at 09:59 UTC
Because I need to translate it into another regular expression format.
Re: Re: Re: regular expression-xerox by diotalevi (Canon) on May 04, 2004 at 11:32 UTC
I fail to see the problem. What part of the problem involving writing `[ABCD]` as `(A|B|C|D)` escapes you?
Re: Re: Re: Re: regular expression-xerox by oz (Novice) on May 04, 2004 at 18:01 UTC
Re: Re: Re: Re: Re: regular expression-xerox by diotalevi (Canon) on May 04, 2004 at 19:26 UTC
Re: regular expression-xerox by Fletch (Bishop) on May 03, 2004 at 18:03 UTC
Of course, to do this you're going to need to parse the Perl regexp to begin with. Take a look at YAPE::Regex. That'll get you a parse tree you can walk and munge into your other format.
Re: regular expression-xerox by kvale (Monsignor) on May 03, 2004 at 18:01 UTC
Character classes like `[ABCD]` can be converted to alternation as follows:

    my $class = 'ABCD';
    my $xerox = join '|', split //, $class;  # create alternation
    $xerox = '(?:' . $xerox . ')';           # non-capturing grouping

-Mark
Re: regular expression-xerox by Abigail-II (Bishop) on May 03, 2004 at 21:09 UTC
Not quite. You haven't considered:

1. Characters that have a special meaning, like `-`, `^`, and `]` (and that meaning is position dependent!)
2. Characters that inside a character class don't have a special meaning, but have one outside the class, like `+`, `?`, `*` and others.
3. POSIX character class syntax.

Abigail
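The point about characters such as `+`, `?`, and `*` can be illustrated with a minimal sketch. It builds a Perl alternation (not a Xerox one) and uses the built-in `quotemeta` to escape each character; the helper name `chars_to_alternation` is hypothetical, and the escaping rules of the target syntax would have to be substituted for a real translation.

```perl
use strict;
use warnings;

# Hypothetical helper: turn a list of literal characters into a Perl
# non-capturing alternation, escaping anything that is special
# outside a character class.
sub chars_to_alternation {
    my (@chars) = @_;
    return '(?:' . join('|', map { quotemeta($_) } @chars) . ')';
}

# Equivalent of the class [+?*], which needs no escaping inside the brackets.
my $alt = chars_to_alternation('+', '?', '*');
print "$alt\n";

# The escaped alternation matches a literal '+', as the class would.
print "matches a+b\n" if 'a+b' =~ /^a${alt}b$/;
```

Without the escaping step, the joined string `+|?|*` would not be a valid pattern at all, which is the failure Abigail's second item warns about.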
Re: Re: regular expression-xerox by kvale (Monsignor) on May 03, 2004 at 21:42 UTC
My solution correctly answers the particular requirement the OP stated: convert the character class 'ABCD' to a form that uses alternation. If one extrapolates that requirement to all alphanumerics, then my type of solution still works. If one extrapolates to metacharacters like those in 1. and 2., or to predefined POSIX classes or Unicode characters, as in 3., then obviously the parser and translator must be extended to handle these situations. But for the simple requirements stated by the OP, a simple solution is best. -Mark
Re: Re: regular expression-xerox by oz (Novice) on May 04, 2004 at 10:14 UTC
May I ask what `?` means in the regular expression? I cannot use `?` in the language I am translating to, since it already has a meaning there: any character. One other question: can't it be done with a substitution routine? I need to globally change each occurrence of `[]` to `|` in the regular expression. And one note: my character classes include only capital letters, as in the simplest example I gave. Thanks to everyone offering help :)
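A global substitution of the kind asked about here might look like the following sketch, which assumes, as stated, that every class contains only capital letters. The helper name `classes_to_alternation` and its optional grouping delimiters are hypothetical; the thread does not say how the target syntax groups an alternation, so the delimiters are left as parameters and default to nothing.

```perl
use strict;
use warnings;

# Hypothetical helper: rewrite every [XYZ] class made only of capital
# letters into an alternation. $open and $close are the grouping
# delimiters wanted in the output (an assumption; both default to '').
sub classes_to_alternation {
    my ($regex, $open, $close) = @_;
    $open  = '' unless defined $open;
    $close = '' unless defined $close;

    # /g rewrites every class; /e evaluates the replacement as Perl code.
    $regex =~ s{\[([A-Z]+)\]}{ $open . join('|', split //, $1) . $close }ge;
    return $regex;
}

print classes_to_alternation('[ABCD]'), "\n";
print classes_to_alternation('X[AB]Y[CD]', '(', ')'), "\n";
```

Classes containing anything other than capital letters (ranges, negation, metacharacters) do not match the pattern and are left unchanged, so they can be spotted and handled separately.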
Re: regular expression-xerox by fletcher_the_dog (Friar) on May 03, 2004 at 19:03 UTC
If your character classes contain any character ranges, then you will have to make sure that you account for that. For example, if you have "A-D" then you will want to convert that to "(?:A|B|C|D)" and not "(?:A|-|D)".
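The range case described above can be handled by expanding the class body before joining it. This is a minimal sketch restricted to capital letters and capital-letter ranges; the helper name `expand_class_body` is hypothetical, and anything else in the class body is rejected rather than guessed at.

```perl
use strict;
use warnings;

# Hypothetical helper: expand the body of a simple character class
# (capital letters and ranges such as A-D) into a list of letters.
sub expand_class_body {
    my ($body) = @_;
    my @chars;

    # Walk the body left to right; \G with /gc continues from the last match.
    while ($body =~ /\G(?:([A-Z])-([A-Z])|([A-Z]))/gc) {
        my ($lo, $hi, $single) = ($1, $2, $3);
        if (defined $single) {
            push @chars, $single;
        }
        else {
            die "Invalid range $lo-$hi\n" if ord($lo) > ord($hi);
            push @chars, map { chr($_) } ord($lo) .. ord($hi);
        }
    }

    # Anything left unconsumed is outside what this sketch supports.
    my $end = pos($body);
    $end = 0 unless defined $end;
    die "Unsupported character class body: $body\n" if $end != length $body;

    return @chars;
}

print '(?:', join('|', expand_class_body('A-D')), ")\n";
```

Expanding first and joining afterwards keeps the `-` from ever being treated as a literal member of the alternation.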
