Well, personally, what you've posted seems to raise more questions than it answers.
I have a few thoughts, though. I guess I'll start with the regex you used to describe your data and the sample data you provided.
Regex is like ((AND|OR)([!=><]+)(.*))+
Input is like $check = "AND=>1536463OR<foobarOR=5";
My first question comes from looking at the two together. Your regex also matches strings like the following:
AND!==!<!>>!!LKJKJIOJJ182873KLJJyuukjljkOR!<><><><><=!Blah
OR==!=!Hmm, could this be right?AND>>>>>>>this could be a problem
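If you want to check that for yourself, here is a quick sketch. The \A and \z anchors are my addition (they are not in your regex), so that "accepted" means the whole string got through:

```perl
use strict;
use warnings;

# The pattern from the question; the \A ... \z anchors are an assumption
# added here so that a match means the entire string was accepted.
my $loose = qr/\A((AND|OR)([!=><]+)(.*))+\z/;

my @junk = (
    'AND!==!<!>>!!LKJKJIOJJ182873KLJJyuukjljkOR!<><><><><=!Blah',
    'OR==!=!Hmm, could this be right?AND>>>>>>>this could be a problem',
);

for my $str (@junk) {
    # Report whether the loose pattern lets each junk string through.
    print(($str =~ $loose ? "accepted" : "rejected"), ": $str\n");
}
```

Both strings should come out as accepted, which is the problem.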
I think my point is taken. :-) So then we look at the data. You didn't really say what was supposed to happen. Is this supposed to produce the following triplets?
AND,=>,1536463
OR,<,foobar
OR,=,5
Or was it supposed to reject it? (It's not clear from the conversation I saw in the chatterbox, nor from your post.)
So, going back to my first point, I assume that you need to handle the basic relational operators, i.e.
= => =< == > < >= <= != <>
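Off the top of my head that becomes
(=[><=]?|<>|[><]=?|!=)
Note that <> has to come before [><]=? in the alternation. Otherwise a lone < matches first and the > ends up in the value.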
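A small sanity check of an operator pattern like this might look as follows. It is only a sketch: the variable names are mine, and the second pattern exists purely to show what happens when <> is listed after [><]=?.

```perl
use strict;
use warnings;

# Operator pattern with <> listed ahead of [><]=?.
my $op = qr/=[><=]?|<>|[><]=?|!=/;

# Every operator from the list should match in full.
my @ops = qw(= => =< == > < >= <= != <>);
for my $o (@ops) {
    print "$o ", ($o =~ /\A(?:$op)\z/ ? "ok" : "NOT matched"), "\n";
}

# Same alternatives, but with <> last, for comparison only.
my $late = qr/=[><=]?|[><]=?|!=|<>/;

# Unanchored at the end, the first alternative that matches wins.
my ($got_early) = '<>5' =~ /\A($op)/;    # picks up "<>"
my ($got_late)  = '<>5' =~ /\A($late)/;  # picks up only "<"

print "with <> early: $got_early\n";
print "with <> late:  $got_late\n";
```

Perl's alternation takes the first alternative that lets the match succeed, not the longest one, which is why the order matters once the pattern is no longer anchored at both ends.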
So we already have the first part, (AND|OR), which leaves the last. Now comes my second interpretation of your question: how do I keep .* from eating more than it should?
The way to solve this is to figure out what the dot SHOULDN'T match. I.e. it shouldn't match at the start of the two regexes above combined, (AND|OR)(=[><=]?|<>|[><]=?|!=), because that would be the start of a new token. We don't want to create extra capture buffers, though, so we use (?:) instead of (). We then have to make sure, char by char, that we aren't at the start of that pattern. So the inner layer looks like:
(?!(?:AND|OR)(?:=[><=]?|<>|[><]=?|!=)).
We then wrap that again to say 1 or more of the above:
(?:(?!(?:AND|OR)(?:=[><=]?|<>|[><]=?|!=)).)+
and then again to capture it:
((?:(?!(?:AND|OR)(?:=[><=]?|<>|[><]=?|!=)).)+)
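To see the difference the lookahead makes, here is a rough side-by-side on your sample input. It only looks at the first token, and the variable names are just for illustration:

```perl
use strict;
use warnings;

my $check = "AND=>1536463OR<foobarOR=5";
my $op    = qr/=[><=]?|<>|[><]=?|!=/;

# Greedy .* swallows everything after the first operator.
my ($greedy) = $check =~ /(?:AND|OR)(?:$op)(.*)/;

# Guarded dot: stop as soon as AND or OR plus an operator lies ahead.
my ($tempered) = $check =~ /(?:AND|OR)(?:$op)((?:(?!(?:AND|OR)(?:$op)).)+)/;

print "greedy:   $greedy\n";    # 1536463OR<foobarOR=5
print "tempered: $tempered\n";  # 1536463
```

The lookahead is tested before every single character, so the value stops exactly where the next token begins.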
We put the three parts together and we get
$_ = "OR=5AND=>1536463OR<foORobarOR=5 ";
while (m/(AND|OR)                        # either AND or OR
         (=[><=]?|<>|[><]=?|!=)          # one of = => =< == <> > < >= <= !=
         (                               # capture all within...
           (?:                           #   group for quantifier
             (?!                         #     not followed by
               (?:AND|OR)                #       AND or OR
               (?:=[><=]?|<>|[><]=?|!=)  #       and one of the operators
             )                           #     end of the lookahead
             .                           #     match any char
           )+                            #   1 or more of the above
         )                               # and return it
        /xgms) {                         # x: ignore spaces, g: repeated,
                                         # m: multiline, s: . matches all
    # and if it all worked out then...
    print "$1 $2 $3\n";
}
# outputs
# OR = 5
# AND => 1536463
# OR < foORobar
# OR = 5
Note the OR inside foORobar in my version of your example. The regex does not trip over it, because the negative lookahead assertion includes the operator part as well, so a bare OR is not treated as the start of a new token.
Hope this helps
Yves
--
You are not ready to use symrefs unless you already know why they are bad. -- tadmc (CLPM)
Update
LiTinOveWeedle asked for help enhancing this so that the script will match some of the odder relational operators.
my $opers='=[><=!]?|[><!]=|<>|[<>]';
while (m/(AND|OR)($opers)((?:(?!(?:AND|OR)(?:$opers)).)+)/xgms) {
print "$1 $2 $3\n";
}
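If you would rather collect the triplets than print them, something along these lines might do. It is a sketch only: parse_conditions is a made-up name, and it reuses the $opers string from the update as is.

```perl
use strict;
use warnings;

# Operator alternatives from the update above.
my $opers = '=[><=!]?|[><!]=|<>|[<>]';

# parse_conditions is a hypothetical helper: it returns a list of
# [conjunction, operator, value] array refs found in the text.
sub parse_conditions {
    my ($text) = @_;
    my @triplets;
    while ($text =~ /(AND|OR)($opers)((?:(?!(?:AND|OR)(?:$opers)).)+)/gs) {
        push @triplets, [$1, $2, $3];
    }
    return @triplets;
}

for my $t (parse_conditions("OR!=5AND<>3OR=!x")) {
    print join(',', @$t), "\n";
}
```

Note that this silently skips anything that doesn't look like a token, so it still doesn't answer the question of whether bad input should be rejected.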