
Well, personally, what you've posted seems to raise more questions than it answers.
I have a few thoughts though. I guess I'll start with the regex that you used to describe your data, and the sample data you provided.
Regex is like
    ((AND|OR)([!=><]+)(.*))+
Input is like
    $check = "AND=>1536463OR<foobarOR=5";
My first question comes from looking at the two together. Your regex also accepts strings such as the following:
AND!==!<!>>!!LKJKJIOJJ182873KLJJyuukjljkOR!<><><><><=!Blah OR==!=!Hmm, could this be right?AND>>>>>>>this could be a problem
I think my point is made. :-) So then we look at the data. You didn't really say what was supposed to happen. Is this supposed to produce the following triplets?
    AND,=>,1536463
    OR,<,foobar
    OR,=,5
Or was it supposed to be rejected? (It's not clear from the conversation I saw on the chatterbox, nor from your post.)
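
For reference, here is a minimal sketch of what the regex as posted does with that input. The helper name first_triplet is made up for illustration; the regex and the input string are taken from the question unchanged.

```perl
use strict;
use warnings;

# The input exactly as posted in the question.
my $check = "AND=>1536463OR<foobarOR=5";

# first_triplet is a hypothetical helper: it applies the posted regex once
# and returns (conjunction, operator, value), or an empty list on failure.
sub first_triplet {
    my ($text) = @_;
    return $text =~ /((AND|OR)([!=><]+)(.*))+/ ? ($2, $3, $4) : ();
}

print join(',', first_triplet($check)), "\n";
# prints: AND,=>,1536463OR<foobarOR=5
```

The greedy .* swallows everything after the first operator, so the outer + never gets a second iteration and only one "triplet" comes back.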
So going back to my first point, I assume that you need to handle the basic relational operators, i.e.
    = => =< == > < >= <= != <>
Off the top of my head that becomes
    (=[><=]?|[><]=?|!=|<>)
So we already have the first part, (AND|OR), which leaves the last. This brings me to my second interpretation of your question: how do I keep .* from eating more than it should?
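
A quick way to sanity-check that operator alternation is to anchor it and feed it each operator in turn. This is only a sketch; is_operator is a hypothetical helper name, and the three rejected candidates are my own additions.

```perl
use strict;
use warnings;

# Operator alternation from the post.
my $op = qr/=[><=]?|[><]=?|!=|<>/;

# is_operator is a hypothetical helper: true only when the whole
# candidate string is matched by the alternation.
sub is_operator {
    my ($candidate) = @_;
    return $candidate =~ /\A(?:$op)\z/ ? 1 : 0;
}

# The ten operators listed in the post, plus three strings that should fail.
for my $candidate (qw(= => =< == > < >= <= != <>), '!', '=>>', '><') {
    printf "%-4s %s\n", $candidate,
        is_operator($candidate) ? 'accepted' : 'rejected';
}
```

Note that the anchors force a full match here. Unanchored, the alternatives are tried left to right, so their order starts to matter.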

The way to solve this is to figure out what the dot SHOULDN'T match. That is, it shouldn't match the two parts above combined, (AND|OR)(=[><=]?|[><]=?|!=|<>), because that would be the start of a new token. We don't want to create extra capture buffers here, so we use (?:) instead of (). We have to make sure, char by char, that we are not at the start of that pattern, so the inner layer is a negative lookahead:
    (?!(?:AND|OR)(?:=[><=]?|[><]=?|!=|<>))
We then wrap that, together with the dot, to say 1 or more of the above:
    (?:(?!(?:AND|OR)(?:=[><=]?|[><]=?|!=|<>)).)+
and then again to capture it:
    ((?:(?!(?:AND|OR)(?:=[><=]?|[><]=?|!=|<>)).)+)
We put the three parts together and we get
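
To see the value part working in isolation, here is a small sketch that applies just that capture to the start of a string. leading_value is a hypothetical helper name, and the test strings are fragments of the sample data.

```perl
use strict;
use warnings;

my $op = qr/=[><=]?|[><]=?|!=|<>/;

# Consume one char at a time, but only while no "AND/OR + operator"
# token begins at the current position.
my $value = qr/((?:(?!(?:AND|OR)(?:$op)).)+)/s;

# leading_value is a hypothetical helper: it returns the value found at
# the start of $text, or undef if a new token starts right away.
sub leading_value {
    my ($text) = @_;
    return $text =~ /\A$value/ ? $1 : undef;
}

print leading_value('1536463OR<foobar'), "\n";   # 1536463
print leading_value('foORobarOR=5'), "\n";       # foORobar
```

The second call shows why the lookahead includes the operator: the OR inside foORobar is not followed by an operator, so the dot is allowed to keep going.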

    $_ = "OR=5AND=>1536463OR<foORobarOR=5 ";
    while (m/(AND|OR)                       # either AND or OR
             (=[><=]?|[><]=?|!=|<>)         # one of = => =< == > < >= ....
             (                              # capture all within...
               (?:                          # group for quantifier
                 (?!                        # not followed by
                   (?:AND|OR)               # AND or OR
                   (?:=[><=]?|[><]=?|!=|<>) # one of = => =< ...
                 )                          # any of the inside
                 .                          # match any char..
               )+                           # 1 or more of the above
             )                              # and return it..
            /xgms) {                        # ignore spaces, repeated,
                                            # multiline, . matches all
        # and if it all worked out then...
        print "$1 $2 $3\n";
    }
    # outputs
    # OR = 5
    # AND => 1536463
    # OR < foORobar
    # OR = 5
Note the OR inside "foORobar" in my version of your example. The regex does not trip up over this because we made the negative lookahead assertion include the conditional (operator) part as well.

Hope this helps

Yves

--
You are not ready to use symrefs unless you already know why they are bad. -- tadmc (CLPM)

Update
LiTinOveWeedle asked for help enhancing this so that the script will match some of the odder relational operators.

    my $opers='=[><=!]?|[><!]=|<>|[<>]';
    while (m/(AND|OR)($opers)((?:(?!(?:AND|OR)(?:$opers)).)+)/xgms) {
        print "$1 $2 $3\n";
    }
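
As a rough comparison, the sketch below runs the updated operator list and the first one over the same string. parse_triplets and the variable names are hypothetical, and the input 'AND<>5OR=!fooAND>=7' is a made-up test string, not data from the thread.

```perl
use strict;
use warnings;

# Operator alternations as they appear in the post: first version and update.
my $first_opers   = '=[><=]?|[><]=?|!=|<>';
my $updated_opers = '=[><=!]?|[><!]=|<>|[<>]';

# parse_triplets is a hypothetical helper: it collects every
# (conjunction, operator, value) triplet found in $text, using the
# given operator alternation for both the capture and the lookahead.
sub parse_triplets {
    my ($text, $opers) = @_;
    my @triplets;
    while ($text =~ m/(AND|OR)($opers)((?:(?!(?:AND|OR)(?:$opers)).)+)/gs) {
        push @triplets, [$1, $2, $3];
    }
    return @triplets;
}

for my $opers ($updated_opers, $first_opers) {
    print "with $opers\n";
    print "  @$_\n" for parse_triplets('AND<>5OR=!fooAND>=7', $opers);
}
```

With the updated list, <> and =! come back whole. With the first list, [><]=? is tried before <>, so <> is split into the operator < and a value starting with >. That is because alternatives are tried left to right, which is why the update lists <> ahead of the bare [<>].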