in reply to Parsing with regex

The code tags were missing in my example. It should be:

$check = "AND=>1536463OR<foobarOR=5";

Sorry for this mistake.

Li Tin O've Weedle
mad Tsort's philosopher

Re: Re: Parsing with regex
by demerphq (Chancellor) on Sep 04, 2001 at 02:57 UTC
    Well, personally, what you've posted seems to raise more questions than answers.
    I have a few thoughts though, which I guess I'll start with the regex that you used to describe your data, and the sample data you provided.
    Regex is like ((AND|OR)([!=><]+)(.*))+
    Input is like $check = "AND=>1536463OR<foobarOR=5";
    My first question comes from looking at the two together. Your regex describes some of the following strings:
    AND!==!<!>>!!LKJKJIOJJ182873KLJJyuukjljkOR!<><><><><=!Blah OR==!=!Hmm, could this be right?AND>>>>>>>this could be a problem
    I think my point is taken. :-) So then we look at the data. You didn't really say what was supposed to happen. Is this supposed to produce the following triplets?
    AND,=>,1536463 OR,<,foobar OR,=,5
    Or was it supposed to reject it? (It's not clear from the conversation I saw on the chatterbox, nor from your post.)
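To make the concern concrete, here is a minimal sketch that runs the pattern as originally described against the sample input. The variable names are only illustrative, and the point is simply that the greedy (.*) takes everything after the first operator.

```perl
use strict;
use warnings;

# Sample input from the original post.
my $check = "AND=>1536463OR<foobarOR=5";

# The pattern as originally described: the greedy (.*) consumes
# everything that follows the first operator, later ORs included.
my @caps;
if ($check =~ /((AND|OR)([!=><]+)(.*))+/) {
    @caps = ($2, $3, $4);
    print join(',', @caps), "\n";   # prints AND,=>,1536463OR<foobarOR=5
}
```

So instead of three triplets there is a single one whose value still contains the other two conditions.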
    So, going back to my first point, I assume that you need to handle the basic relational operators, i.e.  = => =< == > < >= <= != <>  Off the top of my head that becomes  (=[><=]?|[><]=?|!=|<>)  So we already have the first part, (AND|OR), and now the second, which leaves the last. This brings me to my second interpretation of your question: how do I keep .* from eating more than it should?

    The way to solve this is to figure out what the dot SHOULDN'T match, i.e. it shouldn't match the two regexes above combined,  (AND|OR)(=[><=]?|[><]=?|!=|<>), because that would be the start of a new token. We don't want to invoke capture buffers here, so we use (?:) instead of (). So we have to make sure, char by char, that we don't match that pattern. So the inner layer looks like: (?!(?:AND|OR)(?:=[><=]?|[><]=?|!=|<>)). We then wrap that again to say 1 or more of the above:
    (?:(?!(?:AND|OR)(?:=[><=]?|[><]=?|!=|<>)).)+ and then again to capture it ((?:(?!(?:AND|OR)(?:=[><=]?|[><]=?|!=|<>)).)+) We put the three parts together and we get

    $_ = "OR=5AND=>1536463OR<foORobarOR=5 ";
    while (m/(AND|OR)                     #either AND or OR
             (=[><=]?|[><]=?|!=|<>)       #one of = => =< == > < >= ....
             (                            #capture all within...
               (?:                        # group for quantifier
                 (?!                      # not followed by
                   (?:AND|OR)             # AND or OR
                   (?:=[><]?|[><]=?|[!=]=)# one of = => =< ...
                 )                        # any of the inside
                 .                        # match any char..
               )+                         # 1 or more of the above
             )                            #and return it..
            /xgms) {                      #ignore spaces, repeated,
                                          #multiline, . matches all
        # and if it all worked out then...
        print "$1 $2 $3\n";
    }
    # outputs
    # OR = 5
    # AND => 1536463
    # OR < foORobar
    # OR = 5
    Note the OR in my version of your example. The regex does not trip up over this because we made the negative lookahead assertion include the => conditional part as well.
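As a rough illustration of that last point, the sketch below compares two lookaheads on a shortened input: a deliberately weaker variant that only checks for the keyword, and the keyword-plus-operator form described above. The helper triplets() and the variable names are hypothetical, introduced only for this comparison.

```perl
use strict;
use warnings;

my $input = "OR<foORobarOR=5";

# Weaker variant (not from the post): the lookahead rejects any AND or OR.
my $keyword_only = qr/(AND|OR)(=[><=]?|[><]=?|!=|<>)((?:(?!AND|OR).)+)/s;

# Lookahead rejects AND or OR only when an operator follows it.
my $keyword_and_op = qr/(AND|OR)(=[><=]?|[><]=?|!=|<>)((?:(?!(?:AND|OR)(?:=[><=]?|[><]=?|!=|<>)).)+)/s;

# Collect "keyword operator value" strings for every match in $text.
sub triplets {
    my ($re, $text) = @_;
    my @out;
    while ($text =~ /$re/g) {
        push @out, "$1 $2 $3";
    }
    return @out;
}

my @weak   = triplets($keyword_only,   $input);
my @strict = triplets($keyword_and_op, $input);

print "keyword only:    $_\n" for @weak;     # OR < fo, then OR = 5
print "keyword plus op: $_\n" for @strict;   # OR < foORobar, then OR = 5
```

With the keyword-only lookahead the value foORobar is cut short at the embedded OR, and the rest of it (ORobar) is silently skipped.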

    Hope this helps

    Yves

    --
    You are not ready to use symrefs unless you already know why they are bad. -- tadmc (CLPM)

    Update
    LiTinOveWeedle asked for help enhancing this so that the script will match some of the odder relational operators.

    my $opers='=[><=!]?|[><!]=|<>|[<>]';
    while (m/(AND|OR)($opers)((?:(?!(?:AND|OR)(?:$opers)).)+)/xgms) {
        print "$1 $2 $3\n";
    }
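A small self-contained sketch of the updated pattern follows. The input string is made up for illustration (it is not from the thread) and is chosen to exercise =!, <> and !=.

```perl
use strict;
use warnings;

# Operator alternation from the update; note '<>' is listed before '[<>]'.
my $opers = '=[><=!]?|[><!]=|<>|[<>]';

# Hypothetical input, chosen to exercise '=!', '<>' and '!='.
my $input = "AND=!5OR<>fooOR!=bar";

my @rows;
while ($input =~ m/(AND|OR)($opers)((?:(?!(?:AND|OR)(?:$opers)).)+)/xgms) {
    push @rows, "$1 $2 $3";
}
print "$_\n" for @rows;
# prints:
# AND =! 5
# OR <> foo
# OR != bar
```

The order of the alternatives seems to matter here. In the earlier alternation, [><]=? comes before <>, so on this input <> would be read as the operator < with > starting the value. Listing <> ahead of [<>] in $opers avoids that.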
Re: Re: Parsing with regex
by LiTinOveWeedle (Scribe) on Sep 04, 2001 at 04:38 UTC
    Thanks demerphq,
    I am sorry for describing my problem so poorly. But you are totally right. Your work is great. This is what I really wanted. So no further description of my problem is needed now.
    Thanks

    Li Tin O've Weedle
    mad Tsort's philosopher