LiTinOveWeedle has asked for the wisdom of the Perl Monks concerning the following question:

Hi, first I want to thank all the monks who took care of the broken Temple yesterday. We discussed this question of mine in the Chatterbox, but because of a bug in the interface I wasn't able to post it to SOPW, so I am doing it now.

The question is simple. I have something like:

((AND|OR)([!=><]+)(.*))+

(I used a regex just to describe what my string can look like.) In fact, it can be something like:

$check = "AND=>1536463OR<foobarOR=5";

I need to parse this into $1, $2, $3, $4, ..., so:

    while ( $check =~ /^((AND|OR)([!=><]+)((?:(?!AND)(?!OR ).)*))/ ) {
        $1...; $2...; $3...; ...;
        $check =~ s/$1//;
    }

(This was made with help from other monks.) Unfortunately, it doesn't work well in all cases. What I need is to parse the string into a logical operator (AND or OR), a relational operator made of the characters = < > ! (it can be a combination of them), and finally the value. If you have any suggestion for how to do this, please let me know. If necessary, the syntax of the string can be changed.

Thanks for your help

Li Tin O've Weedle
mad Tsort's philosopher

Edit: chipmunk 2001-09-03
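A minimal runnable sketch of the strip-as-you-go loop from the question, under two assumptions that are not in the original post: the lookahead is written (?!AND|OR) with no trailing space, and the matched prefix is removed with substr instead of s/$1//, so that regex metacharacters in a value cannot break the removal. The helper name strip_parse is hypothetical. It illustrates one plausible failing case: a value that itself contains the letters OR.

```perl
use strict;
use warnings;

# Hypothetical helper: the consume-and-strip loop from the question,
# assuming the lookahead is (?!AND|OR) with no trailing space.
sub strip_parse {
    my ($check) = @_;
    my @triplets;
    while ($check =~ /^((AND|OR)([!=><]+)((?:(?!AND|OR).)*))/) {
        my ($whole, $logic, $op, $value) = ($1, $2, $3, $4);
        push @triplets, [$logic, $op, $value];
        # Remove the matched prefix literally; s/$1// would treat $1 as a pattern.
        substr($check, 0, length $whole, '');
    }
    return (\@triplets, $check);    # $check now holds whatever was left unparsed
}

my ($ok, $rest_ok) = strip_parse("AND=>1536463OR<foobarOR=5");
print join(' ', @$_), "\n" for @$ok;

# A value containing "OR" stops the loop early: the value is cut to "fo",
# and the leftover "ORobarOR=5" no longer starts with AND/OR plus an operator.
my ($bad, $rest_bad) = strip_parse("AND=>1536463OR<foORobarOR=5");
print join(' ', @$_), "\n" for @$bad;
print "unparsed: $rest_bad\n";
```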

Re: Parsing with regex
by tachyon (Chancellor) on Sep 04, 2001 at 02:29 UTC

    You will need to post some sample data. When you say it does not work in all cases, you need to show those cases so people can explain 1) why it fails and 2) how to fix it so it works in those cases.

    cheers

    tachyon

    s&&rsenoyhcatreve&&&s&n.+t&"$'$`$\"$\&"&ee&&y&srve&&d&&print

Re: Parsing with regex
by LiTinOveWeedle (Scribe) on Sep 04, 2001 at 01:51 UTC
    I just missed the code tags in my example; it should be:

    $check = "AND=>1536463OR<foobarOR=5";

    Sorry for this mistake.

    Li Tin O've Weedle
    mad Tsort's philosopher

      Well, personally, what you've posted seems to raise more questions than it answers.
      I have a few thoughts, though. I guess I'll start with the regex you used to describe your data, and the sample data you provided.
      Your regex is like:

        ((AND|OR)([!=><]+)(.*))+

      and your input is like:

        $check = "AND=>1536463OR<foobarOR=5";

      My first question comes from looking at the two together. Your regex describes strings like the following:
      AND!==!<!>>!!LKJKJIOJJ182873KLJJyuukjljkOR!<><><><><=!Blah OR==!=!Hmm, could this be right?AND>>>>>>>this could be a problem
      I think my point is made. :-) So then we look at the data. You didn't really say what was supposed to happen. Is this supposed to produce the following triplets?

        AND,=>,1536463
        OR,<,foobar
        OR,=,5

      Or was it supposed to reject it? (It's not clear from the conversation I saw in the Chatterbox, nor from your post.)
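For reference, here is a small sketch (an illustration, not code from the thread) of what the descriptive regex does if it is applied literally to the sample string. Because .* is greedy, the first iteration of the group swallows everything after the first operator, so a single match yields one triplet whose value is the entire rest of the string. Even when such a group does repeat, Perl keeps only the captures from its last iteration.

```perl
use strict;
use warnings;

my $check = "AND=>1536463OR<foobarOR=5";

# The question's descriptive pattern, used as a real regex.
my @caps = $check =~ /((AND|OR)([!=><]+)(.*))+/;

# $caps[1] = logic word, $caps[2] = operator, $caps[3] = "value"
print "logic=$caps[1] op=$caps[2] value=$caps[3]\n";
```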
      So, going back to my first point, I assume that you need to handle the basic relational operators, i.e. = => =< == > < >= <= != <>. Off the top of my head, that becomes

        (=[><=]?|[><]=?|!=|<>)

      We already have the first part, (AND|OR), which leaves the last. Now comes my second interpretation of your question: how do I keep .* from eating more than it should?
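A quick sketch for checking an operator alternation like the one above; the check loop and the reordered pattern are illustrations, not from the thread. Anchored at both ends, the alternation accepts every listed operator. When it is not anchored on the right, as inside the full pattern, alternatives are tried left to right, so [><]=? claims the < of <> before the <> branch is ever reached. Listing <> before the single-character branch avoids that, which is also the order the later update in this thread uses.

```perl
use strict;
use warnings;

my $as_posted = qr/=[><=]?|[><]=?|!=|<>/;   # order from the post
my $reordered = qr/=[><=]?|<>|[><]=?|!=/;   # hypothetical: <> tried before [><]=?

# Anchored at both ends, every listed operator is accepted by either order.
for my $op (qw(= => =< == > < >= <= != <>)) {
    die "rejected $op" unless $op =~ /^(?:$as_posted)$/ && $op =~ /^(?:$reordered)$/;
}

# Followed by a value, the first alternative that fits wins.
my ($first)  = "<>5" =~ /^($as_posted)/;
my ($second) = "<>5" =~ /^($reordered)/;
print "as posted: '$first', reordered: '$second'\n";   # as posted: '<', reordered: '<>'
```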

      The way to solve this is to figure out what the dot SHOULDN'T match, i.e. it shouldn't match the two regexes above combined, (AND|OR)(=[><=]?|[><]=?|!=|<>). We don't want to create capture buffers here, though (each () would add another $n), so we use (?:) instead of (). We have to make sure, char by char, that we don't match that pattern, so the inner layer looks like:

        (?!(?:AND|OR)(?:=[><=]?|[><]=?|!=|<>))

      We then wrap that, together with the dot, to say one or more of the above:

        (?:(?!(?:AND|OR)(?:=[><=]?|[><]=?|!=|<>)).)+

      and then once more to capture it:

        ((?:(?!(?:AND|OR)(?:=[><=]?|[><]=?|!=|<>)).)+)

      Putting the three parts together, we get:

        $_ = "OR=5AND=>1536463OR<foORobarOR=5 ";
        while (m/(AND|OR)                      # either AND or OR
                 (=[><=]?|[><]=?|!=|<>)        # one of = => =< == > < >= ....
                 (                             # capture all within...
                   (?:                         # group for quantifier
                     (?!                       # not followed by
                       (?:AND|OR)              # AND or OR
                       (?:=[><]?|[><]=?|[!=]=) # one of = => =< ...
                     )                         # end of the lookahead
                     .                         # match any char..
                   )+                          # 1 or more of the above
                 )                             # and return it..
                /xgms) {                       # ignore spaces, repeated,
                                               # multiline, . matches all
            # and if it all worked out then...
            print "$1 $2 $3\n";
        }
        # outputs
        # OR = 5
        # AND => 1536463
        # OR < foORobar
        # OR = 5
      Note the OR inside foORobar in my version of your example. The regex does not trip over it, because the negative lookahead assertion includes the relational operator part (=> and friends) as well as AND|OR.
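A minimal sketch of the point about the lookahead, assuming a hypothetical helper parse_with that takes the "stop" lookahead as a parameter. The pieces are built with qr// so each part of the derivation above stays visible. A lookahead on the keyword alone truncates foORobar to fo and silently drops obar, while the keyword-plus-operator lookahead from the post keeps the value whole.

```perl
use strict;
use warnings;

my $logic = qr/AND|OR/;
my $oper  = qr/=[><=]?|[><]=?|!=|<>/;   # operator alternation from the post

# Hypothetical helper: collect [logic, operator, value] triplets, where the
# value is "one or more chars, none of which starts a $stop match".
sub parse_with {
    my ($str, $stop) = @_;
    my @triplets;
    while ($str =~ /($logic)($oper)((?:(?!$stop).)+)/gs) {
        push @triplets, [$1, $2, $3];
    }
    return @triplets;
}

my $input = "OR=5AND=>1536463OR<foORobarOR=5";

my @keyword_only = parse_with($input, qr/$logic/);                # stop at any AND/OR
my @keyword_op   = parse_with($input, qr/(?:$logic)(?:$oper)/);   # stop at AND/OR + operator

print "keyword only:  ", join(' | ', map { "@$_" } @keyword_only), "\n";
print "keyword + op:  ", join(' | ', map { "@$_" } @keyword_op), "\n";
```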

      Hope this helps

      Yves

      --
      You are not ready to use symrefs unless you already know why they are bad. -- tadmc (CLPM)

      Update
      LiTinOveWeedle asked for help enhancing this so that the script will match some of the odder relational operators.

        my $opers = '=[><=!]?|[><!]=|<>|[<>]';
        while (m/(AND|OR)($opers)((?:(?!(?:AND|OR)(?:$opers)).)+)/xgms) {
            print "$1 $2 $3\n";
        }
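A small usage sketch of the updated pattern; the input string is made up for illustration. Because $opers tries =!, !=, >= and <> before the single-character < and > branch, each of these operators is captured whole. Here the triplets are collected into an array instead of printed, which is usually handier for whatever code evaluates the conditions afterwards.

```perl
use strict;
use warnings;

my $opers = '=[><=!]?|[><!]=|<>|[<>]';   # operator set from the update
my $check = "AND!=3OR<>xAND=!yOR>=2";    # made-up input with odder operators

my @triplets;
while ($check =~ m/(AND|OR)($opers)((?:(?!(?:AND|OR)(?:$opers)).)+)/xgms) {
    push @triplets, [$1, $2, $3];        # [logic, operator, value]
}
print "@$_\n" for @triplets;
```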
      Thanks demerphq,
      I am sorry for describing my problem so poorly, but you are totally right. Your work is great; this is exactly what I wanted, so no further description of my problem is needed now.
      Thanks

      Li Tin O've Weedle
      mad Tsort's philosopher