
You should probably give us some more rules. For example, will the N= tags always be in order?

In the example you gave: 8=FIX.4.29=040435=8

I don't see any way to differentiate between 9= and 29= (unless there is a rule that says a value can never end in a '.'). Similarly, the last token could be:

40435=8 or 0435=8 or 435=8 or 35=8 or 5=8

You might want to give us some more rules or find a way to preformat or delimit the input string, unless your tag array is guaranteed to clear up this ambiguity.
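If the data comes from a real FIX feed, each field is normally terminated by the SOH control character (ASCII 0x01), which is easy to lose when a message is printed or pasted. Here is a minimal sketch, assuming the raw string still carries those delimiters. The sample message is hand-built for illustration. It splits on SOH first, then splits each field once on its first '='.

```perl
use strict;
use warnings;

# ASSUMPTION: the raw message still carries the standard FIX field
# terminator, SOH (0x01). This sample is built by hand for illustration.
my $raw = join('', map { "$_\x01" } '8=FIX.4.2', '9=0404', '35=8', '49=EXLINK');

# Split into fields on SOH (the trailing empty field is dropped by split),
# then split each field once on the first '=' into [tag, value].
# An array of pairs, rather than a hash, keeps field order and repeated tags.
my @fields = map { [ split /=/, $_, 2 ] } split /\x01/, $raw;

for my $f (@fields) {
    print "$f->[0] => $f->[1]\n";
}
```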

No good deed goes unpunished. -- (attributed to) Oscar Wilde

Re^4: Message regex
by minixman (Beadle) on Nov 24, 2005 at 17:31 UTC
    Well, the bad thing is that there is no order: 35= could be at the end of one string and then at the beginning of the next. I guess I need to think of a way to break it up. At the moment the problem is that the regex treats it as one whole line, so when you search for something like
    $_ =~ /35=/ it returns the whole string rather than just, say, 35=INFO.
      I've generally found that before I can solve a problem with code, I need to be able to describe it clearly in English. Please let me try to help you. :)

      You have a string with tokens embedded in it; the most noticeable feature of the tokens is that they come in key=value form: \d+=.+
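      To illustrate why a plain regex struggles with the \d+=.+ form and with the /35=/ search mentioned earlier, here is a small demonstration on the sample string. A bare match only says whether 35= occurs. A capture group returns just the captured text, but with no delimiter it either swallows the rest of the line or stops after one character, and neither is reliably the real value.

```perl
use strict;
use warnings;

my $msg = '8=FIX.4.29=040535=849=EXLINK';

# A bare match only reports whether 35= occurs; it yields true, not the value.
my $found = $msg =~ /35=/ ? 'yes' : 'no';

# A capture group returns just the captured text, but nothing tells the
# regex where the value ends:
my ($greedy) = $msg =~ /35=(.+)/;    # .+  runs to the end of the string
my ($lazy)   = $msg =~ /35=(.+?)/;   # .+? stops after a single character

print "found=$found greedy=$greedy lazy=$lazy\n";
```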

      There doesn't seem to be any rule that prevents two tokens from being adjacent, so I have no way to tell where the value of one token ends and the key of the next begins, except by referring to your finite array of keys. So, parsing the example you gave:

      8=FIX.4.29=040535=849=EXLINK

      If I knew that 29 was a key and 9 was not, then I could surmise that the first key/value pair is:

      8=FIX.4.

      Similarly, if I knew that 35 was a key and that 40535, 0535, 535 and 5 were not, then I could surmise that the next token was either 35=8 or 35=84 (depending on whether the key after it is 49 or 9).
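      With a complete list of keys, the forward reading above can be written as a zero-width split that cuts just before every known key followed by '='. This is a sketch only. The tag list (8, 29, 35, 49) is hypothetical and stands in for your real tag array. It also assumes that no tag is a suffix of another and that a known tag followed by '=' never appears inside a value.

```perl
use strict;
use warnings;

# HYPOTHETICAL tag list standing in for your tag array.
# ASSUMPTIONS: no tag is a suffix of another (e.g. not both 49 and 9),
# and a known tag followed by '=' never appears inside a value.
my @tags   = qw(8 29 35 49);
my $tag_re = join '|', map { quotemeta($_) } @tags;

my $msg = '8=FIX.4.29=040535=849=EXLINK';

# Zero-width split: cut just before every spot where a known tag is followed
# by '='. A zero-width match at position 0 does not produce an empty field.
my @tokens = split /(?=(?:$tag_re)=)/, $msg;

# Split each token once on its first '=' into [tag, value].
my @pairs = map { [ split /=/, $_, 2 ] } @tokens;

print join(' | ', map { join '=', @$_ } @pairs), "\n";
```

      If both 49 and 9 were valid tags, this split would cut at both positions and leave a stray one-character fragment ("4"), which is exactly the 35=8 versus 35=84 ambiguity.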

      It might help to go backwards through the string, since everything to the right of the last '=' is a value. Then (assuming your list of tags is unique and that no tag contains another tag) you could tokenize the string by iteratively stripping off the last '=' and everything after it (/=[^=]*$/) and looking for a match for a key at the end of what remains.

      That seems a terribly brute-force approach, though; not very elegant.
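      A minimal sketch of that backwards pass, assuming a hypothetical tag list and values that never contain '='. The name tokenize_backwards is made up for illustration. On each round it takes the text after the last '=' as a value, finds a known tag at the end of what precedes it, and strips both. If more than one tag matches, the longest wins; that tie-break is a choice made for this sketch, not part of the description above.

```perl
use strict;
use warnings;

# HYPOTHETICAL helper: tokenize an undelimited tag=value string from the right.
# ASSUMPTIONS: @tags lists every tag that can occur and values never contain '='.
# If several tags end the remaining text, the longest one is taken.
sub tokenize_backwards {
    my ($msg, @tags) = @_;
    @tags = sort { length($b) <=> length($a) } @tags;
    my @pairs;
    while (length $msg) {
        my $eq = rindex($msg, '=');
        die "no '=' left in '$msg'\n" if $eq < 0;
        my $value = substr($msg, $eq + 1);    # everything right of the last '=' is a value
        my $head  = substr($msg, 0, $eq);     # the key we want ends this part
        my ($tag) = grep { length($head) >= length($_) && substr($head, -length($_)) eq $_ } @tags;
        die "no known tag ends '$head'\n" unless defined $tag;
        unshift @pairs, [ $tag, $value ];
        $msg = substr($head, 0, length($head) - length($tag));   # strip the key and repeat
    }
    return @pairs;
}

my @pairs = tokenize_backwards('8=FIX.4.29=040535=849=EXLINK', qw(8 29 35 49));
print join(' | ', map { join '=', @$_ } @pairs), "\n";
```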
