in reply to Re^3: Message regex
in thread Message regex

Well, the bad thing is that there is no fixed order: 35= could be at the end of one string and then at the beginning of the next. I guess I need to think of a way to break it up. At the moment the problem is that the regex treats the message as one whole line, so when you try to search for something like
$_ =~ /35=/ the match succeeds, but I am still left with the whole string and not just, say, 35=INFO.
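
To illustrate the difference, here is a minimal sketch of testing for the tag versus capturing just the field. The sample line is made up, and the assumption that the value consists only of uppercase letters is purely for illustration.

```perl
use strict;
use warnings;

# Hypothetical sample line; "INFO" is the example value mentioned above.
my $line = '49=EXLINK35=INFO';

# A plain match only reports success or failure; $line itself is unchanged.
my $found = ($line =~ /35=/) ? 1 : 0;

# Capturing parentheses in list context return only the matched piece.
# Assumption: the value is a run of uppercase letters.
my ($field) = $line =~ /(35=[A-Z]+)/;

print "found=$found field=$field\n";
```

This only works when the end of the value can be recognised, which is exactly the hard part here.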

Re^5: Message regex
by ptum (Priest) on Nov 24, 2005 at 17:50 UTC
    I've generally found that before I can solve a problem with code, I need to be able to describe it clearly in English. Please let me try to help you. :)

    You have a given string with tokens embedded within it, the most noticeable feature of the tokens being they come in a key=value form: \d+=.+

    There doesn't seem to be any rule that prevents two tokens from being adjacent to each other, so I have no way to tell when the value of one token ends and where the key of the next token begins, except that I have a finite array of keys to which I could refer. So as I parse the example you gave:

    8=FIX.4.29=040535=849=EXLINK

    If I knew that 29 was a key and 9 was not, then I could surmise that the first key/value pair is:

    8=FIX.4.

    Similarly, if I knew that 35 was a key and that 40535, 0535, 535 and 5 are not keys, then I could surmise that 29=0405 comes next and that the token after it is either 35=8 (if 49 is a key) or 35=84 (if 9 is a key).
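
    The reasoning above can be sketched as a forward scan that treats a value as running until the next known key followed by '='. This is only a sketch: the key list (8, 29, 35, 49) is an assumption chosen to match the reading above, and tokenize_forward is a made-up name.

    ```perl
    use strict;
    use warnings;

    # Hypothetical key list; substitute the real set of tags.
    my @known_keys = qw(8 29 35 49);

    # Longest keys first, so a longer key is preferred over a shorter one
    # that happens to be its prefix.
    my $key_alt = join '|', sort { length $b <=> length $a } @known_keys;

    sub tokenize_forward {
        my ($msg) = @_;
        my @pairs;
        # Each value runs (non-greedily) up to the next "known key =" or
        # to the end of the string; \G keeps the tokens contiguous.
        while ($msg =~ /\G($key_alt)=(.*?)(?=(?:$key_alt)=|\z)/gs) {
            push @pairs, [$1, $2];
        }
        return @pairs;
    }

    printf "%s=%s\n", @$_ for tokenize_forward('8=FIX.4.29=040535=849=EXLINK');
    ```

    With that key list, the example splits into 8=FIX.4., 29=0405, 35=8 and 49=EXLINK. A value that itself contains a known key followed by '=' would be split in the wrong place, so this relies on the values being well behaved.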

    It might help to go backwards through the string, since everything to the right of the last '=' is a value. Then (assuming your list of tags is unique and that no tag contains another tag) you could tokenize your string by repeatedly stripping off the final '=' together with the value after it, and then looking for a known key at the end of what remains.

    That seems a terribly brute-force approach, though; not very elegant.
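
    For what it's worth, a rough sketch of the backwards approach might look like the following. The key list and the name tokenize_backwards are assumptions for illustration, and the code assumes that no value contains an '='.

    ```perl
    use strict;
    use warnings;

    # Hypothetical key list; substitute the real set of tags.
    my @known_keys = qw(8 29 35 49);
    my $key_alt = join '|', sort { length $b <=> length $a } @known_keys;

    sub tokenize_backwards {
        my ($msg) = @_;
        my @pairs;
        while (length $msg) {
            # Everything to the right of the last '=' is a value.
            my $eq = rindex $msg, '=';
            die "no '=' left in '$msg'\n" if $eq < 0;
            my $value = substr $msg, $eq + 1;
            $msg = substr $msg, 0, $eq;

            # What remains must now end in a known key; strip it off.
            $msg =~ s/($key_alt)\z//
                or die "no known key at the end of '$msg'\n";

            # Working right to left, so put each pair at the front.
            unshift @pairs, [$1, $value];
        }
        return @pairs;
    }

    printf "%s=%s\n", @$_ for tokenize_backwards('8=FIX.4.29=040535=849=EXLINK');
    ```

    With the keys assumed here, this yields the same four pairs as reading the string by eye: 8=FIX.4., 29=0405, 35=8 and 49=EXLINK. If both 49 and 9 were valid keys, the substitution would pick 49 (the longer suffix), which may or may not be what the data intends.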

    No good deed goes unpunished. -- (attributed to) Oscar Wilde