in reply to Re: Message regex
in thread Message regex

Hmmmm
So I have found one strange thing. When I run this code:
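#!/usr/bin/perl
use strict;
use warnings;

# Tags to look for, joined into one alternation: 8=|9=|35=
my @tags   = qw( 8= 9= 35= );
my $reg_ex = join( '|', @tags );

open( my $fh, '<', 'test.log' ) or die("Unable to open log file: $!\n");
while (<$fh>) {
    # $_ still ends with its own newline, so print it as-is
    print if /$reg_ex/;
}
close($fh);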
with test.log being the above, I found that the entry is all one line, so the output I get is:
$ perl t.pl
2005/11/18 00:06:49:875: FIXPump: Received data on connection {OBMSCNX} [8=FIX.4.29=040435=849=EXLINK256=DB_ORDER50=DESRISKGATEWAY57=DCN3230134=4045052=20051118-05:06:491=ATOP116=0.0000000011=DES:fud630_2005111814=15.000017=0131730433520=031=138.0800000032=15.000037=131726062238=15.000039=254=155=ContractNum60=20051118-05:06:49150=2151=0.0000198=13173047101317260622109=DCN3230163=0167=FUT200=200512207=TSE40=244=138.080000005113=06556=20051117-23:06:4610=230]
What I really want to do is pull out each of the tags: break up the line, look for 35= and give me whatever comes after it, then carry on, find 109= and give me everything after that. The problem I see is that there is no break between them.
so 8=FIX.4.29=040435=8 should read
8=FIX.4.2
9=0404
35=8

Re^3: Message regex
by wazzuteke (Hermit) on Nov 24, 2005 at 17:36 UTC
    This seems to be a very strange file in the first place, which will make it quite hard to parse successfully every time. ptum definitely has it right: we need a little more explanation of the rules for the data that will be in the file.

    Given:
    8=FIX.4.29=040435=8
    How do we know that it is 9=0404 and not 9=04043? For instance, if the pattern is /(\d{1,2}=)/, do we always keep $1 and assume that everything else belongs to the previous tag?
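
    To make the worry concrete, here is a minimal sketch of that naive rule: every run of one or two digits followed by '=' is taken as a tag, and everything up to the next such run is its value. The sub name naive_pairs is made up for illustration, and the rule itself is only a guess at the format.

```perl
use strict;
use warnings;

# Naive rule (an assumption, not a known property of the data):
# one or two digits followed by '=' start a tag, and the value runs
# up to the next thing that looks like a tag, or to the end.
sub naive_pairs {
    my ($msg) = @_;
    my @pairs;
    while ( $msg =~ /(\d{1,2})=(.*?)(?=\d{1,2}=|\z)/gs ) {
        push @pairs, [ $1, $2 ];
    }
    return @pairs;
}

print join( '=', @$_ ), "\n" for naive_pairs('8=FIX.4.29=040435=8');
# prints:
# 8=FIX.4.
# 29=0404
# 35=8
```

    Note that it finds 29=0404 rather than the 9=0404 that was wanted, which is exactly why the pattern alone is not enough.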

    It's hard to be certain without either better rules, or a better data source ;) Try giving as much information about the file as you can think of (and are allowed to, perhaps) and I am sure someone here will be able to help out further.

    ---hA||ta----
    print map{$_.' '}grep{/\w+/}@{[reverse(qw{Perl Code})]} or die while ( 'trying' );
Re^3: Message regex
by ptum (Priest) on Nov 24, 2005 at 17:23 UTC
    You should probably give us some more rules, like for example, will the N= tags always be in order?

    In the example you give: 8=FIX.4.29=040435=8

    I don't see any way to differentiate between 9= and 29= (unless there is a rule that says a value can never end in a '.'). Similarly, the last token could be:

    40435=8 or 0435=8 or 435=8 or 35=8 or 5=8
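
    The candidates above can be listed mechanically. This is only a sketch to show the size of the ambiguity (candidate_tags is a made-up name): for each '=', it prints every run of digits that ends right before it, since any of them could be the tag if nothing else is known.

```perl
use strict;
use warnings;

# For every '=' in the message, collect each digit suffix that ends
# immediately before it. Without more rules, each one is a possible tag.
sub candidate_tags {
    my ($msg) = @_;
    my @result;
    while ( $msg =~ /(\d+)=/g ) {
        my $digits = $1;
        push @result, [ map { substr( $digits, $_ ) } 0 .. length($digits) - 1 ];
    }
    return @result;
}

print join( ' or ', @$_ ), "\n" for candidate_tags('8=FIX.4.29=040435=8');
# prints:
# 8
# 29 or 9
# 040435 or 40435 or 0435 or 435 or 35 or 5
```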

    You might want to give us some more rules or find a way to preformat or delimit the input string, unless your tag array is guaranteed to clear up this ambiguity.

    No good deed goes unpunished. -- (attributed to) Oscar Wilde
      Well, the bad thing is that there is no order: 35= could be at the end of one string and then at the beginning of the next. I guess I need to think of a way to break it up. At the moment the problem is that the regex treats it as one whole line, so when you search for something like $_ =~ /35=/ you get back the whole string, and not just the 35=INFO, let's say.
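      For what it's worth, a plain match such as $_ =~ /35=/ only reports whether the pattern was found; the whole line shows up because $_ is what gets printed. Capturing just the value needs parentheses and some rule for where the value stops. A rough sketch, assuming the value ends where the next tag from the list begins (value_of is a hypothetical helper, and the tag list is the one from the script above):

```perl
use strict;
use warnings;

# Tags taken from the script earlier in the thread (8=, 9=, 35=).
my @known_tags = qw( 8 9 35 );
my $any_tag    = join '|', @known_tags;

# Return only the value that follows "$tag=", stopping at the next
# known tag or at the end of the string. Returns undef if not found.
# Assumption: a value never contains text that looks like "<known tag>=".
sub value_of {
    my ( $tag, $line ) = @_;
    return $line =~ /\Q$tag\E=(.*?)(?=(?:$any_tag)=|\z)/s ? $1 : undef;
}

my $line = '8=FIX.4.29=040435=8';
print "35 => ", value_of( 35, $line ), "\n";    # prints: 35 => 8
print " 9 => ", value_of( 9,  $line ), "\n";    # prints:  9 => 0404
```

      This only behaves if the tag list really does settle the ambiguity; add 29 to the list and the split after FIX.4. changes.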
        I've generally found that before I can solve a problem with code, I need to be able to describe it clearly in English. Please let me try to help you. :)

        You have a given string with tokens embedded within it, the most noticeable feature of the tokens being they come in a key=value form: \d+=.+

        There doesn't seem to be any rule that prevents two tokens from being adjacent to each other, so I have no way to tell when the value of one token ends and where the key of the next token begins, except that I have a finite array of keys to which I could refer. So as I parse the example you gave:

        8=FIX.4.29=040535=849=EXLINK

        If I knew that 29 was a key and 9 was not, then I could surmise that the first key/value pair is:

        8=FIX.4.

        Similarly, if I knew that 35 was a key and 40535, 0535, 535 and 5 are not keys, then I could surmise that the next token was either 35=8 or 35=84 (depending on whether 49 or 9 is the key that follows).

        It might help to go backwards through the string, since everything to the right of the last '=' is a value. Then (assuming your list of tags is unique and that no tag contains another tag) you could tokenize your string by iteratively stripping off the last '=' together with the value after it, and looking for a key that matches at the end of what remains.
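
        A rough sketch of that backwards idea, under the assumptions just stated plus one more: a value never contains '='. The sub name tokenize_backwards is made up, and the tag list is the one from the original script.

```perl
use strict;
use warnings;

# Peel "<known tag>=<value>" off the END of the string until nothing
# more matches. Assumes values contain no '=' and that the tag list
# is enough to decide where each key starts.
sub tokenize_backwards {
    my ( $msg, @tags ) = @_;
    my $alt = join '|', map { quotemeta } @tags;
    my @pairs;
    # The regex engine takes the leftmost match, so if several known
    # tags end at the last '=', the longest one is chosen.
    while ( $msg =~ s/($alt)=([^=]*)\z// ) {
        unshift @pairs, [ $1, $2 ];
    }
    # Anything left in $msg could not be parsed with the given tags.
    return ( \@pairs, $msg );
}

my ( $pairs, $leftover ) =
    tokenize_backwards( '8=FIX.4.29=040435=8', qw( 8 9 35 ) );
print "$_->[0]=$_->[1]\n" for @$pairs;
print "unparsed: $leftover\n" if length $leftover;
# prints:
# 8=FIX.4.2
# 9=0404
# 35=8
```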

        That seems a terribly brute-force approach, though; not very elegant.

        No good deed goes unpunished. -- (attributed to) Oscar Wilde