Re^2: Message regex

Hmmmm, so I have found one strange thing. When I run this code:

#!/usr/bin/perl

use strict;
use warnings;
my @tags = qw( 8= 9= 35= );
my $reg_ex = join( '|', @tags );
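# Three-argument open with a lexical filehandle
open(my $fh, '<', 'test.log') or die("Unable to open log file: $!\n");
while (<$fh>) {
  # Print every line that contains at least one of the tags
  print "$_\n" if ( $_ =~ /$reg_ex/ );
}
close($fh);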

with test.log containing the entry above, I found that the entry is a single line, so the output I get is:

$ perl t.pl
2005/11/18 00:06:49:875: FIXPump: Received data on connection {OBMSCNX} [8=FIX.4.29=040435=849=EXLINK256=DB_ORDER50=DESRISKGATEWAY57=DCN3230134=4045052=20051118-05:06:491=ATOP116=0.0000000011=DES:fud630_2005111814=15.000017=0131730433520=031=138.0800000032=15.000037=131726062238=15.000039=254=155=ContractNum60=20051118-05:06:49150=2151=0.0000198=13173047101317260622109=DCN3230163=0167=FUT200=200512207=TSE40=244=138.080000005113=06556=20051117-23:06:4610=230]

What I really want to do is pull out each of the tags: break up the line, look for 35= and give me whatever is after that, then carry on, find 109= and give me everything after that. The problem I see is that there is no break between them.
So 8=FIX.4.29=040435=8 should read:
8=FIX.4.2
9=0404
35=8

Re^3: Message regex by wazzuteke (Hermit) on Nov 24, 2005 at 17:36 UTC
This seems to be a very strange file in the first place, which will make it quite hard to parse successfully every time. ptum definitely has it right in that we need a little more explanation of the rules for the data in the file. Given: `8=FIX.4.29=040435=8` How do we know that the field is `9=0404` and not `9=04043`? For instance, if the tag pattern is `/(\d{1,2}=)/`, do we always keep `$1` and assume that everything else belongs to the previous tag? It's hard to be certain without either better rules or a better data source ;) Try giving as much information about the file as you can think of (and are allowed to, perhaps) and I am sure someone here will be able to help out further. ---hA||ta---- `print map{$_.' '}grep{/\w+/}@{[reverse(qw{Perl Code})]} or die while ( 'trying' );`
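To make the question above concrete, here is a minimal sketch of what happens if every run of one or two digits followed by '=' is treated as a tag. The helper name `split_by_short_tags` is made up for illustration; it is not code from the original poster or from any module.

```perl
use strict;
use warnings;

# Hypothetical helper: treat every run of one or two digits followed by '='
# as a tag, and everything up to the next such run as that tag's value.
sub split_by_short_tags {
    my ($msg) = @_;
    my @pairs;
    # The lazy (.*?) stops as soon as the lookahead sees the next "digits="
    # or the end of the string.
    while ($msg =~ /(\d{1,2})=(.*?)(?=\d{1,2}=|\z)/g) {
        push @pairs, [$1, $2];
    }
    return @pairs;
}

for my $pair (split_by_short_tags('8=FIX.4.29=040435=8')) {
    print "$pair->[0]=$pair->[1]\n";
}
# prints:
# 8=FIX.4.
# 29=0404
# 35=8
```

Note that the pattern alone reads the second field as 29=0404, not the 9=0404 the poster wants, which is exactly why more rules about the data are needed.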
Re^3: Message regex by ptum (Priest) on Nov 24, 2005 at 17:23 UTC
You should probably give us some more rules. For example, will the N= tags always be in order? In the example you give: 8=FIX.4.29=040435=8 I don't see any way to differentiate between 9= and 29= (unless there is a rule that says a value can never end in a '.'). Similarly, the last token could be any of: 40435=8, 0435=8, 435=8, 35=8 or 5=8. You might want to give us some more rules, or find a way to preformat or delimit the input string, unless your tag array is guaranteed to clear up this ambiguity. No good deed goes unpunished. -- (attributed to) Oscar Wilde
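The list of possible last tokens can be generated mechanically. The sketch below is only an illustration: `candidate_keys` is a hypothetical helper, and it assumes (this rule is not stated anywhere in the thread) that the previous field's value may not be empty, which is why the full digit run 040435 is left out.

```perl
use strict;
use warnings;

# Hypothetical helper: list every key that could precede the final '='.
# Assumption: the value of the previous field must not be empty.
sub candidate_keys {
    my ($msg) = @_;
    # Capture the run of digits directly before the last '='.
    return unless $msg =~ /(\d+)=[^=]*\z/;
    my ($run, $start) = ($1, $-[1]);
    # If the run directly follows another '=', using the whole run as the
    # key would leave the previous value empty, so skip that candidate.
    my $first = ($start > 0 && substr($msg, $start - 1, 1) eq '=') ? 1 : 0;
    return map { substr($run, $_) } $first .. length($run) - 1;
}

print join(', ', candidate_keys('8=FIX.4.29=040435=8')), "\n";
# prints: 40435, 0435, 435, 35, 5
```

Without a list of valid tags, nothing in the string itself says which of these five is the real key.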
Re^4: Message regex by minixman (Beadle) on Nov 24, 2005 at 17:31 UTC
Well, the bad thing is that there is no order: 35= could be at the end of one string and then at the beginning of the next. I guess I need to think of a way to break it up. At the moment the problem is that the regex treats it as one whole line, so when you search for something like $_ =~ /35=/ you get back the whole string, and not just, let's say, the 35=INFO part.
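A note on the last point: a match like $_ =~ /35=/ only reports whether the pattern was found, and $_ still holds the whole line. To get at the text after the tag, the pattern needs a capture group. The following is a rough sketch with a made-up helper name, `after_tag`; it returns everything after the first place the tag text appears, so it does not solve the question of where a value ends.

```perl
use strict;
use warnings;

# Hypothetical helper: return everything after the first "TAG=" in the line,
# or undef if the tag text does not occur. It does not check what comes
# before the tag, so '9' will also match inside "29=".
sub after_tag {
    my ($line, $tag) = @_;
    return $line =~ /\Q$tag\E=(.*)/ ? $1 : undef;
}

my $line = '8=FIX.4.29=040435=8';
print after_tag($line, '35'), "\n";   # prints: 8
print after_tag($line, '9'),  "\n";   # prints: 040435=8
```

The second call shows the remaining problem: without a delimiter, "whatever is after 9=" runs on into the next field.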
Re^5: Message regex by ptum (Priest) on Nov 24, 2005 at 17:50 UTC
I've generally found that before I can solve a problem with code, I need to be able to describe it clearly in English. Please let me try to help you. :) You have a given string with tokens embedded within it, the most noticeable feature of the tokens being that they come in a key=value form: \d+=.+ There doesn't seem to be any rule that prevents two tokens from being adjacent to each other, so I have no way to tell where the value of one token ends and where the key of the next token begins, except that I have a finite array of keys to which I could refer. So as I parse the example you gave: 8=FIX.4.29=040435=849=EXLINK If I knew that 29 was a key and 9 was not, then I could surmise that the first key/value pair is: 8=FIX.4. Similarly, if I knew that 35 was a key and that 40435, 0435, 435 and 5 were not, then I could surmise that the next token was either 35=8 or 35=84. It might help to go backwards through the string, since everything to the right of the last '=' is a value. Then (assuming your list of tags is unique and that no tag contains another tag) you could tokenize your string by looking for a key match at the end of the string after iteratively stripping off the last '=' and the value that follows it. That seems a terribly brute-force approach, though; not very elegant. No good deed goes unpunished. -- (attributed to) Oscar Wilde
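The backwards approach could look roughly like the sketch below. This is only one reading of the idea, not tested against real logs: `tokenize_backwards` is a hypothetical name, and the code assumes that values never contain '=' and that no tag in the list is the tail end of another tag.

```perl
use strict;
use warnings;

# Hypothetical helper: peel key=value pairs off the END of the string,
# using a list of known tags to decide where each key starts.
# Assumptions: values never contain '=', and no tag ends with another tag.
sub tokenize_backwards {
    my ($msg, @tags) = @_;
    my $alt = join '|', map { quotemeta } @tags;
    my @pairs;
    while (length $msg) {
        # [^=]*\z pins the match to the last '='; the text right before
        # that '=' must be one of the known tags.
        $msg =~ /\A(.*)($alt)=([^=]*)\z/s or last;
        unshift @pairs, [$2, $3];
        $msg = $1;    # keep only what is left of the key just found
    }
    return @pairs;
}

for my $pair (tokenize_backwards('8=FIX.4.29=040435=8', qw(8 9 35))) {
    print "$pair->[0]=$pair->[1]\n";
}
# prints:
# 8=FIX.4.2
# 9=0404
# 35=8
```

With the tag list 8, 9 and 35 this yields the split the original poster asked for. Any text that cannot be matched against a known tag (such as the timestamp prefix of the log line) is simply left unparsed when the loop stops.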