in reply to Message regex

I believe I understand what you're asking here; if my answer doesn't help, I apologize.

These are a couple of very simplistic methods you might be able to use, or that might at least spark your imagination. Again, I'm only about 50% confident I understand what you're asking...

1. You will probably not want to do something like this, for a number of reasons. First, '|' regexes are pretty expensive; combined with the lack of anchors and a possibly unbounded @tags array, it could be quite an intensive process. However, if the array is going to stay static and fairly small, it might be a viable solution:
my @tags = qw( 1= 2= 3= );    # ...and so on
my $reg_ex = join( '|', @tags );
# 'or' binds loosely enough that die fires when open fails
open( FH, '<', 'dat.log' ) or die "$!\n";
while ( <FH> ) {
    # $_ still carries its newline, so print it as-is
    print $_ if $_ =~ /$reg_ex/;
}
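A slightly more defensive sketch of the same idea, assuming the tag list stays small and static. The helper name build_tag_regex is made up for illustration; quotemeta and qr// are core Perl. Escaping each tag keeps a stray '.' or '+' in a tag from being read as a regex metacharacter, and qr// compiles the pattern once instead of on every line.

```perl
use strict;
use warnings;

# Hypothetical helper: compile one alternation from a list of literal tags.
sub build_tag_regex {
    my @tags = @_;
    # Escape each tag so it is matched literally.
    my $alt = join '|', map { quotemeta($_) } @tags;
    return qr/$alt/;
}

my $tag_re = build_tag_regex(qw( 1= 2= 3= ));

# Stand-in for lines read from the log file.
for my $line ( "foo 2=bar\n", "nothing here\n" ) {
    print $line if $line =~ $tag_re;
}
```

This only tells you which lines contain a tag; it does not pull the values out.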

I had more examples, but I am quickly realizing I don't understand the problem well enough to continue. Some questions that come up are:

1. Does the array @tags have the potential of growing very large?
2. Are the contents of the file going to be consistent? For instance, will lines always start with 'tag=...', or will that vary?
3. Is there any more general detail you might be able to share?

My answer, again, might help out (and I hope it does). Other than that, I'm not sure what else to say...

---hA||ta----
print map{$_.' '}grep{/\w+/}@{[reverse(qw{Perl Code})]} or die while ( 'trying' );

Replies are listed 'Best First'.
Re^2: Message regex
by ikegami (Patriarch) on Nov 24, 2005 at 17:44 UTC
    First '|' regexes are pretty expensive,

    You said that alternations are very expensive, but you can optimize such regexps using modules such as Regexp::List. perl has recently been patched to fix this problem.

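    But your program has a potential bug.
    my $reg_ex = join( '|', @tags );
    can pick the wrong alternative when one tag is a prefix of another (for example '3' and '38' once the trailing '=' is dropped), because Perl tries the alternatives left to right and takes the first one that matches. A common fix is to sort by descending length:
    my $reg_ex = join( '|', sort { length($b) <=> length($a) } @tags );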
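A small sketch of the ordering effect. It uses bare tag numbers without the trailing '=' (my assumption, since that is where a shorter alternative can shadow a longer one), and first_tag is a made-up helper name.

```perl
use strict;
use warnings;

# Hypothetical helper: the tag matched at the very start of $text, or undef.
sub first_tag {
    my ( $text, @tags ) = @_;
    my $alt = join '|', map { quotemeta($_) } @tags;
    return $text =~ /^($alt)/ ? $1 : undef;
}

my @tags      = qw( 3 38 );
my @by_length = sort { length($b) <=> length($a) } @tags;

print first_tag( '38=bbbbbbb', @tags ),      "\n";    # 3  (the shorter alternative is tried first)
print first_tag( '38=bbbbbbb', @by_length ), "\n";    # 38
```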

    The question that remains is whether
    31=aaaaaa38=bbbbbbb
    should return
    31 => 'aaaaaa', 38 => 'bbbbbbb'
    or
    31 => 'aaaaaa3', 8 => 'bbbbbbb'
    if '38=' and '8=' are both in @tags.
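For what it's worth, a plain alternation settles this by position: the regex engine takes the leftmost match, so '38=' is found before the '8=' inside it. A rough sketch (split_on_tags is a made-up name), which also shows that the second reading appears as soon as '38=' is missing from the list:

```perl
use strict;
use warnings;

# Hypothetical helper: split a message on known tags, keeping the tags.
sub split_on_tags {
    my ( $msg, @tags ) = @_;
    my $alt = join '|', map { quotemeta($_) }
        sort { length($b) <=> length($a) } @tags;
    my @parts = split /($alt)/, $msg;
    shift @parts if @parts && $parts[0] eq '';    # drop the empty leading field
    return @parts;
}

print join( ' | ', split_on_tags( '31=aaaaaa38=bbbbbbb', qw( 31= 38= 8= ) ) ), "\n";
# 31= | aaaaaa | 38= | bbbbbbb
print join( ' | ', split_on_tags( '31=aaaaaa38=bbbbbbb', qw( 31= 8= ) ) ), "\n";
# 31= | aaaaaa3 | 8= | bbbbbbb
```

Whether the first reading is the right one still depends on the data: if a value can legitimately end in '3', no regex can tell the two apart.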

      So, using Regexp::List, how would I pass in the tags? Say I were looking for something like 35=, 109=, 55= and so on:
      my $re = $l->list2re(qw/35= 109= 55=/);
      But how do I then loop through each part of my message and extract the values?
      2005/11/18 00:06:49:875: FIXPump: Received data on connection {OBMSCNX} [8=FIX.4.29=040435=849=EXLINK256=DB_ORDER50=DESRISKGATEWAY57=DCN3230134=4045052=20051118-05:06:491=ATOP116=0.0000000011=DES:fud630_2005111814=15.000017=0131730433520=031=138.0800000032=15.000037=131726062238=15.000039=254=155=CibntractZ60=20051118-05:06:49150=2151=values=13173047101317260622109=DCN3230163=0167=FUT200=200512207=TSE40=244=138.080000005113=06556=20051117-23:06:4610=230]
        So I think I might have come up with some sort of solution, but it needs some expert tuning.
        Take a look at the following code.
        #!/usr/bin/perl
        use strict;
        use warnings;

        my @tags = qw( 8 9 35 50 97 57 34 49 56 43 52 200 207 40 55 11 167 54 59 44 21 38 60 1 10 );
        my $reg_ex = join( '|', @tags );    # built here, but not used yet

        open(FH, "test.log")||die("Unable to open log file: $! \n");
        while(<FH>) {
            # print, not printf: a '%' in the data would be read as a format
            print("Working on String : $_ \n");
            if( /\[(.*)\]/) {
                print("New String: <$1> \n");
                # only 8= and 9= are hard-coded for now
                my @vals = split(/(8=|9=)/, $1);
                foreach my $l (@vals) {
                    print("$l\n");
                }
            }
        }
        This spits out:
        chimi:~/programs/perlweb/FIXRead cuthbe$ perl t.pl
        Working on String : 2005/11/18 00:06:49:875: FIXPump: Received data on connection {OBMSCNX} [8=FIX.4.29=040435=849=EXLINK256=DB_ORDER50=DESRISKGATEWAY57=DCN3230134=4045052=20051118-05:06:491=ATOP116=0.0000000011=DES:fud630_2005111814=15.000017=0131730433520=031=138.0800000032=15.000037=131726062238=15.000039=254=155=CibntractZ60=20051118-05:06:49150=2151=values=13173047101317260622109=DCN3230163=0167=F00=200512207=T40=244=138.080000005113=06556=20051117-23:06:4610=230]
        New String: <8=FIX.4.29=040435=849=EXLINK256=DB_ORDER50=DESRISKGATEWAY57=DCN3230134=4045052=20051118-05:06:491=ATOP116=0.0000000011=DES:fud630_2005111814=15.000017=0131730433520=031=138.0800000032=15.000037=131726062238=15.000039=254=155=CibntractZ60=2051118-05:06:49150=2151=values=13173047101317260622109=DCN3230163=0167=F200=200512207=T40=244=138.080000005113=06556=20051117-23:06:4610=230>
        8=
        FIX.4.2
        9=
        040435=84
        9=
        EXLINK256=DB_ORDER50=DESRISKGATEWAY57=DCN3230134=4045052=20051118-05:06:491=ATO16=0.0000000011=DES:fud630_2005111814=15.000017=0131730433520=031=1300000032=15.000037=13172606223
        8=
        15.00003
        9=
        254=155=CibntractZ60=20051118-05:06:49150=2151=values=1317304710131726062210
        9=
        DCN3230163=0167=F00=200512207=TSE40=244=138.080000005113=06556=20051117-23:06:4610=230
        So I can see that if I put the tags into the split, it passes them back out, and then I should be able to grab them: the first value would be the tag and the second would be the info.
        The question is how to get the split to handle every tag I need to search for, and more if need be.
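One way this might be done, sketched under the assumption that every field starts with a tag from a known list and that a value runs up to the next known "tag=". It walks the string with a \G-anchored match rather than split (a split on /((?:$alt)=)/ would work similarly); parse_fields is a made-up name.

```perl
use strict;
use warnings;

# Hypothetical helper: return [tag, value] pairs in the order they appear.
sub parse_fields {
    my ( $body, @tags ) = @_;
    # Numeric tags only, so nothing to escape; longest first.
    my $alt = join '|', sort { length($b) <=> length($a) } @tags;
    my @pairs;
    # \G makes each match start where the previous one ended. The value is
    # matched lazily up to the next known "tag=" or the end of the body.
    while ( $body =~ /\G($alt)=(.*?)(?=(?:$alt)=|\z)/gs ) {
        push @pairs, [ $1, $2 ];
    }
    return @pairs;
}

my $line = 'Received data on connection {OBMSCNX} [8=FIX.4.29=040435=8]';
if ( $line =~ /\[(.*)\]/ ) {
    printf "%s => %s\n", @$_ for parse_fields( $1, qw( 8 9 35 ) );
}
# 8 => FIX.4.2
# 9 => 0404
# 35 => 8
```

The caveat is the one already raised in this thread: a value that happens to contain digits followed by '=' matching a known tag gets cut short, so the bigger the tag list, the more likely a wrong split.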
Re^2: Message regex
by minixman (Beadle) on Nov 24, 2005 at 17:07 UTC
    Hmmmm
    So I have found one strange thing. When I run the code
    #!/usr/bin/perl
    use strict;
    use warnings;

    my @tags = qw( 8= 9= 35= );
    my $reg_ex = join( '|', @tags );

    open(FH, "test.log")||die("Unable to open log file: $! \n");
    while(<FH>) {
        print "$_\n" if ( $_ =~ /$reg_ex/s );
    }
    with test.log being the above, I found that the entry is all one line, so the output I get is
    $ perl t.pl
    2005/11/18 00:06:49:875: FIXPump: Received data on connection {OBMSCNX} [8=FIX.4.29=040435=849=EXLINK256=DB_ORDER50=DESRISKGATEWAY57=DCN3230134=4045052=20051118-05:06:491=ATOP116=0.0000000011=DES:fud630_2005111814=15.000017=0131730433520=031=138.0800000032=15.000037=131726062238=15.000039=254=155=ContractNum60=20051118-05:06:49150=2151=0.0000198=13173047101317260622109=DCN3230163=0167=FUT200=200512207=TSE40=244=138.080000005113=06556=20051117-23:06:4610=230]
    What I really want to do is pull out each of the tags: break up the line and say, look for 35= and give me whatever is after that, then carry on and find 109= and give me everything after that. The problem I see is that there is no break between them.
    So 8=FIX.4.29=040435=8 should read
    8=FIX.4.2
    9=0404
    35=8

      This seems to be a very strange file in the first place, which will make it quite hard to parse successfully every time. ptum definitely has it right in that we need a little more explanation of the rules for the data in the file.

      Given:
      8=FIX.4.29=040435=8
      How do we know that it is 9=0404 and not 9=04043? For instance, if the pattern is always /(\d{1,2}=)/, do we keep $1 as the tag and assume that everything up to the next match belongs to the previous tag?
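To make the problem concrete, here is a rough sketch of that "any one or two digits followed by '='" rule applied to the sample. guess_fields is a made-up name, and the rule itself is only a guess at the format, not something the data guarantees.

```perl
use strict;
use warnings;

# Hypothetical helper: treat every 1-2 digit number followed by '=' as a tag.
sub guess_fields {
    my ($msg) = @_;
    my @parts = split /(\d{1,2})=/, $msg;
    shift @parts;    # drop the empty field in front of the first tag
    my @pairs;
    while ( my ( $tag, $value ) = splice @parts, 0, 2 ) {
        push @pairs, [ $tag, $value ];
    }
    return @pairs;
}

print "$_->[0] => $_->[1]\n" for guess_fields('8=FIX.4.29=040435=8');
# 8 => FIX.4.
# 29 => 0404
# 35 => 8
```

The pattern happily reads "29=" where "9=" was meant, which is exactly the ambiguity: without a tag list or a delimiter, the regex has no way to know where a value ends.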

      It's hard to be certain without either better rules, or a better data source ;) Try giving as much information about the file as you can think of (and are allowed to, perhaps) and I am sure someone here will be able to help out further.

      ---hA||ta----
      print map{$_.' '}grep{/\w+/}@{[reverse(qw{Perl Code})]} or die while ( 'trying' );
      You should probably give us some more rules, like for example, will the N= tags always be in order?

      In the example you give: 8=FIX.4.29=040435=8

      I don't see any way to differentiate between 9= and 29= (unless there is a rule that says a value can never end in a '.'). Similarly, the last token could be:

      40435=8 or 0435=8 or 435=8 or 35=8 or 5=8

      You might want to give us some more rules or find a way to preformat or delimit the input string, unless your tag array is guaranteed to clear up this ambiguity.
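      On the delimiter point: FIX messages normally separate fields with the SOH byte (\x01), which many terminals and viewers do not display. It is only an assumption that your raw log still contains it, but if it does, the split becomes unambiguous. A minimal sketch (split_fix is a made-up name):

```perl
use strict;
use warnings;

# Assumption: fields are separated by SOH (\x01), the standard FIX delimiter.
sub split_fix {
    my ($body) = @_;
    # Split into fields, skip empties, then split each field on its first '='.
    return map { [ split /=/, $_, 2 ] } grep { length } split /\x01/, $body;
}

# Stand-in for the text between the square brackets in the log line.
my $body = join "\x01", '8=FIX.4.2', '9=0404', '35=8', '';
print join( '=', @$_ ), "\n" for split_fix($body);
# 8=FIX.4.2
# 9=0404
# 35=8
```

      If the delimiter was stripped before the log was written, this does not help and you are back to relying on the tag list.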

      No good deed goes unpunished. -- (attributed to) Oscar Wilde
        Well, the bad thing is that there is no order: 35= could be at the end of one string and then at the beginning of the next. I guess I need to think of a way to break it up. At the moment the problem is that the regex treats it as one whole line, so when you search for something like
        $_ =~ /35=/ it matches the whole string, and not just the 35=INFO part, let's say.
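A match like /35=/ only says whether the line contains the tag; to get the piece itself you need capturing parentheses. A rough sketch, assuming the value ends at the next known "tag=", a closing bracket, or the end of the string (tag_value is a made-up name):

```perl
use strict;
use warnings;

# Hypothetical helper: value of one tag, or undef if the tag is not found.
sub tag_value {
    my ( $msg, $want, @tags ) = @_;
    my $alt = join '|', sort { length($b) <=> length($a) } @tags;
    # Capture lazily up to the next known "tag=", a ']' or the end.
    return $msg =~ /\Q$want\E=(.*?)(?=(?:$alt)=|\]|\z)/s ? $1 : undef;
}

my $msg = '8=FIX.4.29=040435=849=EXLINK2';
print tag_value( $msg, 35, qw( 8 9 35 49 ) ), "\n";    # 8
print tag_value( $msg, 49, qw( 8 9 35 49 ) ), "\n";    # EXLINK2
```

Since the order of the tags does not matter here, this copes with 35= turning up anywhere in the line. It can still be fooled when "35=" appears inside a longer number such as "135=", so treat it as a starting point.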