minixman has asked for the wisdom of the Perl Monks concerning the following question:

All, I have the following message, which I need to break up.
2005/11/18 00:07:18:328: FIXPump: Received data on connection {OBMSCNX} [8=FIX.4.29=040535=849=EXLINK256=DB_ORDER50=DESRISKGATEWAY57=DCN3230134=4045652=20051118-05:07:181=ATOP116=0.0000000011=DES:fud632_2005111814=15.000017=0131730438520=031=1531.0000000032=1.000037=131726063238=15.000039=254=255=NamerOfContr60=20051118-05:07:18150=2151=0.0000198=13173047161317260632109=DCN3230163=0167=FUT200=200512207=TSE40=244=1531.000000005113=06556=20051117-23:06:4910=021]

Now, what I want to be able to do is match everything between the , and then go over each = and split out the part on the left and the part on the right. So, for example, where you have
8=FIX, $left would be 8 and $right would be FIX.

2005-11-24 Retitled by planetscape, as per Monastery guidelines
Original title: 'Messege regex'

Replies are listed 'Best First'.
Re: Message regex
by anniyan (Monk) on Nov 24, 2005 at 16:07 UTC

    match everything between the , and then go over each =

    In your input there is no , and also the question is not very clear.

    Regards,
    Anniyan
    (CREATED in HELL by DEVIL to s|EVILS|GOODS|g in WORLD)

Re: Message regex
by phaylon (Curate) on Nov 24, 2005 at 16:16 UTC
    I guess PM stripped something from the OP's post.

    As for the question, it's a really fuzzy one. Splitting "8=FIX" is no problem, but you should tell us how to get that part. How would you break up this, for instance?
    8=FIX.4.29=040535=849=EXLINK
    or that one:
    116=0.0000000011=DES:fud632_2005111814=15.000017=013
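For the easy half, here is a minimal sketch of splitting one field that has already been isolated. The names $left and $right come from the original question; the sample field is just the first one from the message.

```perl
use strict;
use warnings;

my $field = '8=FIX.4.2';

# A limit of 2 keeps a value that itself contains '=' in one piece.
my ( $left, $right ) = split /=/, $field, 2;

print "left=$left right=$right\n";    # left=8 right=FIX.4.2
```

The hard half, finding where one field ends and the next begins, is what the rest of the thread is about.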

    Ordinary morality is for ordinary people. -- Aleister Crowley
      Yeah, sorry, the question is fuzzy. What I have is an array of tags, i.e.

      #!/usr/bin/perl
      use strict;
      use warnings;

      my @tags = ("8=","9=","35=","49=","56=","50=","57=",
                  "34=","52=","91=","11=","14=","17=","20=",
                  "31=","32=","37=","38=","39=","54=","55=",
                  "60=","50=","51=","98=","109=","163=","167=",
                  "200=","207=","40=","44=","113=","56=","10=" );

      open(FH, "test.log") || die("Unable to open log file: $! \n");
      while (<FH>) {
          if ($_ =~ any of the tags in array) {    # pseudocode: the part I don't know how to write
              printf $fulltag;
          }
      }

      So here we go over each line, match up 8=FIX, and then put that into a variable, something like $first. Does that make sense?

        Should
        31=aaaaaa38=bbbbbbb
        return
        31 => 'aaaaaa', 38 => 'bbbbbbb'
        or
        31 => 'aaaaaa3', 8 => 'bbbbbbb'
        That ambiguity is why @tags must be populated in advance and must list the tags in the order they appear in the message.

        use strict;
        use warnings;

        # Tags in the order they appear in the message. 50 and 56 are listed
        # twice, so the later value overwrites the earlier one in %hash.
        my @tags = qw(
            8 9 35 49 56 50 57 34 52 81 11 14 17 20
            31 32 37 38 39 54 55 60 50 51 98 109 163 167
            200 207 40 44 113 56 10
        );

        # One "(tag)=(value)" pair per tag, all inside the square brackets.
        my $re = '\\[' . join('', map { "($_)=(.*?)" } @tags) . '\\]';

        while (<DATA>) {
            chomp;
            my %hash = /$re/;
            if (not %hash) {
                warn("Line $. did not match\n");
                next;
            }

            foreach my $tag (@tags) {
                printf("%-3s => %s\n", $tag, $hash{$tag});
            }

            print("\n");
        }

        __DATA__
        2005/11/18 00:07:18:328: FIXPump: Received data on connection {OBMSCNX} [8=FIX.4.29=040535=849=EXLINK256=DB_ORDER50=DESRISKGATEWAY57=DCN3230134=4045652=20051118-05:07:181=ATOP116=0.0000000011=DES:fud632_2005111814=15.000017=0131730438520=031=1531.0000000032=1.000037=131726063238=15.000039=254=255=NamerOfContr60=20051118-05:07:18150=2151=0.0000198=13173047161317260632109=DCN3230163=0167=FUT200=200512207=TSE40=244=1531.000000005113=06556=20051117-23:06:4910=021]

Re: Message regex
by wazzuteke (Hermit) on Nov 24, 2005 at 16:50 UTC
    I believe I understand what you're asking here, and if my answer doesn't help, I apologize.

    These are a couple of very simplistic methods you might be able to use, or that might spark your imagination. Again, I'm only about 50% confident I understand what you're asking...

    1. You will probably not want to do something like this, for a number of reasons. First, '|' regexes are pretty expensive; combined with the lack of anchors and a possibly unbounded @tags array, it could be quite an intensive process. However, if the array is to remain static and fairly small, it might be a viable solution:
    my @tags = qw( 1= 2= 3= );    # .. and so on
    my $reg_ex = join( '|', @tags );

    # Parentheses matter: "open FH, 'dat.log' || die" would never die.
    open( FH, 'dat.log' ) || die "$!\n";
    while ( <FH> ) {
        print "$_\n" if ( $_ =~ /$reg_ex/s );
    }

    I had more examples, but I am quickly realizing I don't understand the problem well enough to continue. Some questions that come up are:

    1. Does the array @tags() have the potential of growing very large?
    2. Are the contents of the file going to be consistent? For instance, will lines always start with 'tag=...', or will that vary?
    3. Is there any more general detail you might be able to expose?

    My answer, again, might help out (and I hope it does). Other than that, I'm not sure what else to say...

    ---hA||ta----
    print map{$_.' '}grep{/\w+/}@{[reverse(qw{Perl Code})]} or die while ( 'trying' );
      First '|' regexes are pretty expensive,

      You said that alternations are very expensive, but you can optimize such regexps using modules such as Regexp::List. Perl has recently been patched to fix this problem.

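      But your program has a latent bug.
      my $reg_ex = join( '|', @tags );
      Perl takes the first alternative that matches at a given position, so a shorter tag listed first can shadow a longer tag that begins with it (a bare '3' ahead of '38', say). The trailing '=' on every tag prevents that here, but a common safeguard is to sort by descending length:
      my $reg_ex = join( '|', sort { length($b) <=> length($a) } @tags );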
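A small sketch of when alternation order matters. The tag lists are made up for the illustration: the first two drop the trailing '=' so that one tag ('3') is a prefix of another ('38'). With the '=' kept on every tag, no tag is a prefix of another and the order makes no difference for this input.

```perl
use strict;
use warnings;

my $text = '38=15.0000';

# Unsorted: '3' is tried first and matches, so '38' is never reported.
my $unsorted = join '|', map { quotemeta($_) } qw( 3 38 );
my ($first) = $text =~ /^($unsorted)/;

# Longest first: '38' gets its chance before '3'.
my $sorted = join '|',
    map { quotemeta($_) } sort { length($b) <=> length($a) } qw( 3 38 );
my ($second) = $text =~ /^($sorted)/;

# Trailing '=' kept: '3=' cannot match here, so '38=' is found anyway.
my $with_eq = join '|', map { quotemeta($_) } qw( 3= 38= );
my ($third) = $text =~ /^($with_eq)/;

print "$first $second $third\n";    # 3 38 38=
```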

      The question that remains is whether
      31=aaaaaa38=bbbbbbb
      should return
      31 => 'aaaaaa', 38 => 'bbbbbbb'
      or
      31 => 'aaaaaa3', 8 => 'bbbbbbb'
      if '38=' and '8=' are both in @tags.
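Both readings are legal parses of the same string; which one you get depends only on which tag you tell the parser to expect next. A minimal sketch, where parse_in_order is a hypothetical helper written for this illustration and assumes the message consists of exactly the given tags, in the given order:

```perl
use strict;
use warnings;

# Hypothetical helper: parse $msg as exactly @tags, in that order.
# Returns tag => value pairs, or an empty list if the message does not fit.
sub parse_in_order {
    my ( $msg, @tags ) = @_;
    my $re = join '', map { '(' . quotemeta($_) . ')=(.*?)' } @tags;
    my %pairs = $msg =~ /\A$re\z/s;
    return %pairs;
}

my $msg = '31=aaaaaa38=bbbbbbb';

my %with_38 = parse_in_order( $msg, 31, 38 );
my %with_8  = parse_in_order( $msg, 31, 8 );

print "31 => '$with_38{31}', 38 => '$with_38{38}'\n";  # 31 => 'aaaaaa', 38 => 'bbbbbbb'
print "31 => '$with_8{31}', 8 => '$with_8{8}'\n";      # 31 => 'aaaaaa3', 8 => 'bbbbbbb'
```

Nothing in the string itself says which answer is right, so the choice has to come from outside knowledge of the message layout.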

        So, using Regexp::List, how would I pass in the tags? Say I were looking for something like 35=, 109=, 55=, etc.:
        my $re = $l->list2re(qw/35= 109= 55=/);
        But how do I then loop through each part of my message and extract the values?
        2005/11/18 00:06:49:875: FIXPump: Received data on connection {OBMSCNX} [8=FIX.4.29=040435=849=EXLINK256=DB_ORDER50=DESRISKGATEWAY57=DCN3230134=4045052=20051118-05:06:491=ATOP116=0.0000000011=DES:fud630_2005111814=15.000017=0131730433520=031=138.0800000032=15.000037=131726062238=15.000039=254=155=CibntractZ60=20051118-05:06:49150=2151=values=13173047101317260622109=DCN3230163=0167=FUT200=200512207=TSE40=244=138.080000005113=06556=20051117-23:06:4610=230]
      Hmmmm.
      So I have found one thing which is strange. When I run the code

      #!/usr/bin/perl
      use strict;
      use warnings;

      my @tags = qw( 8= 9= 35= );
      my $reg_ex = join( '|', @tags );

      open(FH, "test.log") || die("Unable to open log file: $! \n");
      while (<FH>) {
          print "$_\n" if ( $_ =~ /$reg_ex/s );
      }

      with test.log containing the above, I found that the entry is one line, so the output I get is

      $ perl t.pl
      2005/11/18 00:06:49:875: FIXPump: Received data on connection {OBMSCNX} [8=FIX.4.29=040435=849=EXLINK256=DB_ORDER50=DESRISKGATEWAY57=DCN3230134=4045052=20051118-05:06:491=ATOP116=0.0000000011=DES:fud630_2005111814=15.000017=0131730433520=031=138.0800000032=15.000037=131726062238=15.000039=254=155=ContractNum60=20051118-05:06:49150=2151=0.0000198=13173047101317260622109=DCN3230163=0167=FUT200=200512207=TSE40=244=138.080000005113=06556=20051117-23:06:4610=230]

      What I really want to do is pull out each of the tags: break up the line, look for 35= and give me whatever is after that, then carry on, find 109= and give me everything after that. The problem I see is that there is no break between them.
      So 8=FIX.4.29=040435=8 should read
      8=FIX.4.2
      9=0404
      35=8
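One way to get exactly that output, sketched under the assumption that the tags are known in advance and listed in message order (the same assumption made earlier in the thread). cut_at_tags is a hypothetical helper, not something from the posts above. It cuts at the first place each expected tag appears, so it will still be fooled if a value happens to contain the text of the next tag.

```perl
use strict;
use warnings;

# Hypothetical helper: walk the message left to right, cutting it at each
# expected tag in turn. @tags must list the tags in message order.
# Returns a list of [tag, value] pairs, or nothing if a tag is missing.
sub cut_at_tags {
    my ( $msg, @tags ) = @_;
    my @fields;
    my $pos = 0;
    for my $i ( 0 .. $#tags ) {
        my $tag   = $tags[$i] . '=';
        my $start = index( $msg, $tag, $pos );
        return if $start < 0;    # expected tag not found
        my $value_start = $start + length($tag);

        # The value runs up to the next expected tag, or to the end.
        my $end = length($msg);
        if ( $i < $#tags ) {
            $end = index( $msg, $tags[ $i + 1 ] . '=', $value_start );
            return if $end < 0;
        }
        push @fields,
            [ $tags[$i], substr( $msg, $value_start, $end - $value_start ) ];
        $pos = $end;
    }
    return @fields;
}

print "$_->[0]=$_->[1]\n" for cut_at_tags( '8=FIX.4.29=040435=8', 8, 9, 35 );
# 8=FIX.4.2
# 9=0404
# 35=8
```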

        This seems to be a very strange file in the first place, which will make it quite hard to parse successfully every time. ptum definitely has it right in that we need a little more explanation of the rules of the data that will be in the file.

        Given:
        8=FIX.4.29=040435=8
        How do we know that it is 9=0404 and not 9=04043? For instance, if the pattern is /(\d{1,2}=)/, do we always keep $1 and assume that everything else must be associated with the previous pattern?
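To see what that rule does in practice, here is a rough sketch that takes the suggested pattern to mean "one or two digits followed by '='" and hands it to split. That reading of the pattern is an assumption. The result shows the problem: the second tag comes out as 29, not 9.

```perl
use strict;
use warnings;

my $msg = '8=FIX.4.29=040435=8';

# A capturing group makes split return the tags as well as the values.
# The first element is the empty string before the leading "8=", so drop it.
my ( undef, @parts ) = split /(\d{1,2})=/, $msg;

# @parts is now tag, value, tag, value, ...
my @pairs;
while ( my ( $tag, $value ) = splice( @parts, 0, 2 ) ) {
    push @pairs, "$tag => '$value'";
}

print "$_\n" for @pairs;
# 8 => 'FIX.4.'
# 29 => '0404'
# 35 => '8'
```

A purely pattern-based split has no way to know that the 2 belongs to the version string, which is why the thread keeps coming back to a known tag list.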

        It's hard to be certain without either better rules, or a better data source ;) Try giving as much information about the file as you can think of (and are allowed to, perhaps) and I am sure someone here will be able to help out further.

        ---hA||ta----
        print map{$_.' '}grep{/\w+/}@{[reverse(qw{Perl Code})]} or die while ( 'trying' );
        You should probably give us some more rules, like for example, will the N= tags always be in order?

        In the example you give: 8=FIX.4.29=040435=8

        I don't see any way to differentiate between 9= and 29= (unless there is a rule that says a value can never end in a '.'). Similarly, the last token could be:

        40435=8 or 0435=8 or 435=8 or 35=8 or 5=8

        You might want to give us some more rules or find a way to preformat or delimit the input string, unless your tag array is guaranteed to clear up this ambiguity.
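On the delimiting idea: FIX messages on the wire separate fields with the SOH control character (0x01), and it looks as though that delimiter was lost somewhere between the log and this page. That is an assumption about this particular log, not something the thread confirms. If the raw file does still contain SOH bytes, a sketch like the following needs no tag list at all. The sample string is the short example from above with the delimiters put back by hand.

```perl
use strict;
use warnings;

# Assumed raw form of the short example, with SOH after each field.
my $raw = "8=FIX.4.2\x{01}9=0404\x{01}35=8\x{01}";

my %field;
for my $pair ( split /\x01/, $raw ) {
    # Limit of 2 so a value containing '=' stays whole.
    my ( $tag, $value ) = split /=/, $pair, 2;
    $field{$tag} = $value;
}

print "$_=$field{$_}\n" for sort { $a <=> $b } keys %field;
# 8=FIX.4.2
# 9=0404
# 35=8
```

Checking the log with a hex dump would settle whether the delimiters are really there.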

        No good deed goes unpunished. -- (attributed to) Oscar Wilde