Novice_1 has asked for the wisdom of the Perl Monks concerning the following question:

I have a text file as shown here: 0011222STRUCTdata...............02121STRUCTdata..........02342STRUCTdata...... I need to write a pattern that matches every "STRUCT" other than the first one. The first one is always preceded by numbers that start at the beginning of the line (^); each remaining "STRUCT" also has numbers in front of it, but not at the beginning of the line. Please help me out with a pattern match.

Re: Pattern Matching request
by si_lence (Deacon) on Dec 15, 2004 at 08:39 UTC
    I'm not too sure what you want to do. But if you want all matches except the
    first I would get all the matches and just get rid of the first one.
    If you want to really match the word "STRUCT" then use something like this:
        use strict;
        use warnings;

        my $dat = "0011222STRUCTdata1...............02121STRUCTdata2..........02342STRUCTdata3";
        my @m;
        @m = $dat =~ /STRUCT/g;
        shift(@m);
        foreach (@m) { print "$_\n" }
    If you want to match the data between the "STRUCT" then try one of these:
    (excluding the data1 part. If you need it just delete one of the shift)
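        # Option 1: capture the text that precedes each STRUCT (or the end of the string)
        @m = $dat =~ /(.*?)(?:STRUCT|$)/g;
        shift(@m);
        shift(@m);
        foreach (@m) { print "$_\n" }

        # Option 2: split on STRUCT
        @m = split(/STRUCT/, $dat);
        shift(@m);
        shift(@m);
        foreach (@m) { print "$_\n" }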
    si_lence
Re: Pattern Matching request
by Random_Walk (Prior) on Dec 15, 2004 at 10:06 UTC

    If STRUCT marks the start of each record and occurs nowhere else, you would be better off splitting on STRUCT and just ignoring or shifting off the first element. Split is mostly (always?) faster than a regex match.

    Cheers,
    R.
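
The split approach suggested above could be sketched as follows. This is only an illustration under two assumptions that are not stated in the question: that the digits in front of each STRUCT are a prefix and not part of the data, and that the data itself never ends in a digit. The sample string is made up to resemble the one in the question.

```perl
use strict;
use warnings;

# Sample input modelled on the question; the digits and dots are made up.
my $dat = "0011222STRUCTdata1.....02121STRUCTdata2.....02342STRUCTdata3";

# Splitting on \d+STRUCT consumes the numeric prefix together with the keyword.
# Because the string starts with a match, element 0 is an empty leading field.
# Assumption: data never ends in a digit, otherwise \d+ would swallow that digit too.
my @parts = split /\d+STRUCT/, $dat;

shift @parts;    # drop the empty leading field
shift @parts;    # drop the first record, which the question wants to skip

print "$_\n" for @parts;
```

If the numbers should stay attached to the data, splitting on plain /STRUCT/ keeps them, at the cost of each chunk ending with the number that belongs to the next record.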

Re: Pattern Matching request
by zejames (Hermit) on Dec 15, 2004 at 08:41 UTC

    There are some things that aren't clear: do the numbers belong to the data, or are they separate information?

    Here, I've assumed that the numbers before the STRUCT keyword are not part of the data. I've also taken into account that the data may itself contain numbers. What makes those numbers different is that they do not precede a STRUCT keyword.

    To sum up the proposed solution to your problem: match everything, then remove what you are not interested in.

        # Text wrapped to fit in the screen
        my $text = "0011222STRUCTdata..........6ab.."
                 . "02121STRUCTdata.........."
                 . "021232STRUCTdata......"
                 . "02342STRUCTdata......";

        my @matches = $text =~ m/(\d+)     # First match some numbers
                                 STRUCT    # Then the STRUCT keyword
                                 (.+?)     # Then the data, that have to
                                           # be followed by
                                 (?=       # (look-ahead assertion)
                                   (?:\d+STRUCT    # - numbers and 'STRUCT'
                                     |             #   or
                                     $             # - end of line
                                   )
                                 )/xg;

        # Then remove the first two matches that is first numbers and
        # first data
        @matches = splice @matches, 2;

        {
            local $, = ",";
            print @matches, "\n";
        }

    The look-ahead trick makes it easy to handle numbers inside the data text.


    --
    zejames
Re: Pattern Matching request
by ysth (Canon) on Dec 15, 2004 at 08:54 UTC
    It would be helpful if you said what you want to do for each match. I'd do a scalar-context //g match first, then loop through to get the rest (untested):
        $data =~ /^\d+STRUCT/g or warn "gang aft agley";
        while ($data =~ /\d+STRUCT/g) {
            # process match
        }
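
To make the idea concrete, here is a small self-contained sketch of the same two-step match. It relies on a scalar-context //g match leaving pos() just past the first STRUCT, so the following loop only sees the later ones. The sample string, the @offsets array and the choice to record start offsets are illustrative assumptions, since the question does not say what should happen at each match.

```perl
use strict;
use warnings;

# Sample input modelled on the question; the digits and dots are made up.
my $data = "0011222STRUCTdata1.....02121STRUCTdata2.....02342STRUCTdata3";

my @offsets;    # hypothetical result: start offset of each STRUCT after the first

# A successful scalar-context //g match leaves pos($data) just past the first STRUCT.
if ($data =~ /^\d+STRUCT/g) {
    # This loop resumes from pos($data), so the first STRUCT is never matched again.
    while ($data =~ /\d+(STRUCT)/g) {
        push @offsets, $-[1];    # $-[1] is the offset where capture group 1 starts
    }
}
else {
    warn "no leading record found\n";
}

print "STRUCT at offset $_\n" for @offsets;
```

Inside the loop, $-[1] could be swapped for whatever processing each match needs, for example a substitution via substr at that offset.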