Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hello Monks, I'm just learning Perl and regular expressions. Hopefully someone can help me out. I have a multi-line string that is semicolon-terminated. For example:
$_ = "10001 LONG RECORD\n BEGIN RECORD A, CODE B, TABLE C END \n NEXT\n STANDARD DATA 1 BEGIN CODE A, RECORD B END NEXT\n SHORT RECORD BEGIN TABLE B END NEXT\n STANDARD DATA 2 BEGIN CODE C, RECORD D, FILE END;";
I need to extract the data between each /BEGIN/ and /END/ from the string. The number of sub-records varies, but they are always separated by a NEXT.

I'm thinking of writing a loop to match each group of expressions. But how do I count the number of NEXTs in the string? Also, how do I get the next set of /BEGIN/ /END/ expressions?

Final Regex question: I have the following in $1 = "CODE C, RECORD D, FILE";

How do I separate the first word from the second in each of the comma-separated items? I.e., I want to have "CODE RECORD FILE" in one array and "C D" in another. I thought the following would work:

(@recordSource, @recordType) = split(/\s/, split(/,/, $1));

But I only get a number in @recordSource.

Any tips or suggestions would be appreciated! Maybe I need a different approach?

Replies are listed 'Best First'.
Re: Multiple RegEx Matches in a single string
by broquaint (Abbot) on May 14, 2003 at 15:09 UTC
    Assuming your BEGIN and END aren't nested:

    use strict;
    use Data::Dumper;

    $_ = "10001 LONG RECORD\n BEGIN RECORD A, CODE B, TABLE C END \n NEXT\n STANDARD DATA 1 BEGIN CODE A, RECORD B END NEXT\n SHORT RECORD BEGIN TABLE B END NEXT\n STANDARD DATA 2 BEGIN CODE C, RECORD D, FILE END;";

    my(@records) = m< \bBEGIN\b \s+ (.*?) \s+ \bEND\b >gx;
    print Dumper(\@records);

    __output__

    $VAR1 = [
              'RECORD A, CODE B, TABLE C',
              'CODE A, RECORD B',
              'TABLE B',
              'CODE C, RECORD D, FILE'
            ];

    As for your final question: firstly, @recordSource was slurping in all the values returned from the split, and secondly, split won't work how you expect it to (unfortunately), so consult the docs. Here's what you want (although I'm using an array of arrays of arrays, as that seems to map to your data better)
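A minimal sketch along those lines, not necessarily the code originally posted. The original attempt returns a number because the second argument of split is evaluated in scalar context, where the inner split yields a field count rather than a list. The sketch assumes every item is either "WORD VALUE" or a bare "WORD"; the names @parsed, @recordSource and @recordType are illustrative.

```perl
use strict;
use warnings;

# The strings captured between BEGIN and END in the example data.
my @records = (
    'RECORD A, CODE B, TABLE C',
    'CODE A, RECORD B',
    'TABLE B',
    'CODE C, RECORD D, FILE',
);

# Array of arrays of arrays: one entry per record, each holding
# [source, type] pairs (the type is missing for a bare word like FILE).
my @parsed = map {
    [ map { [ split ' ', $_, 2 ] } split /\s*,\s*/, $_ ]
} @records;

# Pull the two columns apart for the last record ("CODE C, RECORD D, FILE").
my @recordSource = map { $_->[0] } @{ $parsed[-1] };
my @recordType   = grep { defined } map { $_->[1] } @{ $parsed[-1] };

print "@recordSource\n";   # CODE RECORD FILE
print "@recordType\n";     # C D
```

Splitting on the comma first and then on whitespace, one item at a time inside map, keeps each split in list context.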
    HTH

    _________
    broquaint

      Nice presentation. But, depending on the details of the language being parsed, you may need to add an "s" modifier to that regex ... otherwise, BEGIN/END won't be recognized if it spans a newline.

          -- Chip Salzenberg, Free-Floating Agent of Chaos
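A small sketch of the difference the "s" modifier makes. The sample string below is made up for illustration: it puts a newline inside the first BEGIN/END block, which the data in the question does not do.

```perl
use strict;
use warnings;

# Hypothetical input: the first block spans a newline, the second does not.
my $text = "BEGIN CODE A,\n RECORD B END NEXT BEGIN TABLE C END;";

# Without /s, "." does not match a newline, so the first block is skipped.
my @without_s = $text =~ m< \bBEGIN\b \s+ (.*?) \s+ \bEND\b >gx;

# With /s, "." matches any character, so both blocks are captured.
my @with_s    = $text =~ m< \bBEGIN\b \s+ (.*?) \s+ \bEND\b >gsx;

print scalar(@without_s), " match(es) without /s\n";   # 1
print scalar(@with_s),    " match(es) with /s\n";      # 2
```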

Re: Multiple RegEx Matches in a single string
by Fletch (Bishop) on May 14, 2003 at 15:27 UTC

    Insert quote about hammers and everything looking like nails here

    This is on the border of where you need to stop trying to use just regexen and write a proper parser (either by hand rolling your own or using something like Parse::RecDescent).
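A rough sketch of what a small hand-written parsing routine could look like. It assumes the flat format from the question: sub-records separated by NEXT, each with a header followed by exactly one un-nested BEGIN ... END block, and a trailing semicolon. The function name parse_records and the hash keys are made up for illustration; a grammar-based parser would be the better fit once the format allows nesting.

```perl
use strict;
use warnings;

my $input = "10001 LONG RECORD\n BEGIN RECORD A, CODE B, TABLE C END \n NEXT\n STANDARD DATA 1 BEGIN CODE A, RECORD B END NEXT\n SHORT RECORD BEGIN TABLE B END NEXT\n STANDARD DATA 2 BEGIN CODE C, RECORD D, FILE END;";

# Hypothetical helper: returns a reference to a list of
# { header => ..., items => [ [source, type], ... ] } hashes.
sub parse_records {
    my ($text) = @_;
    $text =~ s/;\s*\z//;                        # drop the terminating semicolon
    my @records;
    for my $chunk (split /\bNEXT\b/, $text) {   # one chunk per sub-record
        my ($header, $body) =
            $chunk =~ /\A\s*(.*?)\s*\bBEGIN\b\s*(.*?)\s*\bEND\b\s*\z/s
            or die "Malformed record: '$chunk'\n";
        # Split the block into items, then each item into at most two words.
        my @items = map { [ split ' ', $_, 2 ] } split /\s*,\s*/, $body;
        push @records, { header => $header, items => \@items };
    }
    return \@records;
}

my $records    = parse_records($input);
my $next_count = @$records - 1;   # NEXT separates records, so one fewer

for my $rec (@$records) {
    print "$rec->{header}: ", join(', ', map { $_->[0] } @{ $rec->{items} }), "\n";
}
```

Unlike a single global match, this version dies on a chunk that does not fit the expected shape instead of silently skipping it.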