Multiple RegEx Matches in a single string

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hello Monks, I'm just learning Perl and regular expressions. Hopefully someone can help me out. I have a multi-line string that is semicolon-terminated. For example:

$_ = "10001 LONG RECORD\n
BEGIN RECORD A, CODE B, TABLE C END \n
NEXT\n
STANDARD DATA 1 BEGIN CODE A, RECORD B END NEXT\n
SHORT RECORD BEGIN TABLE B END NEXT\n
STANDARD DATA 2 BEGIN CODE C, RECORD D, FILE END;";

I need to extract the data between each /BEGIN/ and /END/ in the string. The number of sub-data records varies, but the records are always separated by a NEXT.

I'm thinking of writing a loop to match each group of expressions. But how do I count the number of NEXTs in the string? And how do I get the next set of /BEGIN/ /END/ expressions?

Final regex question: $1 contains "CODE C, RECORD D, FILE".

How do I separate the first item from the second in each comma-separated pair? I.e., I want "CODE RECORD FILE" in one array and "C D" in another. I thought the following would work:

(@recordSource, @recordType) = split(/\s/, split(/,/, $1));

But I only get a number in @recordSource.

Any tips or suggestions would be appreciated! Maybe I need a different approach?


Replies are listed 'Best First'.
Re: Multiple RegEx Matches in a single string by broquaint (Abbot) on May 14, 2003 at 15:09 UTC
Assuming your `BEGIN` and `END` aren't nested:

use strict;
use Data::Dumper;

$_ = "10001 LONG RECORD\n
BEGIN RECORD A, CODE B, TABLE C END \n
NEXT\n
STANDARD DATA 1 BEGIN CODE A, RECORD B END NEXT\n
SHORT RECORD BEGIN TABLE B END NEXT\n
STANDARD DATA 2 BEGIN CODE C, RECORD D, FILE END;";

my(@records) = m< \bBEGIN\b \s+ (.*?) \s+ \bEND\b >gx;
print Dumper(\@records);

__output__
$VAR1 = [
          'RECORD A, CODE B, TABLE C',
          'CODE A, RECORD B',
          'TABLE B',
          'CODE C, RECORD D, FILE'
        ];

As for your final question: firstly, `@recordSource` was slurping in all the values returned from the `split`, and secondly, `split` won't work how you expect it to (unfortunately), so consult the docs. Here's what you want (although I'm using an array of arrays of arrays, as that seems to map to your data better):

Read more... (1411 Bytes)

HTH

_________
broquaint
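The two problems with the original `split` line can be illustrated with a small sketch. This is not broquaint's code (which is not reproduced in this thread), just one possible way to get the two flat arrays the question asks for. The inner `split` in the question is evaluated in scalar context, so it returns a field count, which is why only a number ends up in `@recordSource`. The variable `$fields` is a hypothetical stand-in for the captured `$1`.

```perl
use strict;
use warnings;

# Hypothetical input: the text captured between one BEGIN/END pair.
my $fields = "CODE C, RECORD D, FILE";

my (@recordSource, @recordType);
for my $item (split /\s*,\s*/, $fields) {
    # Each item is "NAME" or "NAME VALUE"; the limit of 2 keeps any
    # further whitespace inside the value.
    my ($source, $type) = split ' ', $item, 2;
    push @recordSource, $source;
    push @recordType, $type if defined $type;
}

print "@recordSource\n";   # CODE RECORD FILE
print "@recordType\n";     # C D
```

Note that the two arrays no longer line up by index once an item such as FILE has no second part, which may be one reason to prefer a nested structure.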
Re: Re: Multiple RegEx Matches in a single string by chip (Curate) on May 14, 2003 at 16:13 UTC
Nice presentation. But, depending on the details of the language being parsed, you may need to add an "s" modifier to that regex ... otherwise, BEGIN/END won't be recognized if it spans a newline.

-- Chip Salzenberg, Free-Floating Agent of Chaos
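A minimal sketch of the difference the "s" modifier makes, assuming a made-up input (`$text`) in which one BEGIN/END block is split across two lines. The regex is the one from the parent reply; only the modifier changes.

```perl
use strict;
use warnings;

# Hypothetical input in which the first BEGIN/END block spans a newline.
my $text = "HEADER BEGIN CODE A,\nRECORD B END NEXT\nBEGIN TABLE C END;";

# Without /s, "." cannot match "\n", so the first block is never matched.
my @without_s = $text =~ m< \bBEGIN\b \s+ (.*?) \s+ \bEND\b >gx;

# With /s, "." also matches "\n", so both blocks are captured.
my @with_s = $text =~ m< \bBEGIN\b \s+ (.*?) \s+ \bEND\b >gsx;

print scalar(@without_s), "\n";   # 1
print scalar(@with_s), "\n";      # 2
```

With /s the first capture contains the embedded newline, so it may need normalizing before the fields are split further.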
Re: Multiple RegEx Matches in a single string by Fletch (Bishop) on May 14, 2003 at 15:27 UTC
(Insert quote about hammers and everything looking like nails here.) This is on the border of where you need to stop trying to use just regexen and write a proper parser, either by hand-rolling your own or by using something like Parse::RecDescent.
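As a rough illustration of the hand-rolled route, here is a small sketch that treats NEXT as a record separator and then picks each record apart. The function name `parse_records` and the hash keys `label` and `fields` are hypothetical, and the sketch assumes that BEGIN/END pairs are not nested and that the word NEXT never appears inside the data itself. It does not use Parse::RecDescent.

```perl
use strict;
use warnings;

# Hypothetical helper: parse one semicolon-terminated string into a
# list of hashes, one per NEXT-separated record.
sub parse_records {
    my ($text) = @_;
    $text =~ s/;\s*\z//;    # drop the terminating semicolon
    my @records;
    for my $chunk (split /\bNEXT\b/, $text) {
        # Each chunk is assumed to look like: label BEGIN fields END
        my ($label, $body) =
            $chunk =~ /\A\s*(.*?)\s*\bBEGIN\b\s+(.*?)\s+\bEND\b\s*\z/s
            or die "Malformed record: $chunk";
        # Each field becomes [NAME] or [NAME, VALUE].
        my @fields = map { [ split ' ', $_, 2 ] } split /\s*,\s*/, $body;
        push @records, { label => $label, fields => \@fields };
    }
    return @records;
}

my $input = "10001 LONG RECORD\nBEGIN RECORD A, CODE B, TABLE C END \nNEXT\n"
          . "STANDARD DATA 1 BEGIN CODE A, RECORD B END NEXT\n"
          . "SHORT RECORD BEGIN TABLE B END NEXT\n"
          . "STANDARD DATA 2 BEGIN CODE C, RECORD D, FILE END;";

my @records = parse_records($input);

# Counting the NEXT separators directly: a global match in list
# context, assigned through an empty list to get the count.
my $next_count = () = $input =~ /\bNEXT\b/g;

for my $rec (@records) {
    my @names = map { $_->[0] } @{ $rec->{fields} };
    print "$rec->{label}: @names\n";
}
print "NEXT count: $next_count\n";
```

Because each record is checked against the expected shape, malformed input fails loudly instead of being silently skipped, which is the main thing a single global regex does not give you.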