Re: Multiple RegEx Matches in a single string

Assuming your BEGIN and END blocks aren't nested, this should do it:

use strict;

use Data::Dumper;

$_ = "10001 LONG RECORD\n
BEGIN RECORD A, CODE B, TABLE C END \n
NEXT\n
STANDARD DATA 1 BEGIN CODE A, RECORD B END NEXT\n
SHORT RECORD BEGIN TABLE B END NEXT\n
STANDARD DATA 2 BEGIN CODE C, RECORD D, FILE END;";

my(@records) = m< \bBEGIN\b \s+ (.*?) \s+ \bEND\b >gx;

print Dumper(\@records);

__output__

$VAR1 = [
          'RECORD A, CODE B, TABLE C',
          'CODE A, RECORD B',
          'TABLE B',
          'CODE C, RECORD D, FILE'
        ];

As for your final question: firstly, @recordSource was slurping in all the values returned from the split, since an array in a list assignment takes everything that is left. Secondly, split won't work how you expect it to (unfortunately), so consult the docs. Here's what you want (although I'm using an array of arrays of arrays, as that seems to map to your data better):


my @subrecs;
for(@records) {
  my @items = split ', ';
  push @subrecs => [
    [map { (split ' ')[0] } @items],
    [map { (split ' ')[1] } @items],
  ];
}

print Dumper(\@subrecs);


__output__

$VAR1 = [
          [
            [
              'RECORD',
              'CODE',
              'TABLE'
            ],
            [
              'A',
              'B',
              'C'
            ]
          ],
          [
            [
              'CODE',
              'RECORD'
            ],
            [
              'A',
              'B'
            ]
          ],
          [
            [
              'TABLE'
            ],
            [
              'B'
            ]
          ],
          [
            [
              'CODE',
              'RECORD',
              'FILE'
            ],
            [
              'C',
              'D'
            ]
          ]
        ];
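
The slurping point above can be seen in isolation. This is only a minimal sketch: the variable names are hypothetical, since the original poster's code isn't quoted in this reply.

```perl
use strict;
use warnings;

# Hypothetical names; the original poster's code is not shown in this reply.
# An array on the left-hand side of a list assignment takes every remaining
# value, so anything listed after it is left empty.
my (@names, @codes) = split ' ', 'RECORD A';
print scalar(@names), " in \@names, ", scalar(@codes), " in \@codes\n";

# Scalars on the left-hand side each receive exactly one field.
my ($name, $code) = split ' ', 'RECORD A';
print "$name / $code\n";
```

Here @names ends up holding both fields and @codes holds none, which is why the loop above splits each item separately and picks the fields out by index.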

HTH

_________ broquaint

Re: Re: Multiple RegEx Matches in a single string by chip (Curate) on May 14, 2003 at 16:13 UTC
Nice presentation. But, depending on the details of the language being parsed, you may need to add an "s" modifier to that regex; otherwise, a BEGIN/END block won't be recognized if it spans a newline.

-- Chip Salzenberg, Free-Floating Agent of Chaos
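
To illustrate the "s" modifier point, here is a small sketch using a made-up input string (not the original poster's data) in which the BEGIN ... END block is broken across two lines. It runs the same pattern with and without /s.

```perl
use strict;
use warnings;

# Hypothetical input: the BEGIN ... END block spans a newline.
my $text = "HEADER BEGIN RECORD A,\nCODE B END NEXT";

# Without /s, "." does not match "\n", so the block is not found.
my @without_s = $text =~ m< \bBEGIN\b \s+ (.*?) \s+ \bEND\b >gx;

# With /s, "." also matches "\n", so the whole block is captured.
my @with_s = $text =~ m< \bBEGIN\b \s+ (.*?) \s+ \bEND\b >gsx;

print scalar(@without_s), " match(es) without /s\n";
print scalar(@with_s), " match(es) with /s\n";
```

Note that with /s the captured text keeps the embedded newline, so you may want to normalise whitespace before splitting the record into items.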