Regex not greedy enough

Boldra has asked for the wisdom of the Perl Monks concerning the following question:

OK, first, here's my data:

   record1
    field2 2345
   record2
   record3
    field1 GAGGA
    field2 7848
    field2a 5m

Each field is slightly indented, with a record always beginning at column 4.

What I'm trying to do is break this up into records, so I can process the fields. Here's the basic algorithm I'm playing with:

 foreach $record ($formatted_data =~ m/(^ {3}\w.*)/mg) {
   &process_record($record);
 }

Which only returns the first line of a given record, as though my "." wasn't matching \n.

I gave up trying to split the data, because I need that first \w. I also abandoned using [\s\S] instead of "." , since that was too greedy; yet not greedy enough if I subdued it with a "?" thus: m/(^ {3}\w[\s\S]*?)/mg.

There must be a middle ground here somewhere... I can feel I'm close...


Replies are listed 'Best First'.
Re: Regex not greedy enough by merlyn (Sage) on Nov 17, 2000 at 19:04 UTC
    @records = split /(^ {3}\w.*\n)/m, $input;

should give you:

    "", " record1blah\n", " dataforrecord1\n moredataforrecord1\n", " record2blah\n", " dataforrecord2\n moredataforrecord2\n", ...

You'll need to toss that first empty element... it's the part of the string leading up to your first record.

-- Randal L. Schwartz, Perl hacker
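A minimal sketch of how the list returned by that `split` could be paired back into records, assuming the sample data from the question. The names `$input`, `@parts`, and `%fields_for` are illustrative and do not come from the thread.

```perl
use strict;
use warnings;

# Sample data from the question: records indented 3 spaces, fields 4.
my $input = <<'END';
   record1
    field2 2345
   record2
   record3
    field1 GAGGA
    field2 7848
    field2a 5m
END

# The capturing parentheses make split return the separators (the record
# header lines) as well as the text between them (the field lines).
my @parts = split /(^ {3}\w.*\n)/m, $input;

shift @parts;    # drop the leading empty element before the first header

# Walk the list two at a time: a header, then its (possibly empty) body.
my %fields_for;
while (@parts) {
    my $header = shift @parts;
    my $body   = @parts ? shift @parts : '';
    $header =~ s/^\s+|\s+$//g;          # trim indentation and newline
    my @fields = split /\n/, $body;
    s/^\s+// for @fields;               # trim field indentation
    $fields_for{$header} = \@fields;
}

print "$_: @{ $fields_for{$_} }\n" for sort keys %fields_for;
```

A record with no fields (like `record2` here) shows up as an empty string between two headers, so the pairing still lines up.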
Re: Re: Regex not greedy enough by snax (Hermit) on Nov 17, 2000 at 19:11 UTC
Now that is handy! I didn't know you could capture the split regex stuff that way. Thanks!
Re: Regex not greedy enough by snax (Hermit) on Nov 17, 2000 at 18:16 UTC
Use the `/s` modifier rather than `/m`. See the top of perlman:perlre.
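To see what each modifier changes, here is a small sketch that runs the pattern from the question under `/m`, under `/s`, and with a lazy `.*?` under both. The shortened `$text` and the array names are assumptions made for illustration.

```perl
use strict;
use warnings;

my $text = "   record1\n    field2 2345\n   record2\n";

# /m alone: ^ matches after every newline, but . still stops at \n,
# so each match is a single header line.
my @with_m = $text =~ /(^ {3}\w.*)/mg;

# /s alone: . matches \n, but ^ matches only at the start of the string,
# so the greedy .* swallows everything in one match.
my @with_s = $text =~ /(^ {3}\w.*)/sg;

# /ms with a lazy .*? and nothing after it: the quantifier is satisfied
# by zero characters, so each match stops right after the first \w.
my @lazy = $text =~ /(^ {3}\w.*?)/msg;

printf "/m: %d matches, /s: %d match, lazy /ms: %d matches\n",
    scalar @with_m, scalar @with_s, scalar @lazy;
```

This is the "too greedy, yet not greedy enough" effect described in the question: a lazy quantifier needs something after it in the pattern to tell it where to stop.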
Re: Regex not greedy enough by japhy (Canon) on Nov 17, 2000 at 18:19 UTC
Assuming you have the entire thing in the string, I would suggest `split()`ing with lookahead:

    # split RIGHT BEFORE a \n followed by 'record'
    @records = split /(?=\nrecord)/, $data;

`$monks{japhy}++ while $posting;`
Re: Regex not greedy enough by Boldra (Curate) on Nov 17, 2000 at 18:46 UTC
Thanks for the comments.

Snax: you're right, I was using the opposite switch to the one I meant, but I still have the same greed problems.

Japhy: this is new to me, and it looks like the kind of solution I was after. However, ?= matches nothing when written as `(?=^ {3}\w)`, and I can't use \n, since then I skip my first record. Any more ideas?

BTW: the 'record1' string is actually the first field of the record; it could be anything beginning with a \w.

Oh yeah, here's my test source:

    #!/usr/bin/perl -w
    use strict;
    my($infile,@records);
    while(<DATA>) {$infile.=$_;}
    @records = split(/(?=^ {3}\w)/,$infile);      #returns whole list
    #@records = ($infile =~ m/(^ {3}\w.*?)/sg);   #returns only up to \w
    print join("\n========\n",@records);
    __DATA__
       record1
        field2 2345
       record2
       record3
        field1 GAGGA
        field2 7848
        field2a 5m
Re: Re: Regex not greedy enough by japhy (Canon) on Nov 17, 2000 at 19:16 UTC
It won't match `(?=^ANYTHING)` at any place but the very beginning of the string unless you have the `/m` modifier on in the regex, which allows `^` to match after newlines.

Ohhhhhh. I didn't think you meant ALL the text was indented; I thought you meant the 'field' parts were. Well then, to make it work with such data:

    my $code;
    { local $/; $code = <DATA> }  # fast "slurping"
    @records = split /\n {3}(?=\w)/, $code;
    for (@records) { print ">>$_<<\n"; }
    __DATA__
       japhy
        DALnet regular
        Regex Prince
       merlyn
        Perl Hacker
        O'Reilly Author
       Mark_Dominus
        IAQ Author
        ArrayHashMonster Creator

`japhy`... Perl Hacker and Regex Prince
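As a sketch of the first point, here is the lookahead from the test script above with `/m` added, run against the data from the question. The variable names are illustrative, and the heredoc stands in for the `__DATA__` section.

```perl
use strict;
use warnings;

my $data = <<'END';
   record1
    field2 2345
   record2
   record3
    field1 GAGGA
    field2 7848
    field2a 5m
END

# Split at every position that starts a line with exactly three spaces
# followed by a word character. The lookahead consumes nothing, and /m
# lets ^ match after each newline as well as at the start of the string.
my @records = split /(?=^ {3}\w)/m, $data;

print ">>$_<<\n" for @records;
```

Because the match is zero-width, each record keeps its own indentation and trailing newline, and the first record is not skipped.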
Re: Re: Regex not greedy enough by snax (Hermit) on Nov 17, 2000 at 19:01 UTC
Use japhy's suggestion and add a newline to your string:

    @records = split /(?=\nrecord)/, ("\n" . $data);

...that way you get the necessary first newline in the regex for the first record. Crude, but effective :)
Re: Re: Regex not greedy enough by Fastolfe (Vicar) on Nov 17, 2000 at 20:25 UTC
If you want to capture to the end of the line, in `/m` mode, `$` anchors at the end of a line. In `/s` mode without `/m`, `^` matches only at the beginning of the string and `$` only at the end; `/s` itself just lets `.` match newlines. So maybe you did want `/m`. Of course, you can also just do something like these:

    while (<DATA>) {
        my ($key, $value) = /(\S+) (.*)/ or next;
        # or: my ($key, $value) = split;
        # or: (undef, $key, $value) = split(/\s+/, $_, 3);
        $hash{$key} = $value;
        # or: push(@{$hash{$key}}, $value);
    }

Untested, but you might get some ideas from that.
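One way the line-by-line idea might be extended to the data in the question is to use the indentation to tell record headers from fields. This is only a sketch: the `%record` hash-of-hashes layout and the variable names are assumptions, and it treats every field line as a "key value" pair.

```perl
use strict;
use warnings;

my $data = <<'END';
   record1
    field2 2345
   record2
   record3
    field1 GAGGA
    field2 7848
    field2a 5m
END

my (%record, $current);
for my $line (split /\n/, $data) {
    if ($line =~ /^ {3}(\w.*)/) {
        # Three spaces then a word character: start of a new record.
        $current = $1;
        $record{$current} = {};
    }
    elsif (defined $current && $line =~ /^ {4}(\S+)\s+(.*)/) {
        # Four spaces: a "key value" field of the current record.
        $record{$current}{$1} = $2;
    }
}

for my $name (sort keys %record) {
    my $fields = $record{$name};
    print "$name: ", join(', ', map { "$_=$fields->{$_}" } sort keys %$fields), "\n";
}
```

This avoids the multi-line regex entirely, at the cost of keeping a little state (`$current`) between lines.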