
I too thought that Roy Johnstone's split /\n\b/, ... was inspired. I wish I had thought of it:)

As for breaking down my code: the basic statement is pretty simple. It just adds an element to the hash using $1 and $2 for as long as the regex keeps matching.

$hash{ $1 } = $2 while $data =~ m[(...): (...)]g
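A minimal sketch of that statement on its own, using a deliberately simple regex and made-up single-line records (the data and the `\w+` patterns are assumptions for illustration, not the real input):

```perl
use strict;
use warnings;

# Hypothetical single-line records, invented for this sketch.
my $data = "name: Alice\ncity: Paris\n";

my %hash;

# In scalar context, m//g resumes where the previous match ended,
# so the statement modifier runs once per "key: value" pair.
$hash{ $1 } = $2 while $data =~ m[(\w+): (\w+)]g;

print "$_ => $hash{$_}\n" for sort keys %hash;
```

The condition is evaluated before the assignment on every pass, so $1 and $2 always refer to the match that has just succeeded.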

The only complicated bit is the regex itself, which uses a lookahead (as you suggested) to determine the end of each multi-line record.

The options: /g, match as many times as you can; /x, ignore whitespace and comments in the pattern; /s, allow '.' to match newlines so that we can pick up your multi-line bits.
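To see what those modifiers do in isolation, here is a small sketch on a made-up two-line string (the string and variable names are only for illustration):

```perl
use strict;
use warnings;

my $text = "first line\nsecond line";   # invented sample

# /x only: the spaces in the pattern are ignored, and '.' stops at the newline.
my( $without ) = $text =~ m[ (.*) ]x;

# /x and /s: '.' also matches "\n", so the capture spans both lines.
my( $with ) = $text =~ m[ (.*) ]xs;

# /g in list context: return every match, not just the first.
my @words = $text =~ m[ (\w+) ]gx;

print "without /s: '$without'\n";
print "with /s: ", length( $with ), " characters\n";
print "words found: ", scalar( @words ), "\n";
```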

m[
    # First we want the key, the text preceding the :
    (?: \A | \n )   ## from the start of the string or a newline
    ( [^:]+? )      ## capture everything up to the : into $1
    \s*             ## but throw away any trailing spaces
    :               ## preceding the :

    # Now grab everything (including newlines) into $2
    (.*?)
    # but stop if we find a newline followed
    # by a non-space preceding a :
    # or the end of string for the last record.
    (?=             # lookahead
        (?:         # non-capture group containing
            \n      # a newline
            \S      # followed by a non-space
            [^:]*   # and anything except a :
            :       # and a :
        )
        |           # OR
        \Z          # the EOS
    )
]gxs;
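Putting the pieces together, a rough sketch of that regex run against a small invented sample, where the "Notes" record continues onto an indented second line (the sample data is an assumption, not the real ticket format):

```perl
use strict;
use warnings;

# Invented sample: the Notes record continues on an indented line.
my $data = "Name: Bob\nNotes: line one\n  line two\nCity: Rome\n";

my %hash;
$hash{ $1 } = $2 while $data =~ m[
    (?: \A | \n )       # start of string or a newline
    ( [^:]+? )          # key, first capture group
    \s* :               # optional spaces, then the colon
    (.*?)               # value, second capture group (may span lines under /s)
    (?=                 # lookahead: tested but not consumed
        (?: \n \S [^:]* : )     # the next record starts here
        | \Z                    # or we are at the end of the string
    )
]gxs;

for my $key ( sort keys %hash ) {
    ( my $shown = $hash{ $key } ) =~ s/\n/\\n/g;   # make newlines visible
    print "$key => '$shown'\n";
}
```

Because the lookahead consumes nothing, the newline that ends one record is still there to start the next match. Note that [^:]* in the lookahead can itself run across newlines, so an unindented continuation line with no colon would end a record early if a later line has one.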

As for removing the extraneous stuff, incorporating Roy Johnstone's simplification, I'd do it like this.

#! perl -slw
use strict;
use Data::Dumper;

my $m = <<'EOM';
Dig No : A081 Prior: 2 Digstrt: 03/30/04 Time: 10:45
Address: 26800 BRADLEY RD Subdivsn:
Remarks: DIRECTIONAL BORING=NO. DEPTH EXCEEDS 7 FEET=NO.
 : TICKET EXPIRES AFTER 04/22/04
Members: ABTL0A AMTCHA CECO5A COMC4A ITHA0A LKFO0A NSGC0A
EOM

my %parts;
while( $m =~ m[
    (?: \A | \n ) ( [^:]+? ) \s* :
    (.*?)
    (?= (?: \n \b ) | \Z )
]gxs ) {
    my( $key, $value ) = ( $1, $2 );
    $value =~ s[\n\s+:][]g;
    $parts{ $key } = $value;
}

print Dumper \%parts;

__END__
P:\test>360501
$VAR1 = {
          'Address' => ' 26800 BRADLEY RD Subdivsn:',
          'Members' => ' ABTL0A AMTCHA CECO5A COMC4A ITHA0A LKFO0A NSGC0A',
          'Remarks' => ' DIRECTIONAL BORING=NO. DEPTH EXCEEDS 7 FEET=NO. TICKET EXPIRES AFTER 04/22/04',
          'Dig No ' => ' A081 Prior: 2 Digstrt: 03/30/04 Time: 10:45'
        };
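For comparison, the split /\n\b/ idea can also be sketched on its own. This is only my guess at how it might be used, not Roy Johnstone's actual code, and the sample data is invented:

```perl
use strict;
use warnings;

# Invented sample: the Notes record continues on an indented line.
my $data = "Name: Bob\nNotes: line one\n  line two\nCity: Rome\n";

chomp( my $copy = $data );      # drop the final newline first

# Split only at newlines that are followed by a word character, so
# indented continuation lines stay attached to their record.
my @records = split /\n\b/, $copy;

my %parts;
for my $record ( @records ) {
    # A limit of 2 splits on the first colon only; later colons stay in the value.
    my( $key, $value ) = split /\s*:/, $record, 2;
    $parts{ $key } = $value;
}

print scalar( @records ), " records\n";
```

This relies on continuation lines starting with whitespace (or some other non-word character); a continuation that starts with a letter would be split off as a new record.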

Examine what is said, not who speaks.
"Efficiency is intelligent laziness." -David Dunham
"Think for yourself!" - Abigail