I too thought that Roy Johnstone's split /\n\b/, ... was inspired. I wish I had thought of it:)
As for breaking down my code: the basic statement is pretty simple. It just says 'add an element to the hash, using $1 as the key and $2 as the value, for as long as the regex keeps matching'.
$hash{ $1 } = $2 while $data =~ m[(...): (...)]g
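To see that statement run on its own, here is a minimal sketch. The sample data and the simplified `(\w+): (\S+)` pattern are assumptions made up for illustration; they are not the regex used for the real records.

```perl
use strict;
use warnings;

# Hypothetical sample data: one "key: value" pair per line.
my $data = "name: widget\ncolor: blue\nsize: 42\n";

my %hash;

# In scalar context, m//g resumes where the previous match ended,
# so the statement modifier runs once per key/value pair.
$hash{ $1 } = $2 while $data =~ m[(\w+): (\S+)]g;

print "$_ => $hash{$_}\n" for sort keys %hash;
```

The loop ends when the regex fails to find another pair, at which point the match position is reset.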
The only complicated bit is the regex itself, which uses a lookahead (as you suggested) to determine the end of each multi-line record.
The options: /g, match as many times as you can; /x, ignore whitespace and comments inside the regex; /s, allow '.' to match newlines so that we can pick up your multi-line bits.
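The effect of each option can be checked with a small sketch. The two-line string below is a made-up example, not part of the original data.

```perl
use strict;
use warnings;

# Hypothetical two-line string.
my $text = "first line\nsecond line";

# Without /s, '.' stops at the newline, so only the first line is captured.
my ($without) = $text =~ m[ \A (.*) ]x;

# With /s, '.' also matches the newline, so the capture spans both lines.
my ($with) = $text =~ m[ \A (.*) ]xs;

# With /g in list context, every match is returned at once.
my @words = $text =~ m[ (\w+) ]gx;

print length($without), "\n";   # 10
print length($with), "\n";      # 22
print scalar(@words), "\n";     # 4
```

The spaces inside each pattern are there only for readability; /x makes the regex engine skip them.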
m[
    # First we want the key, the text preceding the :
    (?: \A | \n )   ## from the start of the string or a newline
    ( [^:]+? )      ## capture everything up to the : into $1
    \s*             ## but throw away any trailing spaces
    :               ## preceding the :
    # Now grab everything (including newlines) into $2
    (.*?)
    # but stop if we find a newline followed
    # by a non-space preceding a :
    # or the end of string for the last record.
    (?=             # lookahead
        (?:         # non-capture group containing
            \n      # a newline
            \S      # followed by a non-space
            [^:]*   # and anything except a :
            :       # and a :
        )
        |           # OR
        \Z          # the EOS
    )
]gxs;
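As a sketch of how the lookahead decides where a record ends, the same regex can be run against a tiny string. The string is a hypothetical stand-in for the real data: its second line starts with whitespace, so it should be kept as part of the 'Remarks' value.

```perl
use strict;
use warnings;

# Hypothetical input: the second line continues the 'Remarks' record
# because it starts with whitespace rather than a key.
my $data = "Remarks: first part\n   : second part\nMembers: A B C";

my %hash;
$hash{ $1 } = $2 while $data =~ m[
    (?: \A | \n ) ( [^:]+? ) \s* :      # key, up to the first colon
    (.*?)                               # value, as short as possible
    (?= (?: \n \S [^:]* : ) | \Z )      # stop before the next key or at the end
]gxs;

for my $key ( sort keys %hash ) {
    ( my $shown = $hash{ $key } ) =~ s/\n/\\n/g;   # make newlines visible
    print "$key => '$shown'\n";
}
```

Because the lookahead consumes nothing, the newline in front of 'Members' is still available to start the next match.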
As for removing the extraneous stuff, incorporating Roy Johnstone's simplification, I'd do it like this.
#! perl -slw
use strict;
use Data::Dumper;
my $m = <<'EOM';
Dig No : A081 Prior: 2 Digstrt: 03/30/04 Time: 10:45
Address: 26800 BRADLEY RD Subdivsn:
Remarks: DIRECTIONAL BORING=NO. DEPTH EXCEEDS 7 FEET=NO.
: TICKET EXPIRES AFTER 04/22/04
Members: ABTL0A AMTCHA CECO5A COMC4A ITHA0A LKFO0A NSGC0A
EOM
my %parts;
while(
    $m =~ m[
        (?: \A | \n ) ( [^:]+? ) \s* :
        (.*?)
        (?= (?: \n \b ) | \Z )
    ]gxs
) {
    my( $key, $value ) = ( $1, $2 );
    $value =~ s[\n\s+:][]g;
    $parts{ $key } = $value;
}
print Dumper \%parts;
__END__
P:\test>360501
$VAR1 = {
'Address' => ' 26800 BRADLEY RD Subdivsn:',
'Members' => ' ABTL0A AMTCHA CECO5A COMC4A ITHA0A LKFO0A NSGC0A',
'Remarks' => ' DIRECTIONAL BORING=NO. DEPTH EXCEEDS 7 FEET=NO. TICKET EXPIRES AFTER 04/22/04',
'Dig No ' => ' A081 Prior: 2 Digstrt: 03/30/04 Time: 10:45'
};
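The two pieces borrowed for this version, the `\n\b` record boundary and the `s[\n\s+:][]g` clean-up, can also be tried in isolation. The following is only a sketch of the idea using split on shortened, made-up records; it is not necessarily how Roy Johnstone wrote his version.

```perl
use strict;
use warnings;

# Hypothetical two-record input with one continuation line.
my $m = "Remarks: DEPTH EXCEEDS 7 FEET=NO.\n"
      . "       : TICKET EXPIRES\n"
      . "Members: ABTL0A AMTCHA\n";

# \n\b only matches a newline that is directly followed by a word
# character, so the continuation line (which starts with spaces)
# is not treated as the start of a new record.
my @records = split /\n\b/, $m;

# Remove the newline, the indentation and the leading colon
# of each continuation line.
s[\n\s+:][]g for @records;

print "[$_]\n" for @records;
```

Note that `\n\b` assumes every key starts with a word character; a key beginning with punctuation would not be recognised as a record boundary.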
Examine what is said, not who speaks.
"Efficiency is intelligent laziness." -David Dunham
"Think for yourself!" - Abigail