
Drats... I composed a reply to this, then clicked somewhere else and lost it. Here is my second try...

Your code seems to produce nearly correct values, but not quite. More on that in a bit. But since I am an acknowledged noob, I will have to spend quite a bit of time staring at...

$parts{ $1 } = $2 while $m =~ m[ (?: \A | \n ) ( [^:]+ ) \s* : (.*?) (?= (?: \n \S [^:]* : ) | \Z ) ]gxs;
...to figure out what is going on. I will do that and hopefully learn something, but at first glance it seems a bit beyond me for now.

That said, the result is not quite what I want. Here is the difference:

# You have
'Remarks' => ' DIRECTIONAL BORING=NO. DEPTH EXCEEDS 7 FEET=NO. : TICKET EXPIRES AFTER 04/22/04',
'Dig No ' => ' A081 Prior: 2 Digstrt: 03/30/04 Time: 10:45'
#
# I want
'Remarks' => ' DIRECTIONAL BORING=NO. DEPTH EXCEEDS 7 FEET=NO. TICKET EXPIRES AFTER 04/22/04',
'Dig No ' => ' A081',
'Prior' => 2,
'Digstrt' => '03/30/04',
'Time' => '10:45'
All that said, Roy Johnson's suggestion of splitting the lines on /\n\b/ set me on the right path and did the trick.

Thanks.

Re^3: using lookaround assertions to grab info
by BrowserUk (Patriarch) on Jun 04, 2004 at 03:22 UTC

    I too thought that Roy Johnson's split /\n\b/, ... was inspired. I wish I had thought of it. :)
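A minimal sketch of the split /\n\b/ idea, taken one step further to get the separate Prior, Digstrt and Time entries from the "I want" list. Both the sample data layout (which fields share a line, the indented ":" continuation under Remarks) and the rule for where one key: value pair ends on a shared line are assumptions, not something stated in the thread. Values come out trimmed, without the leading spaces shown above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sample ticket text. The line layout is an assumption reconstructed
# from the thread, not the original poster's exact data.
my $m = <<'EOM';
Dig No : A081 Prior: 2 Digstrt: 03/30/04 Time: 10:45
Address: 26800 BRADLEY RD Subdivsn:
Remarks: DIRECTIONAL BORING=NO. DEPTH EXCEEDS 7 FEET=NO.
       : TICKET EXPIRES AFTER 04/22/04
Members: ABTL0A AMTCHA CECO5A COMC4A ITHA0A LKFO0A NSGC0A
EOM

my %parts;

# A new record starts wherever a newline is followed by a word character;
# the continuation line starts with spaces, so it stays with Remarks.
for my $record ( split /\n\b/, $m ) {
    chomp $record;              # the last record keeps the heredoc's "\n"
    $record =~ s/\n\s+://g;     # fold "newline, indentation, colon" away

    # Hypothetical field rule: a key is word characters (spaces allowed),
    # then optional spaces and a colon; its value runs up to the next
    # "whitespace, word, colon" or the end of the record.
    while ( $record =~ /(\w[\w ]*?)\s*:\s*(.*?)(?=\s+\w+\s*:|\z)/g ) {
        $parts{$1} = $2;
    }
}

printf "%-9s => '%s'\n", $_, $parts{$_} for sort keys %parts;
```

The per-line rule is the fragile part: a value that itself contained "word:" after a space would be cut short, so it only suits data shaped like this sample.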

    As for breaking down my code: the basic statement is pretty simple. It just says 'add an element to the hash, using $1 as the key and $2 as the value, for as long as the regex keeps matching'.

    $hash{ $1 } = $2 while $data =~ m[(...): (...)]g
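A small runnable illustration of that statement-modifier idiom on hypothetical one-pair-per-line data (the sample text is a placeholder). The while modifier evaluates m//g in scalar context, so each test resumes where the previous match stopped, and the loop ends once no further match is found.

```perl
use strict;
use warnings;

# Hypothetical input: one "key: value" pair per line.
my $data = "name: fred\nage: 42\n";

my %hash;
# Runs the assignment once per successful match; pos($data) advances
# after each match, so every pair is visited exactly once.
$hash{ $1 } = $2 while $data =~ m[(\w+): (\S+)]g;

print "$_ => $hash{$_}\n" for sort keys %hash;
# prints:
# age => 42
# name => fred
```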

    The only complicated bit is the regex itself, which uses a lookahead (as you suggested) to determine the end of each multi-line record.

    The options: /g, match as many times as you can; /x, ignore whitespace and comments; /s, allow '.' to match newlines so that we can pick up your multi-line bits.
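A quick way to see what /s and /x change, using a made-up two-line Remarks value (the variable names are illustrative only):

```perl
use strict;
use warnings;

# Hypothetical value with an indented continuation line.
my $text = "Remarks: first line\n  : second line";

# Without /s, '.' does not match "\n", so the capture stops at the newline.
my ($without_s) = $text =~ m[Remarks:(.*)];

# With /s, '.' matches "\n" too, so the continuation line is captured.
my ($with_s) = $text =~ m[Remarks:(.*)]s;

# /x lets the same pattern be spaced out; its literal spaces are ignored.
my ($spaced) = $text =~ m[ Remarks : ( .* ) ]xs;

print "without /s: [$without_s]\n";
print "with /s:    [$with_s]\n";
```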

    m[
        # First we want the key, the text preceding the :
        (?: \A | \n )   ## from the start of the string or a newline
        ( [^:]+? )      ## capture everything up to the : into $1
        \s*             ## but throw away any trailing spaces
        :               ## preceding the :

        # Now grab everything (including newlines) into $2
        (.*?)
        # but stop if we find a newline followed
        # by a non-space preceding a :
        # or the end of string for the last record.
        (?=             # lookahead
            (?:         # non-capture group containing
                \n      # a newline
                \S      # followed by a non-space
                [^:]*   # and anything except a :
                :       # and a :
            )
            |           # OR
            \Z          # the EOS
        )
    ]gxs;
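A runnable sketch applying the commented regex above to a small, hypothetical three-record sample. It shows that the lookahead only tests for the next key without consuming it, so the next //g iteration can start at that newline, and that an indented continuation line (a newline followed by a space) does not end the Remarks value.

```perl
use strict;
use warnings;

# Hypothetical sample with one indented continuation line.
my $data = join "\n",
    'Dig No : A081',
    'Remarks: DEPTH EXCEEDS 7 FEET=NO.',
    '       : TICKET EXPIRES',
    'Members: ABTL0A';

my %parts;
$parts{ $1 } = $2 while $data =~ m[
    (?: \A | \n )          # start of string, or a newline
    ( [^:]+? ) \s* :       # the key, minus trailing spaces, then the colon
    (.*?)                  # the value, which may span several lines
    (?=                    # stop, without consuming anything, before either
        \n \S [^:]* :      #   a newline, a non-space, text, and a colon
      | \Z                 #   or the end of the string
    )
]gxs;

for my $key ( sort keys %parts ) {
    ( my $shown = $parts{$key} ) =~ s/\n/\\n/g;   # make newlines visible
    print "[$key] => [$shown]\n";
}
```

The Remarks value still carries the "\n       :" continuation marker at this stage; stripping it is a separate step.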

    As for removing the extraneous stuff, incorporating Roy Johnson's simplification, I'd do it like this.

    #! perl -slw
    use strict;
    use Data::Dumper;

    my $m = <<'EOM';
    Dig No : A081 Prior: 2 Digstrt: 03/30/04 Time: 10:45
    Address: 26800 BRADLEY RD Subdivsn:
    Remarks: DIRECTIONAL BORING=NO. DEPTH EXCEEDS 7 FEET=NO.
           : TICKET EXPIRES AFTER 04/22/04
    Members: ABTL0A AMTCHA CECO5A COMC4A ITHA0A LKFO0A NSGC0A
    EOM

    my %parts;
    while( $m =~ m[ (?: \A | \n ) ( [^:]+? ) \s* : (.*?) (?= (?: \n \b ) | \Z ) ]gxs ) {
        my( $key, $value ) = ( $1, $2 );
        $value =~ s[\n\s+:][]g;
        $parts{ $key } = $value;
    }

    print Dumper \%parts;

    __END__
    P:\test>360501
    $VAR1 = {
              'Address' => ' 26800 BRADLEY RD Subdivsn:',
              'Members' => ' ABTL0A AMTCHA CECO5A COMC4A ITHA0A LKFO0A NSGC0A',
              'Remarks' => ' DIRECTIONAL BORING=NO. DEPTH EXCEEDS 7 FEET=NO. TICKET EXPIRES AFTER 04/22/04',
              'Dig No ' => ' A081 Prior: 2 Digstrt: 03/30/04 Time: 10:45'
            };

    Examine what is said, not who speaks.
    "Efficiency is intelligent laziness." -David Dunham
    "Think for yourself!" - Abigail