in reply to Parsing semi-erratic text
Not quite. Consider the following (note that I've trimmed the number of trailing spaces and retained the line ends in the data, but strip them before matching):
use strict;
use warnings;
use Date::EzDate;

my $str = <<DATA;
Security          : BULGY N V-  
Item Overridden   : Earnings Per Share  
Initial Value     : (USD)  
Current Value     : ()  
Overridden Value  : 160 (USD)  
Effective         : 08/20/1999 through 08/20/2000  
Override Type     : Data 
SecurityID        : 1076665 
Sedol             : 2451234 
Cusip             : N66696606 
ISIN              : NL0006122988  
DATA

$str =~ s/\n//g;

while ($str =~ /(.*?):\s(.*?)\s\s/g) {
    my ($key, $value) = ($1, $2);
    $key =~ s/^\s*//;
    $key =~ s/\s*$//;
    $value =~ s/^\s*//;
    $value =~ s/\s*$//;
    print ">$key: $value<\n";
}
Prints:
>Security: BULGY N V-<
>Item Overridden: Earnings Per Share<
>Initial Value: (USD)<
>Current Value: ()<
>Overridden Value: 160 (USD)<
>Effective: 08/20/1999 through 08/20/2000<
>Override Type: Data SecurityID<
>: 1076665 Sedol<
>: 2451234 Cusip<
>: N66696606 ISIN<
>: NL0006122988<
The regex /(\w[\w ]{17}):\s+((?:(?!\w[\w ]{17}:).)*)/g latches on to an 18-character-wide label preceding a : and then grabs characters up to the next label field. The result is:
>Security: BULGY N V-<
>Item Overridden: Earnings Per Share<
>Initial Value: (USD)<
>Current Value: ()<
>Overridden Value: 160 (USD)<
>Effective: 08/20/1999 through 08/20/2000<
>Override Type: Data<
>SecurityID: 1076665<
>Sedol: 2451234<
>Cusip: N66696606<
>ISIN: NL0006122988<
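For anyone who wants to try the fixed-width version, here is a minimal sketch of how that regex could be dropped into a loop. The sub name parse_fixed_labels and the three-field sample are my own illustration, not part of the original data, and the whole thing assumes every label really is padded to exactly 18 characters before the colon. The sample string is built with sprintf so the padding is explicit rather than hidden in trailing whitespace.

```perl
use strict;
use warnings;

# Hypothetical helper: split a flattened record into [label, value] pairs.
# Assumes each label is padded to exactly 18 characters before the colon.
sub parse_fixed_labels {
    my ($str) = @_;
    my @pairs;
    while ($str =~ /(\w[\w ]{17}):\s+((?:(?!\w[\w ]{17}:).)*)/g) {
        my ($key, $value) = ($1, $2);
        s/^\s+|\s+$//g for $key, $value;    # trim both ends of each field
        push @pairs, [$key, $value];
    }
    return @pairs;
}

# Illustrative sample: three fields joined by a single space,
# i.e. the case that trips up the "two spaces" separator.
my @fields = (
    ['Override Type', 'Data'],
    ['SecurityID',    '1076665'],
    ['Sedol',         '2451234'],
);
my $str = join ' ', map { sprintf '%-18s: %s', @$_ } @fields;

print ">$_->[0]: $_->[1]<\n" for parse_fixed_labels($str);
```

The negative lookahead is what stops a value from swallowing the next label, so this only holds up as long as no value happens to contain an 18-character run of word characters and spaces followed by a colon.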