in reply to regex: seperating parts of non-formatted names
Breaking the name into pieces is surely the way to go. The code below works with the included test names, though there may well be names that break it. Each captured field keeps its trailing whitespace, which can be stripped afterwards. The output for the test names is listed after __DATA__.
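#!/usr/bin/perl
use strict;
use warnings;

my @Data = (
    'Dr. Foo B. Baz',
    'Ms Bar',
    'Foo Bar',
    'Col Foo Bar',
    'Foo E.G. Bar',
    'Baz',
);

my $Title = qr/ (?: LTC | COL | DR | MS | MR | MISS ) /ix;

my %Fields;
my $i = 0;    # declared outside the loop so each name gets its own index

for (@Data) {
    # Skip any name the pattern cannot handle rather than reuse stale captures.
    next unless /
        ( (?: $Title \.? \s+ )? )          # group 1: optional title, with optional dot
        ( (?: [\w-]+ \s+ )? )              # group 2: optional first name
        ( (?: (?: \w\.\s*? )+ \s+ )? )     # group 3: optional initials such as E.G.
        ( [\w-]+? \s* $ )                  # group 4: surname at the end of the string
    /ix;

    (
        $Fields{'title'   }[$i],
        $Fields{'name'    }[$i],
        $Fields{'initials'}[$i],
        $Fields{'surname' }[$i],
    ) = ( $1, $2, $3, $4 );
    $i++;

    print "$1:$2:$3:$4\n";
}

__DATA__
Dr. :Foo :B. :Baz
Ms :::Bar
:Foo ::Bar
Col :Foo ::Bar
:Foo :E.G. :Bar
:::Baz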
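Since the captured fields keep their trailing whitespace, here is a minimal sketch of one way to strip it afterwards. The helper name trim_fields is my own invention for illustration and is not part of the code above; it just takes a list of captures and returns cleaned copies.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not in the original post): return copies of the
# given fields with trailing whitespace removed. Undefined fields are
# turned into empty strings so they print without warnings.
sub trim_fields {
    my @fields = @_;
    for (@fields) {
        $_ = '' unless defined;
        s/\s+$//;
    }
    return @fields;
}

# Example with captures as the regex would return them for 'Dr. Foo B. Baz'.
my @trimmed = trim_fields( 'Dr. ', 'Foo ', 'B. ', 'Baz' );
print join( ':', @trimmed ), "\n";    # prints "Dr.:Foo:B.:Baz"
```

Inside the loop this would be called as trim_fields( $1, $2, $3, $4 ) before storing the values. Trimming after the match, rather than tightening the regex, keeps the pattern itself unchanged.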
~Django
"Why don't we ever challenge the spherical earth theory?"