in reply to End of sentence regex excluding " i.e." and " e.g."
I don't think /(?<!( e|\.g)|( i|\.e))\.\s/ does what I think you think it does. If you have Perl version 5.10+ regex extensions, try something like (untested):
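    my ($exclude) =
        map qr{ (?: $_) (*SKIP) (*FAIL) }xms,
        join q{ | },
        map qq{\Q$_\E},
        reverse sort qw(e.g. i.e. Dr. Mr. Mrs. ... etc.)
        ;
    # Either skip past an excluded abbreviation, or match a sentence terminator.
    # Without the | the (*FAIL) would make the whole pattern fail every time.
    my $delimiter = qr{ $exclude | [.?!] \s }xms;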
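A minimal sketch of how a delimiter of this kind might be used with split, assuming Perl 5.10+ for the (*SKIP) and (*FAIL) verbs. The abbreviation list and the sample text are made up for illustration, and the exclusion is written as the first branch of an alternation so that the terminator branch is only tried where no abbreviation matched:

```perl
use strict;
use warnings;

# Illustrative abbreviation list (an assumption, not a complete set).
my $abbrev = join q{ | }, map { quotemeta $_ } reverse sort qw(e.g. i.e. Dr. Mr. Mrs. etc.);

# Branch 1: an abbreviation matches, then (*SKIP)(*FAIL) rejects it and
#           restarts the search just past it.
# Branch 2: a terminator followed by whitespace is a sentence boundary.
my $delimiter = qr{ (?: $abbrev ) (*SKIP) (*FAIL) | [.?!] \s+ }xms;

my $text = 'Pick a fruit, e.g. an apple. Ask Dr. Smith first! Is that clear? Yes.';

my @sentences = split $delimiter, $text;
print "[$_]\n" for @sentences;
# [Pick a fruit, e.g. an apple]
# [Ask Dr. Smith first]
# [Is that clear]
# [Yes.]
```

Note that split throws the matched terminator away; the last piece keeps its period only because nothing follows it.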
I have no idea how you could handle something like "H.G. Wells".
Update: I was a bit too quick with my post; see my update above. Also, I think I might see a way to exclude initialed names and similar things:
    my $name = qr{ [[:upper:]] [[:lower:]]+ }xms;
    my $initialed_name = qr{ \b [[:upper:]] [.] (?= \s+ $name) }xms;
    my ($exclude) =
        map qr{ (?: $_) (*SKIP) (*FAIL) }xms,
        join q{ | },
        $initialed_name,
        map qq{\Q$_\E},
        reverse sort qw(e.g. i.e. Dr. Mr. Mrs. ... etc.)
        ;
    # Either skip past an excluded item, or match a sentence terminator.
    my $delimiter = qr{ $exclude | [.?!] \s }xms;
Obviously, this is just a starting point toward a robust solution.
Update 2: It occurs to me that the above won't handle a name like P.D.Q. Bach, so maybe change $initialed_name as follows (still untested):
    my $initial = qr{ \b [[:upper:]] [.] \s* }xms;
    my $name = qr{ \b [[:upper:]] [[:lower:]]* }xms;
    my $initialed_name = qr{ $initial+ (?= \s+ $name) }xms;
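To see how the run-of-initials pattern might behave once it is plugged into the exclusion, here is a small sketch under the same assumptions as the post (Perl 5.10+). The abbreviation list and sample text are illustrative only, and the exclusion is combined with the terminator by alternation:

```perl
use strict;
use warnings;

my $initial        = qr{ \b [[:upper:]] [.] \s* }xms;
my $name           = qr{ \b [[:upper:]] [[:lower:]]* }xms;
# One or more initials, provided whitespace and a capitalized name follow.
my $initialed_name = qr{ $initial+ (?= \s+ $name) }xms;

# Illustrative abbreviation list (an assumption, not a complete set).
my $abbrev = join q{ | }, map { quotemeta $_ } reverse sort qw(e.g. i.e. Dr. Mr. Mrs. etc.);

# Excluded items are skipped; otherwise a terminator plus whitespace splits.
my $delimiter = qr{ (?: $initialed_name | $abbrev ) (*SKIP) (*FAIL) | [.?!] \s+ }xms;

my $text = 'H.G. Wells wrote it. P.D.Q. Bach is a joke, i.e. a parody. Done.';

my @sentences = split $delimiter, $text;
print "[$_]\n" for @sentences;
# [H.G. Wells wrote it]
# [P.D.Q. Bach is a joke, i.e. a parody]
# [Done.]
```

A sentence that really does end in a single capital letter followed by a capitalized word (say, "Go with plan B. Then ...") would be wrongly excluded by this pattern, so it is a heuristic at best.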
Give a man a fish: <%-{-{-{-<