in reply to End of sentence regex excluding " i.e." and " e.g."
There isn't a general solution to this problem because of names (e.g., H.G. Wells) and quoting, but perhaps
/[.!?]\s{1,2}(?=[A-Z0-9])/
will be sufficiently robust for your needs? In general, for a corpus like this, I'd split it into known good, known bad, and grey, and then use test-driven development to build out my filter.
Update: Augmented regex for ! and ?
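A minimal sketch of what that test-driven loop might look like, assuming Perl and a tiny hand-made corpus. The helper name count_boundaries, the sample sentences, and the bucket labels are all hypothetical and only there for illustration; the pattern itself is the one from the post.

```perl
use strict;
use warnings;

# The pattern from the post: sentence-ending punctuation, one or two
# whitespace characters, then (lookahead only) a capital letter or digit.
my $eos = qr/[.!?]\s{1,2}(?=[A-Z0-9])/;

# Hypothetical helper: count the sentence boundaries the pattern finds.
sub count_boundaries {
    my ($text) = @_;
    my $count = () = $text =~ /$eos/g;   # list-context match, then count
    return $count;
}

# Hypothetical mini-corpus: [ text, true number of boundaries, bucket ].
my @corpus = (
    [ 'It rained. We stayed in.',               1, 'good' ],
    [ 'Use a sigil, e.g. a dollar sign. Done.', 1, 'good' ],
    [ 'Really? Yes!  Fine.',                    2, 'good' ],
    [ 'H.G. Wells wrote it. I read it.',        1, 'bad'  ],
);

# Report each case so regressions show up as the pattern evolves.
for my $case (@corpus) {
    my ($text, $want, $bucket) = @$case;
    my $got = count_boundaries($text);
    printf "%-4s %-4s got %d, want %d: %s\n",
        ($got == $want ? 'ok' : 'FAIL'), $bucket, $got, $want, $text;
}
```

The 'bad' row is expected to report FAIL: "G. W" looks like a boundary to this pattern, which is the H.G. Wells problem mentioned above. Keeping such rows in the corpus documents the known limits while the 'good' rows guard against regressions.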
#11929 First ask yourself `How would I do this without a computer?' Then have the computer do it the same way.