```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Stream;

# Wrap the DATA filehandle so that records are delimited by a regex
# instead of a fixed string.
my ($handler, $stream) = File::Stream->new(
    \*DATA,
    read_length => 1024,
    # A sentence ends at '.', '!' or '?' followed by one or two whitespace
    # characters and then an uppercase letter or digit. It does NOT end after
    # a single capital that starts a word (initials such as "H.G.") or at the
    # final dot of "e.g." or "i.e.".
    separator => qr/(?<!\b[A-Z])(?<!e\.g)(?<!i\.e)[.!?]\s{1,2}(?=[A-Z0-9])/,
);

while (<$stream>) {
    print "*$_\n\n";
}

__DATA__
Perl filehandles are streams, but sometimes they just aren't powerful enough.
This module offers to have streams from filehandles searched with regexes and
allows the global input record separator variable to contain regexes. Thus,
readline() and the <> operator can now return records delimited by regular
expression matches. There are some very important gripes with applying regular
expressions to (possibly infinite) streams. Please read the CAVEATS section of
this documentation carefully. Some bunnies are fluffy, e.g. Peter. H.G. Wells
was a great author. Some sports require specialized equipment, e.g. baseball.
```
It's hard to debug this without specific examples from your corpus.
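To collect such examples, it can help to test the separator on its own, without File::Stream or a filehandle. The sketch below is only a quick harness in core Perl: it applies the same regex with `split` to an in-memory string. The variable names and the sample sentences are made up for illustration; swap in the lines from your corpus that misbehave.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Same separator as in the File::Stream example above.
my $separator = qr/(?<!\b[A-Z])(?<!e\.g)(?<!i\.e)[.!?]\s{1,2}(?=[A-Z0-9])/;

# Hypothetical test text: replace with problem sentences from your corpus.
my $text = 'Some bunnies are fluffy, e.g. Peter. His cousin, i.e. Benjamin, is too. '
         . 'H.G. Wells was a great author. It rained!';

# split throws away whatever the separator matched, so each piece loses its
# closing punctuation; that is fine for eyeballing where the breaks fall.
my @sentences = split $separator, $text;

print "*$_\n" for @sentences;
```

On this text the harness finds four sentences. It does not break after "e.g.", "i.e." or the initials "H.G.", and it keeps the final "It rained!" intact because no whitespace follows the "!".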
#11929 First ask yourself `How would I do this without a computer?' Then have the computer do it the same way.