I was hoping there would be some way of doing this. ... Sigh. Too bad regex can't do everything!

Be aware of the "if all you have is a hammer, everything looks like a nail" effect. Doing everything in a single regex is nice, but it shouldn't be a requirement: sometimes things can be expressed much more cleanly with a few regexes and some code. Be wary of premature optimization as well. Sure, a single regex is often faster than several, but it's usually better to get things working first than to bend over backwards trying to wrap your head around one complex regex. Especially in the case you describe, IMHO the brainpower is much better spent on writing up test cases first!

use warnings;
use strict;
use Test::More;

sub my_sentence_splitter {
    my $input = shift;
    my @output;
    # ... magic ...
    return \@output;
}

is_deeply my_sentence_splitter(<<END),
I'm looking for the end of a sentence, where possible.  However, in some cases, I'll need to go with a non-conventional "end" to it, such as: "Here's a quote by a famous person which is supposed to exceed forty words and is therefore required to be set apart as a separate, indented paragraph per APA style." (Famous, 1999) Note that the regex needs to look for the full end of the sentence, if it exists: it cannot simply stop at the colon unless there is no further part to the sentence provided in that paragraph.
END
[
    q#I'm looking for the end of a sentence, where possible.#,
    q#However, in some cases, I'll need to go with a non-conventional "end" to it, such as:#,
    q#"Here's a quote by a famous person which is supposed to exceed forty words and is therefore required to be set apart as a separate, indented paragraph per APA style."#,
    q#(Famous, 1999)#,
    q#Note that the regex needs to look for the full end of the sentence, if it exists: it cannot simply stop at the colon unless there is no further part to the sentence provided in that paragraph.#,
];

# TODO: Many more test cases here!

done_testing;
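
To illustrate the "few regexes and some code" idea, here is a minimal sketch of what the "magic" could look like. It is only an assumption about one possible approach, not a finished splitter: the name split_sentences_sketch and the three passes are made up for this example, and the rules are tuned to the single test case above, so abbreviations such as "e.g." or "Dr." would still be split incorrectly.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): splits one paragraph
# in three small passes instead of one large regex.
sub split_sentences_sketch {
    my ($text) = @_;
    $text =~ s/^\s+|\s+$//g;    # trim surrounding whitespace, incl. the trailing newline

    # Pass 1: break on whitespace that follows . ! or ?,
    # optionally with a closing double quote after the punctuation
    my @parts = split /(?<=[.!?])\s+|(?<=[.!?]")\s+/, $text;

    # Pass 2: break after a colon that introduces a quotation
    @parts = map { split /(?<=:)\s+(?=")/, $_ } @parts;

    # Pass 3: detach a leading parenthetical such as "(Famous, 1999)"
    my @output;
    for my $part (@parts) {
        if ( $part =~ /^(\([^()]*\))\s+(.+)$/s ) {
            push @output, $1, $2;
        }
        else {
            push @output, $part;
        }
    }
    return \@output;
}
```

Each pass can be tested and adjusted on its own, which is the point: when a new test case fails, it is usually obvious which of the small regexes needs to change.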

In reply to Re^5: How to enforce match priority irrespective of string position by haukex
in thread How to enforce match priority irrespective of string position by Polyglot
