If you can advance incrementally through the text, you could try anchoring all of your patterns at pos with \G. Since pos is an lvalue, you can save the position where the previous match ended, try each pattern in priority order from that same position, pick whichever match you prefer, store its end position as the new starting point, and repeat for the next chunk. Something like (untested):
my $lastpos = 0;
while ($lastpos < length $_) {
    my @matches = (undef) x 3;   # one slot per pattern; undef means "no match"
    pos = $lastpos;
    $matches[0] = pos if m/\G([^.?!"]+[.?!"])/gc;     # FIRST PRIORITY
    pos = $lastpos;
    $matches[1] = pos if m/\G([^:;-]+[:;-])/gc;       # SECOND PRIORITY
    pos = $lastpos;
    $matches[2] = pos if m/\G(.*(?:\n|\r|\z|$))/gc;   # LAST PRIORITY
    # somehow choose which match to use for the next cycle and set $lastpos here
    # substr $_, $lastpos, ($matches[$chosen] - $lastpos)
    # should yield the selected chunk between choosing a match and updating $lastpos
}
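For illustration, here is one way the "choose a match" step could be filled in: take the first pattern, in priority order, that matches at the saved position. This is only a sketch under that assumption; the helper name split_by_priority and the sample string are made up for the example, the first character class is assumed to be meant as [^.?!"], and the last pattern is simplified to end at a newline or end of string.

```perl
use strict;
use warnings;

# Hypothetical helper: cut $text into chunks. At each step every pattern is
# tried from the same offset, and the first one that matches (highest
# priority) decides where the chunk ends.
sub split_by_priority {
    my ($text) = @_;
    my @patterns = (
        qr/\G[^.?!"]+[.?!"]/,   # first priority: up to sentence punctuation
        qr/\G[^:;-]+[:;-]/,     # second priority: up to clause punctuation
        qr/\G.*(?:\n|\z)/,      # last priority: rest of the current line
    );
    my @chunks;
    my $lastpos = 0;
    while ($lastpos < length $text) {
        my $end;
        for my $re (@patterns) {
            pos($text) = $lastpos;      # rewind to the shared start offset
            if ($text =~ /$re/gc) {     # /c keeps pos intact on failure
                $end = pos($text);
                last;
            }
        }
        # Defensive guard: stop rather than loop forever if nothing advanced.
        last unless defined $end && $end > $lastpos;
        push @chunks, substr($text, $lastpos, $end - $lastpos);
        $lastpos = $end;
    }
    return @chunks;
}

my @chunks = split_by_priority("Hello there. No stop here; trailing text\n");
print "[$_]\n" for @chunks;
```

A different policy (say, preferring the shortest or longest match) would only change the inner loop: record every pattern's end position, as in the snippet above, and pick among them before updating $lastpos.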