Hello kyaloupe, and welcome to the Monastery!

Since you’re reading the text in paragraph mode, I don’t see why you need any regex to identify paragraphs. Also, unless your data (not shown) is special, I don’t see why you need such a complicated regex to identify sentences. In any case, here is how I would tackle this problem:

    #! perl
    use strict;
    use warnings;

    local $/ = '';    # Paragraph mode

    my $sentence_count  = 0;
    my $paragraph_count = 0;
    my @paragraphs;

    while (my $paragraph = <DATA>)
    {
        my @sentences;

        while ($paragraph =~ m{\s*(.+?(?:\.|\?|!|$))}g)
        {
            push @sentences, "<s>$1</s>";
            ++$sentence_count;
        }

        push @paragraphs, "<p>\n\t" . join("\n\t", @sentences) . "\n</p>\n";
        ++$paragraph_count;
    }

    print "\nTotal sentences: $sentence_count\n";
    print "Total paragraphs: $paragraph_count\n";
    print for @paragraphs;

    __DATA__
    The quick brown fox jumped over the unfortunate dog. What a shame!

    She sells seashells by the sea shore. Peter Piper picked a peck of pickled peppers. Didn't he? Yes, he did.

    This sentence has no termination

Output:

    17:55 >perl 741_SoPW.pl

    Total sentences: 7
    Total paragraphs: 3
    <p>
        <s>The quick brown fox jumped over the unfortunate dog.</s>
        <s>What a shame!</s>
    </p>
    <p>
        <s>She sells seashells by the sea shore.</s>
        <s>Peter Piper picked a peck of pickled peppers.</s>
        <s>Didn't he?</s>
        <s>Yes, he did.</s>
    </p>
    <p>
        <s>This sentence has no termination</s>
    </p>

    17:55 >

As you can see, I identify sentences as each paragraph is read in, and then wrap what is found in the appropriate tags. See join. (I’ve added tabs just to make the structure of the markup easier to see when it’s printed out.)
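One caveat, offered as a sketch rather than a definitive fix: because `.` does not match a newline by default, the regex above only sees a sentence whole if it sits on a single line. If a sentence wraps across lines inside a paragraph, the part before the line break appears to get skipped. Assuming your real data wraps like that, one option is to collapse whitespace before splitting. The helper name `split_sentences` and the sample text are made up for illustration; they are not from the original script.

```perl
#! perl
use strict;
use warnings;

# Hypothetical helper (not part of the original script): collapse all
# whitespace first, so a sentence that wraps across lines stays in one piece.
sub split_sentences
{
    my ($paragraph) = @_;

    $paragraph =~ s/\s+/ /g;         # newlines, tabs, runs of spaces -> one space
    $paragraph =~ s/^\s+|\s+$//g;    # trim both ends

    my @sentences;

    # Same idea as the regex above, with the alternation written as a class
    while ($paragraph =~ m{\s*(.+?(?:[.?!]|$))}g)
    {
        push @sentences, $1;
    }

    return @sentences;
}

# Made-up sample text, read through an in-memory filehandle
my $text = "The quick brown fox\njumped over the dog. What\na shame!\n\n\n"
         . "No termination here\n";

open my $fh, '<', \$text or die "Cannot open in-memory file: $!";

local $/ = '';    # Paragraph mode

while (my $paragraph = <$fh>)
{
    my @sentences = map { "<s>$_</s>" } split_sentences($paragraph);
    print "<p>\n\t", join("\n\t", @sentences), "\n</p>\n";
}

close $fh;
```

Collapsing whitespace up front keeps the sentence regex itself unchanged. The alternative would be adding the `/s` modifier, but then the captured sentences would still contain the embedded newlines.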

Hope that helps,

Athanasius <°(((><contra mundum Iustus alius egestas vitae, eros Piratica,


In reply to Re^3: How to match regex over multiline file by Athanasius
in thread How to match regex over multiline file by kyaloupe
