Based on your update, it looks like you are trying to do two things: remove a paragraph break when it occurs in the middle of a sentence, and add a paragraph break somewhere else where there wasn't one originally.

If that's the case, you need a rule that says where paragraph breaks should be added. (The rule for which breaks to remove is clear enough.) Maybe the rule is really a single move rather than a delete plus an insert? That is, if a break is found in mid-sentence, move it to the end of that sentence. That should also be simple.
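
The removal rule can be written as a small predicate. This is a minimal sketch, assuming a sentence ends in ".", "!" or "?", optionally followed by closing ")", '"' or "'"; the helper name ends_mid_sentence is hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Assumed end-of-sentence pattern: . ! or ? followed by any closing ) " or '
my $sterm = qr/[.!?][)"']*/;

# Hypothetical helper: true when the paragraph stops in mid-sentence,
# i.e. the break that follows it is one that should be removed or moved.
sub ends_mid_sentence {
    my ($par) = @_;
    return $par !~ /$sterm\s*\z/;
}

for my $par ( "It was late.\n\n", "(See the note.)\n\n", "and then the\n\n" ) {
    my $label = ends_mid_sentence($par) ? 'mid-sentence' : 'complete';
    ( my $shown = $par ) =~ s/\s+\z//;    # trim trailing newlines for display
    print "$shown => $label\n";
}
```

Only the third sample ends without a sentence terminator, so only the break after it would need to be moved.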

Here's a slightly different approach that uses Perl's special variable $/ (the input record separator) to read a whole paragraph at a time, assuming that paragraph breaks are consistently marked by one or more blank lines:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $Usage = "Usage: $0 filename.txt > fixed.txt\n";
die $Usage unless ( @ARGV == 1 and -f $ARGV[0] );

$/ = '';        # empty string means blank lines mark end-of-record (cf. perldoc perlvar)
my @pars = <>;  # read all paragraphs into @pars

my $sterm = qr/[.!?][)"']*/;  # regex for end-of-sentence

for ( my $i = 0; $i < $#pars; $i++ )  # skip last paragraph
{
    next if ( $pars[$i] =~ /$sterm\s*$/ );
    next unless ( $pars[$i] =~ /\S/ );  # skip a paragraph emptied by an earlier move

    # get here when paragraph ends in mid-sentence
    my $j = $i + 1;  # refer to next par for tail part of sentence
    ( my $tail ) = ( $pars[$j] =~ /(.*?$sterm)\s*/s );  # /s lets the tail span lines
    next unless ( defined $tail );  # next par has no sentence end: leave both alone

    $pars[$i] =~ s/\s*$/ $tail\n\n/;  # add tail to current par
    $pars[$j] =~ s/\Q$tail\E\s*//;    # remove it from next par
}
print @pars;
```
(Note that the end-of-sentence pattern allows for "quoted and/or parenthesized sentences.")
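
For testing without a file, the same loop can be wrapped in a subroutine that takes the text as a string and reads it through an in-memory filehandle in paragraph mode. This is a sketch, not the original script: the name fix_paragraphs is hypothetical, and it includes guards for a next paragraph with no sentence end and for a paragraph emptied by an earlier move, plus the /s flag so a sentence tail may span several lines.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $sterm = qr/[.!?][)"']*/;    # end-of-sentence pattern

# Hypothetical wrapper: returns the text with each mid-sentence
# paragraph break moved to the end of that sentence.
sub fix_paragraphs {
    my ($text) = @_;
    open my $fh, '<', \$text or die "cannot open in-memory handle: $!";
    my @pars = do { local $/ = ''; <$fh> };    # paragraph mode
    close $fh;

    for my $i ( 0 .. $#pars - 1 ) {            # skip last paragraph
        next if $pars[$i] =~ /$sterm\s*$/;      # already ends a sentence
        next unless $pars[$i] =~ /\S/;          # emptied by an earlier move
        my ($tail) = $pars[ $i + 1 ] =~ /(.*?$sterm)\s*/s;
        next unless defined $tail;              # next par has no sentence end
        $pars[$i]       =~ s/\s*$/ $tail\n\n/;  # append tail to this par
        $pars[ $i + 1 ] =~ s/\Q$tail\E\s*//;    # and drop it from the next
    }
    return join '', @pars;
}

my $input = "The cat sat on\n\nthe mat. It purred.\n\nA new paragraph.\n";
print fix_paragraphs($input);
```

With this sample input, the first paragraph becomes "The cat sat on the mat." and "It purred." starts the second paragraph; the third is unchanged.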

In reply to Re: fixing paragraphs by graff
in thread fixing paragraphs by bfdi533
