in reply to fixing paragraphs

Based on your update, it looks like you are trying to do two things: remove a paragraph break when it occurs in the middle of a sentence, and add a paragraph break somewhere else where there wasn't one originally.

If that's the case, you need some rule that says where the new paragraph breaks should go. (The rule for which breaks to remove is clear enough.) Perhaps the rule is really a single move rather than a delete plus an insert: if a break is found in mid-sentence, move it to the end of that sentence. That is also simple to implement.

Here's a slightly different approach that uses Perl's special variable $/ (the input record separator) to read a whole paragraph at a time, assuming that paragraph breaks are consistently marked by one or more blank lines:

    #!/usr/bin/perl
    use strict;
    use warnings;

    my $Usage = "Usage: $0 filename.txt > fixed.txt\n";
    die $Usage unless ( @ARGV == 1 and -f $ARGV[0] );

    $/ = '';    # empty string means blank lines mark end-of-record (cf. perldoc perlvar)
    my @pars = <>;                  # read all paragraphs into @pars
    my $sterm = qr/[.!?][)"']*/;    # regex for end-of-sentence

    for ( my $i = 0; $i < $#pars; $i++ )    # skip last paragraph
    {
        next if ( $pars[$i] !~ /\S/ );           # paragraph emptied by an earlier move
        next if ( $pars[$i] =~ /$sterm\s*$/ );
        # get here when paragraph ends in mid-sentence
        my $j = $i + 1;    # refer to next par for tail part of sentence
        # /s lets the tail span line breaks; ^\s* keeps the match at the start of the par
        ( my $tail ) = ( $pars[$j] =~ /^\s*(.*?$sterm)/s );
        next unless defined $tail;           # no sentence end in next par: leave both alone
        $pars[$i] =~ s/\s*$/ $tail\n\n/;     # add tail to current par
        $pars[$j] =~ s/^\s*\Q$tail\E\s*//;   # remove it from next par
    }
    print @pars;

(Note that the end-of-sentence pattern allows for "quoted and/or parenthesized sentences.")
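The two ingredients the script relies on, paragraph mode and the $sterm pattern, can be tried in isolation. The sketch below is only an illustration: the sample text is made up, it reads from an in-memory filehandle (a scalar reference passed to open) instead of a file named on the command line, and it uses local so paragraph mode applies only inside one block. It also shows two details worth knowing. First, each record keeps its trailing newlines, which is why the end-of-paragraph test needs the \s* before $. Second, a line holding only spaces is not an empty line, so it does not end a paragraph.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Made-up sample: three blank lines after the first paragraph, one after
# the second, and a line containing a single space inside the third.
my $text = "First paragraph,\nline two.\n\n\n\n"
         . "Second paragraph ends in mid\n\n"
         . "sentence. (The line below holds a space.)\n \nStill the same record.\n";

my $sterm = qr/[.!?][)"']*/;    # same end-of-sentence pattern as the script

open my $fh, '<', \$text or die "Cannot open in-memory text: $!";
my @pars = do {
    local $/ = '';    # paragraph mode, limited to this block
    <$fh>;
};
close $fh;

my @complete;    # 1 if the record ends a sentence, 0 if it stops mid-sentence
for my $p (@pars) {
    push @complete, ( $p =~ /$sterm\s*$/ ) ? 1 : 0;
    ( my $shown = $p ) =~ s/\n/\\n/g;    # make the kept newlines visible
    printf "%-13s [%s]\n", ( $complete[-1] ? 'complete' : 'mid-sentence' ), $shown;
}
```

Running it prints one line per record with the kept newlines shown as \n; the second record is flagged as mid-sentence, which is exactly the case the loop in the script repairs.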

Re^2: fixing paragraphs
by bfdi533 (Friar) on Jun 20, 2005 at 15:41 UTC
    Actually, that is precisely what I am looking for. The help of the monks here is pretty stellar and your time is very much appreciated!