in reply to fixing paragraphs
If that's the case, you need some sort of rule for saying where paragraph breaks need to be added. (The rule for breaks that need to be removed is clear enough.) Maybe the rule is more like a single move rather than a delete plus an insert? That is, if a break is found in mid-sentence, move it to the end of that sentence. This should also be simple.
Here's a slightly different approach that uses the special Perl variable $/ (the input record separator) to read a whole paragraph at a time, assuming that paragraph breaks are consistently marked by one or more blank lines:
    #!/usr/bin/perl
    use strict;
    use warnings;

    my $Usage = "Usage: $0 filename.txt > fixed.txt\n";
    die $Usage unless ( @ARGV == 1 and -f $ARGV[0] );

    $/ = '';          # empty string means blank lines mark end-of-record (cf. perldoc perlvar)
    my @pars = <>;    # read all paragraphs into @pars

    my $sterm = qr/[.!?][)"']*/;    # regex for end-of-sentence

    for ( my $i = 0; $i < $#pars; $i++ )    # skip last paragraph
    {
        next if ( $pars[$i] =~ /$sterm\s*$/ );

        # get here when paragraph ends in mid-sentence
        my $j = $i + 1;    # refer to next par for tail part of sentence

        # anchor at the start of the next par; /s lets the tail span several lines
        ( my $tail ) = ( $pars[$j] =~ /^(.*?$sterm)\s*/s );
        next unless defined $tail;    # next par has no sentence end: leave both alone

        $pars[$i] =~ s/\s*$/ $tail\n\n/;    # add tail to current par
        $pars[$j] =~ s/^\Q$tail\E\s*//;     # remove it from next par
    }
    print @pars;

(Note that the end-of-sentence pattern allows for "quoted and/or parenthesized sentences.")
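To see what paragraph mode actually hands back before running the script on a real file, here is a small sketch that reads a sample string through an in-memory filehandle and reports which paragraphs end in mid-sentence. The helper name `read_paragraphs` and the sample text are made up for illustration; only `$/ = ''` and the end-of-sentence pattern come from the script above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of the script above): split a string into
# paragraphs by reading it in paragraph mode from an in-memory filehandle.
sub read_paragraphs {
    my ($text) = @_;
    local $/ = '';    # paragraph mode: a run of blank lines ends a record
    open( my $fh, '<', \$text ) or die "cannot open in-memory handle: $!";
    my @pars = <$fh>;
    close $fh;
    return @pars;
}

my $sterm = qr/[.!?][)"']*/;    # same end-of-sentence pattern as in the script

# Made-up sample: the first break falls in mid-sentence, and the second
# break is several blank lines long.
my $sample = "The cat sat on\n\nthe mat. It purred.\n\n\n\nThen it slept.\n";
my @pars   = read_paragraphs($sample);

for my $i ( 0 .. $#pars ) {
    my $status = $pars[$i] =~ /$sterm\s*$/ ? 'complete' : 'ends mid-sentence';
    printf "paragraph %d: %s\n", $i + 1, $status;
}
```

With this sample, the first paragraph should be reported as ending in mid-sentence and the other two as complete; the extra blank lines before the last paragraph do not produce an empty record.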
Re^2: fixing paragraphs
by bfdi533 (Friar) on Jun 20, 2005 at 15:41 UTC