Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

I've attempted to make a Perl program to justify text, that is, to make it look like something you would read in a newspaper. (Or a book; if you don't understand, just ask.) I've attempted many revisions to the code, but I still get errors with it. As an example of the errors, here is a file that I attempted to justify:
This is some text, that will be aligned hopefully! Just so you know Mark, THERE_is_A_copy_OF_this_TEXT_in the file C:\mydocu~1\mark\textCOPY.txt
When I run it, it outputs this:
This is some text, that will be alignedhopefu- lly! Just so you know THERE_is_A_copy_OF_this- TEXT_in the fileC:\my- C:\mydocu~1\mark\textCO- Y.txtY.txt
To justify it, I read in the first line and use its length as the default length. The program then checks whether the next line is (a) longer than the first line, (b) shorter, or (c) the same length. If (c), we ignore it. If (a), we chop off words and move them to the line below. If (b), we either (1) add spaces or (2) if there is room, move a word from the start of the next line to the end of the current one. I have debugging statements in the code already, and rather than take them out, I figure they may help someone help me figure out what is wrong. I've taken out the comments so it fits in the space more easily, but most of it should be straightforward (at least compared to what goes on in the obfuscation section). Here goes (oh, by the way, to try it on your system you need a file called 'text.txt', or just change the $file variable):
$text = '';
$file = "text.txt";
$word = '';
open(IOfile,"$file") or die "The file $file could not be found, stopped ";
$next = <IOfile>;
$total = length $next;
$debug = shift @ARGV if @ARGV;
open(DEBUG,">debug.txt") if $debug;
select DEBUG if $debug;
LINE: while($line = $next) {
    next LINE if $line =~ /^$/;
    $next = <IOfile>;
    $sCount = $total - length($line);
    print "LINE: while loop\n" if $debug;
    print "Line eq '$line'\t\tNext eq '$next'\n" if $debug;
    print "Total eq $total\t\tsCount eq $sCount\n" if $debug;
    sCount: while($sCount) {
        print "sCount: while(sCount) loop\n" if $debug;
        $line =~ s/^[ \t]+//;
        $next =~ s/^[ \t]+//;
        $line =~ s/[ \t]+$//;
        $next =~ s/[ \t]+$//;
        $word = $1 if $next =~ /^([^ ]+)/;
        $wLength = length($word);
        $words = 0;
        $words++ while $line =~ /[^ ]+ */g;
        print "\tLine eq '$line'\t\tNext eq '$next'\n" if $debug;
        print "\tWord eq '$word'\t\twLength eq $wLength\n" if $debug;
        print "\tWords eq $words\n" if $debug;
        while(length($line) > $total) { #Too much text on this line
            print "while(length(\$line) > \$total) loop\n" if $debug;
            print "\tline to long\n" if $debug;
            if ($line =~ /[\t ]/) { #More than one word on line
                $next =~ s/$next/$1 . $next/e if $line =~ s/ +(\S+) *$//;
                print "\t\$1 eq '$1'\n" if $debug;
                print "\tmultiple words\tNew lines now equal:\n" if $debug;
                print "\tLine eq '$line'\t\tNext eq '$next'\n" if $debug;
            } else { #Only one word, so hyphenate
                $next =~ s/\n//g;
                $line =~ s/\n//g;
                $next = substr($line,$total) . " $next\n";
                $line = substr($line,0,$total-1) . "-\n";
                substr($line,$total) = '';
                print "one word\n\tNew lines now equal:\n" if $debug;
                print "\tLine eq '$line'\t\tNext eq '$next'\n" if $debug;
            }
            $sCount = $total - length($line);
            print "sCount eq $sCount\n" if $debug;
            redo sCount;
        }
        if ($wLength < $sCount) { #See if first word on next line will fit in line
            $line =~ s/\n//g;
            $line =~ s/$line/$line . $word . "\n"/e;
            substr($next,0,$wLength) = '';
            print "add word to first line\t\tlines now eq\n" if $debug;
            print "Line eq '$line'\t\tNext eq '$next'\n" if $debug;
        } elsif ( (($sCount > $words) and ($wLength-($sCount-2) > 3)) ) { #For hyphenation(must fit specific qualifications)
            $line =~ s/\n//g;
            $wPart = substr($next,0,$sCount-2);
            $next =~ s/$wPart//e;
            $line =~ s/$line/$line . $wPart . "-\n"/e;
            print "hyphenating\t\tLines now eq\n" if $debug;
            print "\$wPart eq '$wPart'\n" if $debug;
            print "Line eq '$line'\t\tNext eq '$next'\n" if $debug;
        } else {
            print "Add Spaces\n" if $debug;
            $line =~ s/([^ ]+ )/$1 /g if $sCount == $words;
            $line =~ s/([^ ]+ )/$1 / unless $sCount == $words;
            print "New lines now equal\n" if $debug;
            print "Line eq '$line'" if $debug;
        }
        next LINE;
    }
} continue {
    $line .= "\n" if $line !~ /\n$/;
    $text .= $line;
}
open(IOfile,">$file");
select IOfile;
print IOfile "$text";

Replies are listed 'Best First'.
Re: Alignment Program
by epoptai (Curate) on May 26, 2001 at 22:22 UTC
Re: Alignment Program
by dvergin (Monsignor) on May 26, 2001 at 23:49 UTC
    The modules mentioned in other responses here will provide you with a robust solution. But if the problem itself interests you, here's a way to approach the task using a lot less code. (I have omitted the hyphenation bit since I don't quite follow your standards for that part).

    This approach creates a closure (roughly stated: a state-preserving anonymous subroutine) which you then call to spit out a word at a time for handling. This allows your main loop to focus on building the new lines and to ignore the details of parsing the input and knowing when to read in another line.

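    #!/usr/bin/perl -w
    use strict;

    # Returns a closure that hands back one word per call and reads
    # another line from the file only when it has run out of words.
    sub make_word_dispensor {
        my $infile = shift;
        open my $in, '<', $infile
            or die "Can't open $infile: $!\n";
        my @words = ();
        return sub {
            # Refill from the next line; blank lines yield no words,
            # so keep reading until there is a word or the file ends.
            until (@words) {
                my $input = <$in>;
                unless (defined $input) {
                    close $in;
                    return undef;
                }
                @words = split ' ', $input;
            }
            return shift @words;
        }
    }

    my $word_dispensor = make_word_dispensor("test.txt");
    my $line_length = 50;
    my $line = '';

    # defined() keeps a literal "0" in the text from ending the loop.
    while (defined(my $word = $word_dispensor->())) {
        if ($line ne '' and length("$line $word") > $line_length) {
            print "$line\n";
            $line = $word;
        } else {
            $line .= ($line ne '' ? ' ' : '') . $word;
        }
    }
    print "$line\n";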

    Season to taste. HTH
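
The loop above breaks lines at a fixed width but leaves the right edge ragged. A minimal sketch of the remaining step, the "add spaces" padding described in the original question, might look like the following. The names `pad_to_width` and `justify` are hypothetical, leaving the last line unpadded is an assumption, and words longer than the width are simply left overlong rather than hyphenated.

```perl
use strict;
use warnings;

# Hypothetical helper: spread extra spaces between the words of $line
# until it is exactly $width characters wide.
sub pad_to_width {
    my ($line, $width) = @_;
    my @words = split ' ', $line;
    return join(' ', @words) if @words < 2;        # nothing to stretch
    my $gaps  = @words - 1;
    my $chars = 0;
    $chars += length($_) for @words;
    my $spaces = $width - $chars;                  # total spaces to hand out
    return join(' ', @words) if $spaces < $gaps;   # line is already too long
    my $each  = int($spaces / $gaps);              # every gap gets this many
    my $extra = $spaces % $gaps;                   # leftmost gaps get one more
    my $out = shift @words;
    for my $i (0 .. $#words) {
        $out .= ' ' x ($each + ($i < $extra ? 1 : 0)) . $words[$i];
    }
    return $out;
}

# Hypothetical driver: fill lines greedily, then pad every line but the last.
sub justify {
    my ($text, $width) = @_;
    my (@lines, $line);
    $line = '';
    for my $word (split ' ', $text) {
        if ($line ne '' and length($line) + 1 + length($word) > $width) {
            push @lines, pad_to_width($line, $width);
            $line = $word;
        } else {
            $line .= ($line eq '' ? '' : ' ') . $word;
        }
    }
    push @lines, $line if $line ne '';
    return @lines;
}

print "$_\n" for justify("This is some text, that will be aligned hopefully!", 20);
```

The sketch builds strings by concatenation instead of `s/$line/.../e`. When a variable is interpolated into a pattern, characters such as `\` and `.` are read as regex syntax unless they are escaped with `\Q...\E` or `quotemeta`, which may matter for text like the file path in the sample input.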

      Thanks. I'll explain the hyphenation standard, if it interests anyone:
      elsif ( (($sCount > $words) and ($wLength-($sCount-2) > 3)) )
      First, I check whether the space I need to fill is greater than the number of words on the line; that is to say, whether putting a space between each word would still leave spaces to be filled (so I don't end up with too many spaces in the file). Moving on. The main point of this expression is just so you won't get text like this:
      This is a line of code, o-
      r is it?
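
The test quoted above can be tried in isolation with a small sketch like this one. The helper name `hyphen_split` and its argument names are hypothetical, and the guard against a non-positive head length is an added assumption that the posted expression does not contain.

```perl
use strict;
use warnings;

# Hypothetical helper mirroring the quoted elsif test. Returns the
# (head, tail) pieces of $word, or an empty list if the rule says no.
#   $s_count : characters still free on the current line
#   $words   : number of words already on that line
sub hyphen_split {
    my ($word, $s_count, $words) = @_;
    my $head_len = $s_count - 2;                 # as in substr($next, 0, $sCount-2)
    my $tail_len = length($word) - $head_len;    # what would move to the next line
    return unless $head_len >= 1;                # added guard (assumption)
    return unless $s_count > $words and $tail_len > 3;
    return (substr($word, 0, $head_len) . '-', substr($word, $head_len));
}

my @ok = hyphen_split("hopefully!", 7, 3);   # head "hopef-", tail "ully!"
my @no = hyphen_split("or", 3, 2);           # tail would be 1 character: rejected
print @ok ? "split: @ok\n" : "no split\n";
print @no ? "split: @no\n" : "no split\n";
```

As written, the expression only bounds the length of the tail, so a one-letter head followed by a longer tail would still pass; a minimum head length would be a separate check.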
Re: Alignment Program
by cLive ;-) (Prior) on May 26, 2001 at 22:23 UTC