PerlMonks  

Formatting Text

by lollipop7081 (Novice)
on May 12, 2006 at 17:22 UTC

lollipop7081 has asked for the wisdom of the Perl Monks concerning the following question:

I am VERY new to Perl. I'm in an undergrad Software Tools class, and my professor assigned Perl just to see how well we'd do (with only basic instruction in Perl).

I DO NOT WANT SOMEONE TO DO MY HOMEWORK FOR ME!!!

The assignment is that I need to format a text file where each section can be treated as a chapter and the headings should be centered.
He wants a table of contents.
Also, each section should be broken up into separate pages and each page numbered with roman numerals that match the table of contents.
And finally, he wants an Index whose keywords are any word that appears in the question and the answer and doesn't appear more than 10 times.

Here's a part of the text he provided:

    SECTION 35 - BUILDING VIM FROM SOURCE

    35.1. How do I build Vim from the sources on a Unix system?

    For a Unix system, follow these steps to build Vim from the sources:

    - Download the source and run-time files archive (vim-##.tar.bz2) from
      the ftp://ftp.vim.org/pub/vim/unix directory.
    - Extract the archive using the bzip2 and tar utilities using the
      command:

        $ bunzip2 -c <filename> | tar -xf -

    - Run the 'make' command to configure and build Vim with the default
      configuration.
    - Run 'make install' command to install Vim in the default directory.

    To enable/disable various Vim features, before running the 'make' command
    you can run the 'configure' command with different flags to
    include/exclude the various Vim features. To list all the available
    options for the 'configure' command, use:

        $ configure -help

He wants to make sure that the characters per line and lines per page can be set in the script. So to do that I've done the following:

    #!/usr/bin/perl
    #
    # Reformat the project3Data.txt file to make it more accessible
    use strict;
    use warnings;

    my $file = '/u/home/jhart2/perl/project3Data.txt';    # Name the file
    open(my $info, '<', $file)                            # Open the file for input
        or die "Cannot open $file: $!\n";

    my $NoOfCharsPerLine = 80;
    my $NoOfLinesPerPage = 100;
    my $RightMargin      = 7;
    my $header           = 3;
    my $footer           = 5;
    my @argument;

    # Read command line arguments
    @argument = split(/=/, $ARGV[0]);
    if ($argument[1] =~ /\D/) {
        print "Nonnumeric value for option \"$argument[0]\": \"$argument[1]\" --script aborted\n";
        exit;
    }
    if ($argument[0] eq "--chars") {
        $NoOfCharsPerLine = $argument[1];
    }
    elsif ($argument[0] eq "--lines") {
        $NoOfLinesPerPage = $argument[1];
    }
    else {
        print "unrecognized command line option \"$argument[0]\" --script aborted\n";
        exit;
    }

    close($info);    # Close the file

    print "$NoOfCharsPerLine\n";
    print "$NoOfLinesPerPage\n";

So, because I'm so new, I'm thinking that maybe I just don't know what to look for. I've used my textbook, Google, and forums like this one, but I'm still at a loss.

All I need is for someone to point me in the right direction to get farther than testing the chars/line and the lines/page.

Thanks so much!

Edited by planetscape - added readmore tags


Replies are listed 'Best First'.
Re: Formatting Text
by ruzam (Curate) on May 12, 2006 at 18:13 UTC
    You may want to consider looping through @ARGV so you can get more than one input argument.
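
A minimal sketch of that loop, assuming the same `--chars=N` and `--lines=N` options as the original script. The helper name `parse_args` and the `%opt` hash are made up for illustration and are not part of the original post.

```perl
use strict;
use warnings;

# Hypothetical helper: parse every "--name=value" argument, not just the first.
sub parse_args {
    my @args = @_;
    my %opt = (chars => 80, lines => 100);    # defaults from the original script
    for my $arg (@args) {
        if ($arg =~ /^--(chars|lines)=(\d+)$/) {
            $opt{$1} = $2;                    # option name and numeric value
        }
        else {
            die "unrecognized command line option \"$arg\" --script aborted\n";
        }
    }
    return %opt;
}

my %opt = parse_args(@ARGV);
print "$opt{chars}\n";
print "$opt{lines}\n";
```

Because the pattern only accepts digits after the `=`, the separate nonnumeric check from the original script folds into the same test.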

    Showing you how to do it would be far easier than pointing you in the right direction, but here are my thoughts...

    You can't build a table of contents until you know what page the sections are on.
    You can't build an index to key words until you know what page the keywords are on.
    And finally you don't know what page you're on until you squeeze the words into the given line length and squeeze the lines into the given lines per page and account for new sections.

    So off the top, you're going to have to read the entire file before you can write the contents, so you should be thinking about how you're going to store the file data in variables.

    I would think about reformatting each line as you read it to fit the chars/line restriction.
    I would think about keeping a running page counter as you read through the file.
    I would think about counting words and keeping a list of page numbers for each word.
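
One possible shape for the last two ideas, offered as a rough sketch rather than a solution to the assignment. It assumes the text has already been wrapped and split into pages (a list of array refs of lines); `build_index` and the sample pages are invented for illustration, and "word" is simplified to any run of `\w` characters.

```perl
use strict;
use warnings;

# Hypothetical helper: given a list of pages (each an array ref of lines),
# return a hash mapping each lowercased word to its count and its pages.
sub build_index {
    my @pages = @_;
    my %index;
    my $page_no = 0;
    for my $page (@pages) {
        $page_no++;                               # running page counter
        for my $line (@$page) {
            for my $word ($line =~ /(\w+)/g) {
                my $entry = $index{ lc $word } ||= { count => 0, pages => {} };
                $entry->{count}++;
                $entry->{pages}{$page_no} = 1;    # hash keys de-duplicate pages
            }
        }
    }
    return %index;
}

# Made-up sample data: two short pages
my @pages = (
    [ 'How do I build Vim', 'from the sources?' ],
    [ 'Run the make command', 'to build Vim.' ],
);
my %index = build_index(@pages);
for my $word (sort keys %index) {
    next if $index{$word}{count} > 10;            # the assignment's cutoff
    my @on = sort { $a <=> $b } keys %{ $index{$word}{pages} };
    print "$word: @on\n";
}
```

Storing the pages as hash keys means a word that occurs several times on one page is still listed once for that page.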
Re: Formatting Text
by Zaxo (Archbishop) on May 12, 2006 at 18:45 UTC

    ruzam++ has given you some very good advice about how to organize your algorithm.

    You don't say how much of perl you are expected to know or use. CPAN has lots of modules which would help, up to nearly solving the whole thing. I'll mention a couple which would help with discrete pieces of the problem.

    You start by parsing an option from the command line. The Getopt::Long module provides the standard way of doing that. It works for multiple options and preserves input filenames listed on the command line. That is probably where you should get the name of the file to crunch. Getopt::Long is a base module distributed with perl, so if you are allowed any modules at all, it should be acceptable.
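
As a hedged illustration of the Getopt::Long approach: the sketch below uses `GetOptionsFromArray`, which reasonably recent versions of the module export on request, so that the parsing can be tried on any list; a normal script would usually just call `GetOptions` on `@ARGV`. The wrapper `parse_options` and the option names are assumptions carried over from the original script.

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

# Hypothetical wrapper so the parsing can be exercised on any argument list.
sub parse_options {
    my @args = @_;
    my %opt = (chars => 80, lines => 100);    # defaults from the original script
    GetOptionsFromArray(
        \@args,
        'chars=i' => \$opt{chars},            # --chars=N, integer required
        'lines=i' => \$opt{lines},            # --lines=N, integer required
    ) or die "bad command line options --script aborted\n";
    return (\%opt, @args);    # anything left in @args is a non-option, e.g. a filename
}

my ($opt, @files) = parse_options(@ARGV);
print "chars=$opt->{chars} lines=$opt->{lines} files=@files\n";
```

Invoked as `script.pl --lines=60 --chars=72 project3Data.txt`, the filename would be left over after option processing, which is where the script could pick up its input file.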

    The roman numeral requirement for paging is an entertaining subproblem which your instructor may wish you to solve for yourself. There is the Math::Roman module to do it for you, however. You can increment a Math::Roman numeric variable and stringification will produce the roman representation. Math::Roman is not a base module, but is available from CPAN.

    Good luck, and have fun. This seems like an enjoyable exercise to learn with.

    After Compline,
    Zaxo

      If the instructor is giving an assignment to write code in Perl, I would expect them to be aware of all Perl code that is commonly known to be available on the internet (i.e. CPAN). Assigning a problem where a known solution is posted on CPAN seems absurd to me. A better approach in my mind would be to say: study the Math::Roman module that can be found on CPAN and write a program that shows me you know how it works.

      If a person enjoys reinventing the wheel, they can do it on their own time, and more power to them. Making that a classroom assignment would be offensive to me.
Re: Formatting Text
by kwaping (Priest) on May 12, 2006 at 18:40 UTC
    I didn't read the whole post yet, but regarding centered headings, I recommend you read perlform.

    Update: For easy processing of command-line arguments, also explore Getopt::Long.

    ---
    It's all fine and dandy until someone has to look at the code.
Re: Formatting Text
by TedPride (Priest) on May 12, 2006 at 19:47 UTC
    Here's a simple function for Roman numerals:

        use strict;
        use warnings;

        for (1..100) {
            print roman($_), "\n";
        }

        BEGIN {
            # Value-to-numeral table, including the subtractive pairs (IV, IX, ...)
            my %roman = (
                1   => 'I', 4   => 'IV', 5   => 'V', 9   => 'IX',
                10  => 'X', 40  => 'XL', 50  => 'L', 90  => 'XC',
                100 => 'C', 400 => 'CD', 500 => 'D', 900 => 'CM',
                1000 => 'M'
            );
            # Values from largest to smallest, so each one is used greedily
            my @roman = sort { $b <=> $a } keys %roman;

            sub roman {
                my ($n, $r) = ($_[0], '');
                for (@roman) {
                    next if $_ > $n;
                    $r .= $roman{$_} x int($n / $_);    # as many copies as fit
                    $n = $n % $_;                       # carry on with the remainder
                }
                return $r;
            }
        }

    I discovered a relatively simple explanation for the algorithm here.
Re: Formatting Text
by TedPride (Priest) on May 12, 2006 at 20:10 UTC
    And for word wrap:

        use strict;
        use warnings;

        my $text = join '', <DATA>;
        print wordwrap($text, 45);

        sub wordwrap {
            my ($text, $width) = @_;
            my $result = '';
            for (split /\n/, $text) {
                while (length($_) > $width) {
                    # Break at the last whitespace that fits; if a single word is
                    # longer than $width, split it hard instead of looping forever
                    s/^(.{1,$width})\s+// or s/^(.{$width})//;
                    $result .= "$1\n";
                }
                $result .= "$_\n";
            }
            return $result;
        }

        __DATA__
        When in the Course of human events, it becomes necessary for one people to dissolve the political bands which have connected them with another, and to assume among the powers of the earth, the separate and equal station to which the Laws of Nature and of Nature's God entitle them, a decent respect to the opinions of mankind requires that they should declare the causes which impel them to the separation.

Re: Formatting Text
by moklevat (Priest) on May 12, 2006 at 21:18 UTC
    Since you are just looking for a pointer in the right direction, for formatting text I have found the Perl6::Form module to be very effective.
Re: Formatting Text
by SamCG (Hermit) on May 12, 2006 at 18:46 UTC
    I'd add that you also need to know what constitutes a section/chapter title, both for centering and building your TOC.

    I'd suggest regular expressions for this -- look at what's common on all the chapters of interest. Regular expressions will also likely be helpful in other parts of your task . . .
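
For instance, going only by the sample text in the question, section titles look like `SECTION 35 - TITLE` and questions like `35.1. Question text`. The sketch below is built on that guess; the patterns and the `classify` helper are assumptions for illustration and may need adjusting against the full data file.

```perl
use strict;
use warnings;

# Hypothetical classifier; the patterns are guesses based on the sample text.
sub classify {
    my ($line) = @_;
    # "SECTION 35 - BUILDING VIM FROM SOURCE" -> section number and title
    return ('section',  $1, $2) if $line =~ /^SECTION\s+(\d+)\s+-\s+(.+?)\s*$/;
    # "35.1. How do I ...?" -> question number and question text
    return ('question', $1, $2) if $line =~ /^(\d+\.\d+)\.\s+(.+?)\s*$/;
    return ('text');
}

for my $line (
    'SECTION 35 - BUILDING VIM FROM SOURCE',
    '35.1. How do I build Vim from the sources on a Unix system?',
    'For a Unix system, follow these steps to build Vim from the sources:',
) {
    my ($kind, @parts) = classify($line);
    print join(' | ', $kind, @parts), "\n";
}
```

The captured section number and title are the pieces a table of contents would need; the question lines mark where index keywords start being collected.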

    And, you should look at printf/sprintf.
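
A small sketch of where sprintf might help, assuming a table-of-contents line with the title on the left and the page label on the right. The `toc_line` helper, the 8-character page column, and the sample page label are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: one table-of-contents line, title left-justified and
# page label right-justified, so the line is $width characters in total.
sub toc_line {
    my ($title, $page, $width) = @_;
    my $page_col = 8;    # assumed width of the page-number column
    # %-*s pads on the right, %*s pads on the left; each * takes its width
    # from the argument list
    return sprintf('%-*s%*s', $width - $page_col, $title, $page_col, $page);
}

print toc_line('BUILDING VIM FROM SOURCE', 'xii', 40), "\n";
```

Note that sprintf pads but does not truncate with these formats, so a title longer than the available space would push the page label past the chosen width.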

    Good luck!

    -----------------
    s''limp';@p=split '!','n!h!p!';s,m,s,;$s=y;$c=slice @p1;so brutally;d;$n=reverse;$c=$s**$#p;print(''.$c^chop($n))while($c/=$#p)>=1;
Re: Formatting Text
by TedPride (Priest) on May 12, 2006 at 20:19 UTC
    And centering text (assuming the headings are smaller than the width you want - if not, you may want to word wrap first and center each line):

        use strict;
        use warnings;

        my $heading = "NIMBLE CATS FALL SAFELY";
        my $width = 80;

        print '-' x $width, "\n";
        print center($heading, $width), "\n";
        print '-' x $width, "\n";

        sub center {
            my ($heading, $width) = @_;
            return ' ' x (($width - length($heading)) / 2) . $heading;
        }

Re: Formatting Text
by Trix606 (Monk) on May 12, 2006 at 19:05 UTC
    I would find a copy of Learning Perl somewhere (library, book store, a friend's bookshelf). It has clear explanations of exactly what you need to do this assignment: regexes for picking out the different lines of text you need to handle, and formats for printing your new file the way you want.

    Learning Perl will get you up to speed on what you need in no time. (Rhyme unintentional but I'll take it.)

      I'd recommend any of the resources here. Also, don't forget PM's Tutorials... :-)

      HTH,

      planetscape
