Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

This might sound fairly simple, but here goes; I will explain it as best I can. I need to make a simple Perl script that takes a .txt file whose name I choose at the time of coding. Let's call the file mytextfile.txt for the time being. The script would take the contents of the text file and place them into a properly formatted HTML file, something as simple as <html><head><title>Title</title></head> followed by <body>, inserting the proper <br> wherever it finds a line break. Hopefully that is simple enough to explain.

My next question: does anyone know how I could do a word count and, if possible, search for certain words that I have chosen (say, the word "the") and change every instance of the word to a different color, or to bold or italic? If any of you can help me with this I would greatly appreciate it. Thanks.

P.S. All the input files and the words that I would like to change would be hard-coded; there is no need to be able to change them on the fly. If you could also tell me how to recognize line breaks and things like that, or give me a quick explanation, it would be greatly appreciated. Thanks again.

Re: Need help with a Simple File, transferring a plain text document to a plain HTML document
by Roger (Parson) on Nov 28, 2003 at 08:13 UTC
    To generate the HTML -
    use strict;
    use IO::File;

    die "usage: $0 [filename.txt]" if $#ARGV < 0;

    my $f = new IO::File $ARGV[0], "r" or die "File $ARGV[0] not found.";
    my $text = do { local $/; <$f> };

    print <<HTML
    <html>
    <head><title>$ARGV[0]</title></head>
    <body>
    <pre>
    $text
    </pre>
    </body>
    </html>
    HTML
    ;
    To get a word count, simply do this -
    my $wordcount = $text =~ /\w+/g;
    Update: Oops, someone reminded me that this is a homework question. I suddenly felt bad about giving out answers like that. But this solution isn't going to get 100% because it has hidden loopholes, so that's fair.
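
One of those loopholes is probably worth spelling out. As far as I can tell, `$text =~ /\w+/g` evaluated in scalar context only reports whether a match was found, not how many there were. A minimal sketch of the usual counting idiom follows. The sample string is made up for illustration and stands in for the slurped file.

```perl
use strict;
use warnings;

# Hypothetical sample text standing in for the slurped file contents.
my $text = "the quick brown fox\njumps over the lazy dog\n";

# Scalar context: m//g only says whether a match was found (true or false).
my $matched = $text =~ /\w+/g;

# The scalar //g match leaves a match position behind; reset it before counting.
pos($text) = undef;

# Assigning to an empty list forces list context. The value of a list
# assignment in scalar context is the number of elements on the right.
my $wordcount = () = $text =~ /\w+/g;

# Count one chosen word, case-insensitively, as a whole word only.
my $the_count = () = $text =~ /\bthe\b/gi;

print "matched flag: $matched\n";
print "words: $wordcount\n";      # 9 for the sample text
print "'the': $the_count\n";      # 2 for the sample text
```

Note that `\w+` treats "don't" as two words, so the total can differ from what a tool such as wc reports.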

Re: Need help with a Simple File, transferring a plain text document to a plain HTML document
by allolex (Curate) on Nov 28, 2003 at 09:26 UTC

    You should consider using something like HTML::Template to generate the HTML frame and then do a simple substitution in your text to get the desired formatting. Your template will look something like this:

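    <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
    <html>
    <head>
    <title>
    <TMPL_VAR ESCAPE="HTML" NAME="title">
    </title>
    <style>
    .hit {
        color: red;
        font-weight: bolder;
    }
    </style>
    </head>
    <body>
    <TMPL_VAR NAME="my_text">
    </body>
    </html>

    (my_text is deliberately left without ESCAPE="HTML": the text carries the <span> markup added by the script, and escaping it would make the tags show up literally.)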

    And your Perl code will resemble this:

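    #!/usr/local/bin/perl
    # Usage: perl thisfile.pl inputfile.txt
    use warnings;
    use strict;
    use HTML::Template;

    my $title = "Page Title";  # or a variable representing a
                               # command-line argument, or
                               # something extracted from your
                               # input file.

    # Slurp the whole input (file named on the command line, or STDIN).
    my $text;
    {
        local $/ = undef;
        $text = <>;
    }

    # Wrap every whole-word occurrence of "myword" in a highlighting span.
    # s{}{} delimiters are used because the replacement contains a "/".
    # Note: if the template applies ESCAPE="HTML" to my_text, these tags
    # are escaped as well and appear literally in the page.
    $text =~ s{\b(myword)\b}{<span class="hit">$1</span>}g;

    # The parameter names must match the TMPL_VAR names in the template.
    my $template = HTML::Template->new(filename => 'my_template.tmpl');
    $template->param(title   => $title);
    $template->param(my_text => $text);
    print $template->output;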

    Caveat: I haven't tested this code.
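
For anyone who wants to try the idea without installing a template module, here is a rough sketch using core Perl only. It escapes the text first, then adds the highlighting span, then turns line breaks into <br>. The helper names `escape_html` and `text_to_html_body` are made up for this example, and the `hit` class is the one from the template above.

```perl
use strict;
use warnings;

# Hypothetical helper: escape the characters that are special in HTML.
sub escape_html {
    my ($s) = @_;
    my %entity = ('&' => '&amp;', '<' => '&lt;', '>' => '&gt;', '"' => '&quot;');
    $s =~ s/([&<>"])/$entity{$1}/g;
    return $s;
}

# Hypothetical helper: escape first, then add markup, so that the
# inserted tags are not escaped themselves.
sub text_to_html_body {
    my ($text, $word) = @_;
    my $html = escape_html($text);

    # \Q...\E quotes any regex metacharacters in the chosen word;
    # \b keeps the match to whole words; /i ignores case.
    $html =~ s{\b(\Q$word\E)\b}{<span class="hit">$1</span>}gi;

    # A line break is "\n" (or "\r\n" in files from Windows); put a <br> before each.
    $html =~ s{\r?\n}{<br>\n}g;

    return $html;
}

print text_to_html_body("the cat & the <dog>\nThe end\n", "the");
```

One limitation of this sketch: because highlighting runs after escaping, a chosen word such as "amp" or "lt" would also match inside the entities. For ordinary words like "the" that does not matter.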

    --
    Allolex

    Perl and Linguistics
    http://world.std.com/~swmcd/steven/perl/linguistics.html
    http://www.linuxjournal.com/article.php?sid=3394
    http://www.wall.org/~larry/keynote/keynote.html

Re: Need help with a Simple File, transferring a plain text document to a plain HTML document
by aquarium (Curate) on Nov 28, 2003 at 13:20 UTC
    You'll get extra credit for your assignment if you use perl beautifier: you can write your own syntax rules for highlighting and so on, and as a bonus the program generates HTML. If the files to word-count are large, don't run a regex over the whole file; use $words = `wc -w mytextfile.txt` (backticks rather than system(), because system() returns the exit status, not the output), or write your own wc routine that adds up the \w+ matches line by line. That's all... class dismissed :)
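
A minimal sketch of the "write your own wc routine" suggestion, reading one line at a time so the whole file never has to sit in memory. The function name `count_words` is made up for this example; mytextfile.txt is the name used in the question.

```perl
use strict;
use warnings;

# Hypothetical helper: count \w+ runs in a file, one line at a time.
sub count_words {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!";

    my $count = 0;
    while (my $line = <$fh>) {
        # List assignment in scalar context yields the number of matches.
        my $n = () = $line =~ /\w+/g;
        $count += $n;
    }

    close $fh;
    return $count;
}

# The filename is hard-coded, as the question asks.
my $file = 'mytextfile.txt';
if (-e $file) {
    print count_words($file), "\n";
}
else {
    print "no $file in the current directory\n";
}
```

The result can differ from `wc -w`, which counts whitespace-separated words, whereas `\w+` splits on apostrophes and hyphens.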