Analysing text files to obtain statistics on their content
You are to write a Perl program that analyses text files to obtain statistics on their content. The program should operate as follows:
1) When run, the program should check if an argument has been provided. If not, the program should prompt for, and accept input of, a filename from the keyboard.
2) The filename, either passed as an argument or input from the keyboard, should be checked to ensure it is in MS-DOS format. The filename part should be no longer than 8 characters and must begin with a letter or underscore character followed by up to 7 letters, digits or underscore characters. The file extension should be optional, but if given it should be ".TXT" (upper- or lowercase).
If no extension is given, ".TXT" should be added to the end of the filename. So, for example, if "testfile" is input as the filename, this should become "testfile.TXT". If "input.txt" is entered, this should remain unchanged.
3) If the filename provided is not of the correct format, the program should display a suitable error message and end at this point.
4) The program should then check to see if the file exists using the filename provided. If the file does not exist, a suitable error message should be displayed and the program should end at this point.
5) Next, if the file exists but the file is empty, again a suitable error message should be displayed and the program should end.
6) The file should be read and checked to display crude statistics on the number of characters, words, lines, sentences and paragraphs that are within the file.
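As a rough illustration of step 2, the check and the default extension could be wrapped in one small helper. This is only a sketch: the name normalize_filename is made up here, and it assumes that "letters" means ASCII letters only.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the assignment text): returns the
# filename with ".TXT" appended when no extension was given, or undef
# when the name does not fit the rule from step 2.
sub normalize_filename {
    my ($name) = @_;

    # Base name: a letter or underscore, then up to 7 letters, digits or
    # underscores. Extension: optional, but must be ".TXT" in any case.
    return undef
        unless $name =~ /\A[A-Za-z_][A-Za-z0-9_]{0,7}(\.txt)?\z/i;

    # $1 is defined only if the optional extension group matched.
    return defined $1 ? $name : "$name.TXT";
}

for my $candidate ('testfile', 'input.txt', '9lives', 'muchtoolongname') {
    my $result = normalize_filename($candidate);
    printf "%-16s => %s\n", $candidate, defined $result ? $result : 'invalid';
}
```

Returning undef for a bad name lets the caller decide how to report the error, for example with die as step 3 asks.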
I am very new to Perl and have managed to put this code together using examples from various books. Could anyone look over it and suggest how it could be improved?
#!/usr/bin/perl
use strict;
use warnings;
my $filename; #must be declared because of "use strict"
if (@ARGV == 0) #no filename provided as a command line argument.
{
    print("Please enter a filename: ");
    $filename = <STDIN>;
    chomp($filename);
}
else #got a filename as an argument.
{
    $filename = $ARGV[0];
}
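#perform the specified checks
#check if filename is valid, exit if not:
#a letter or underscore, up to 7 more letters, digits or underscores,
#then an optional ".TXT" extension in either case
if ($filename !~ m/^[a-z_][a-z0-9_]{0,7}(\.TXT)?$/i)
{
    die("File format not valid\n");
}
#add the default extension if none was given
if ($filename !~ m/\.TXT$/i)
{
    $filename .= ".TXT";
}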
#check if filename is an actual file, exit if it is not.
unless (-e $filename)
{
    die("File does not exist\n");
}
#check if the file is empty (-z is true for a zero-size file), exit if it is.
if (-z $filename)
{
    die("File is empty\n");
}
my $i = 0; #line count
my $p = 1; #paragraph count, the first paragraph needs no blank line before it
my $words = 0;
my $chars = 0;
open(my $fh, '<', $filename) or die "Can't open file '$filename': $!\n";
#then use a while loop and series of if statements similar to the following
while (<$fh>) {
    chomp; #removes the input record separator
    $i = $.; #"$." is the input record line number, $i++ will also work
    $p++ if (m/^$/); #count paragraphs: each blank line starts a new one
    my @t = split ' '; #split the line into "words", skipping leading whitespace
    $words += @t; #add count to $words
    $chars += tr/ //c; #tr/ //c counts all characters except spaces
}
#display results
print "There are $i lines in $filename\n";
print "There are $p paragraphs in $filename\n";
print "There are $words words in $filename\n";
print "There are $chars characters in $filename\n";
close($fh);
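The code does not yet count sentences, which step 6 also asks for. A crude way to do it, sketched below, is to count runs of sentence-ending punctuation per line. The helper name count_sentences and the rule it uses (one or more of . ! ? followed by whitespace or the end of the line) are assumptions made for this example, not something the assignment specifies.

```perl
use strict;
use warnings;

# Hypothetical helper: crude sentence count for one line of text.
# Assumption: a sentence ends at one or more of . ! ? followed by
# whitespace or the end of the line. Abbreviations such as "e.g."
# will be counted as sentence ends, so the result is only approximate.
sub count_sentences {
    my ($line) = @_;
    # Assigning the match list to () and then to a scalar yields
    # the number of matches.
    my $count = () = $line =~ /[.!?]+(?=\s|\z)/g;
    return $count;
}

print count_sentences("It works. Does it? Yes!"), "\n";
```

Inside the while loop this could be used as $sentences += count_sentences($_); with $sentences declared alongside the other counters.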