Analysing text files to obtain statistics on their content

You are to write a Perl program that analyses text files to obtain statistics on their content. The program should operate as follows:

1) When run, the program should check if an argument has been provided. If not, the program should prompt for, and accept input of, a filename from the keyboard.

2) The filename, either passed as an argument or input from the keyboard, should be checked to ensure it is in MS-DOS format. The filename part should be no longer than 8 characters and must begin with a letter or underscore character followed by up to 7 letters, digits or underscore characters. The file extension should be optional, but if given it should be ".TXT" (upper- or lowercase).

If no extension is given, ".TXT" should be added to the end of the filename. So, for example, if "testfile" is input as the filename, this should become "testfile.TXT". If "input.txt" is entered, this should remain unchanged.

3) If the filename provided is not of the correct format, the program should display a suitable error message and end at this point.

4) The program should then check to see if the file exists using the filename provided. If the file does not exist, a suitable error message should be displayed and the program should end at this point.

5) Next, if the file exists but the file is empty, again a suitable error message should be displayed and the program should end.

6) The file should be read and checked to display crude statistics on the number of characters, words, lines, sentences and paragraphs that are within the file.

I am very new to Perl and have managed to put this code together using examples from various books. Could anyone look over this code and suggest how it could be improved?

    #!/usr/bin/perl
    use strict;
    use warnings;

    my $filename;
    if (@ARGV == 0) {    # no filename provided as a command line argument
        print "Please enter a filename: ";
        $filename = <STDIN>;
        die "No filename given\n" unless defined $filename;
        chomp $filename;
    }
    else {               # got a filename as an argument
        $filename = $ARGV[0];
    }

    # perform the specified checks
    # check if filename is valid (letter or underscore, then up to 7
    # letters, digits or underscores, optional .TXT); exit if not
    if ($filename !~ m/^[a-z_][a-z0-9_]{0,7}(?:\.txt)?\z/i) {
        die "File format not valid\n";
    }

    # add the default extension if none was given
    if ($filename !~ m/\.TXT\z/i) {
        $filename .= ".TXT";
    }

    # check if filename is an actual file, exit if it is not
    if (!-e $filename) {
        die "File does not exist\n";
    }

    # check if the file is empty, exit if it is
    if (-z $filename) {
        die "File is empty\n";
    }

    my $i     = 0;
    my $p     = 1;
    my $words = 0;
    my $chars = 0;

    open(my $fh, '<', $filename) or die "Can't open file '$filename': $!\n";

    # then use a while loop and series of if statements similar to the following
    while (<$fh>) {
        chomp;                # removes the input record separator
        $i = $.;              # "$." is the input record line number; $i++ will also work
        $p++ if (m/^$/);      # count paragraphs (crude: every blank line starts a new one)
        my @t = split ' ';    # split the line into "words", ignoring leading whitespace
        $words += @t;         # add count to $words
        $chars += tr/ //c;    # count all characters except spaces and add to $chars
    }
    close($fh);

    # display results
    print "There are $i lines in $filename\n";
    print "There are $p paragraphs in $filename\n";
    print "There are $words words in $filename\n";
    print "There are $chars characters in $filename\n";
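The assignment also asks for sentence counts, which the posted loop does not cover. Below is a minimal sketch of one way to gather all five counts from a string of text. The helper name text_stats is hypothetical, and the rules are assumptions rather than part of the assignment: a sentence is taken to end in ".", "!" or "?" followed by whitespace or end of text, and paragraphs are taken to be separated by one or more blank lines. Abbreviations such as "e.g." will be miscounted, which seems acceptable for "crude" statistics.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: returns a hash reference of crude counts for $text.
# Assumptions (not from the assignment): sentences end in . ! or ?,
# and paragraphs are separated by one or more blank lines.
sub text_stats {
    my ($text) = @_;
    my %stats;

    # Lines: count newlines, plus one for a final line with no newline.
    $stats{lines} = ($text =~ tr/\n//);
    $stats{lines}++ if length $text && $text !~ /\n\z/;

    # Characters: everything except spaces and newlines
    # (tr with /c and no replacement only counts, it changes nothing).
    $stats{chars} = ($text =~ tr/ \n//c);

    # Words: runs of non-whitespace characters.
    $stats{words} = () = $text =~ /\S+/g;

    # Sentences: a run of terminators followed by whitespace or end of text.
    $stats{sentences} = () = $text =~ /[.!?]+(?=\s|\z)/g;

    # Paragraphs: non-empty chunks between blank lines.
    $stats{paragraphs} = grep { /\S/ } split /\n\s*\n/, $text;

    return \%stats;
}

# Usage: print every count for a small sample string.
my $stats = text_stats("One. Two!\n\nThree?\n");
printf "%-10s %d\n", $_, $stats->{$_} for sort keys %$stats;
```

To use this with a file, the whole file could be read into one string first (for example by setting local $/ to undef before reading); for very large files the line-by-line loop is the more memory-friendly choice.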

In reply to Analysing text files to obtain statistics on their content. by Davo1977
