in reply to Statistics from a txtfile

To get the statistics you want, it may be easier to slurp the whole file into a variable and then process it through a series of regexes. Here's the basic idea, but, as suaveant suggests, read up on regular expressions:

open (READFILE, $filename) || die "Failed to open $filename: $!";
my $filecontents;
{
    local $/;    # undefine the input record separator so the read returns the whole file
    $filecontents = <READFILE>;
}
close READFILE;

my @words = $filecontents =~ / ... /g;
my $wordcount = scalar @words;
my @characters = ...
my @sentences = ...
# etc
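As a rough sketch of how the slurp-then-regex idea could be filled in: the helper names slurp_file and text_stats are hypothetical, and the patterns encode my own assumptions about what counts as a word (a run of non-whitespace), a sentence (ends in ., ! or ?) and a paragraph (separated by a blank line). Adjust them to whatever your assignment actually specifies.

```perl
use strict;
use warnings;

# Hypothetical helper: read a whole file into one string.
sub slurp_file {
    my ($path) = @_;
    open my $fh, '<', $path or die "Failed to open $path: $!";
    local $/;                 # no record separator: one read returns everything
    my $contents = <$fh>;
    close $fh;
    return $contents;
}

# Hypothetical helper: count things in the slurped text.
# The definitions of "word", "sentence" and "paragraph" are assumptions.
sub text_stats {
    my ($text) = @_;

    # "my $n = () = ... /g" counts the matches without keeping them.
    my $words     = () = $text =~ /\S+/g;                # runs of non-whitespace
    my $sentences = () = $text =~ /[.!?]+(?=\s|\z)/g;    # terminator before space or end
    my @lines     = split /\n/, $text;                   # trailing empty lines are dropped
    my $paragraphs = grep { /\S/ } split /\n\s*\n/, $text;

    return (
        characters => length $text,    # includes whitespace and punctuation
        words      => $words,
        sentences  => $sentences,
        lines      => scalar @lines,
        paragraphs => $paragraphs,
    );
}

# Only runs when a filename is given on the command line.
if (@ARGV) {
    my %stats = text_stats(slurp_file($ARGV[0]));
    printf "%-11s %d\n", $_, $stats{$_} for sort keys %stats;
}
```

Run it as perl stats.pl yourfile.txt (the script name is just an example). Keeping the counting in a sub that takes a string makes it easy to try the patterns on short test strings before pointing them at a real file.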

Re^2: Statistics from a txtfile
by mbdc566021 (Initiate) on Dec 30, 2007 at 15:49 UTC
    Hi There,
    I have reviewed my code and have put it into some sort of structure.
    The problem I was set is to:
    ask for a filename
    check the file to see if it's in MS-DOS file format
    the filename should be no longer than 8 characters, should begin with an underscore or a letter, and should end with .txt (not case sensitive)
    if it does not, the program should add .txt
    the program should check whether the file exists and is not empty
    it should read the file by character and get the following statistics:
    character count including whitespace and punctuation, number of words, paragraphs, lines and sentences. It should output the details to a separate .txt file.
    #!/usr/bin/perl
    print("Please enter filename: ");
    $filename = <STDIN>;
    chomp ($filename);
    if ($filename=~m/^[_a-zA-Z][^.]{1,7}\.txt$/)
    open (READFILE, $filename)|| die "Failed to open $filename: $!";
    my $filecontents;
    {
        local undef $/;
        $filecontents = <READFILE>;
    }
    while (<READFILE>)
    my @characters = ($filecontents =~m/\b/g);
    my @words = ($filecontents =~m/\b\s/g);
    my $wordcount = scalar @words;
    my @paragraph =($
    my @sentences = ($filecontents=~m/\.$/);
    $CharCount{ $characters }++;
    $wordcount{ $wordcount)++;
    etc
    close <READFILE>;
    open(OUT, ">data1.txt") || die "data1.txt not open: $!";
    output data here
    close(OUT);
    This is as far as I have got. Could you please elaborate on my coding further?
    Thank you kindly

      Hi mbdc566021,

      Your code looks like it has just been cobbled together with snippets from various sources. This is OK, but don't be afraid to make changes and experiment with the code to see what happens. Here are a few points:

      • You do not need the while loop, which incidentally is not complete anyway. Here, while(<READFILE>) would read the file line by line, but since you have already slurped the file, all of it is in the $filecontents variable and there is nothing left for the loop to read.
         
      • Incrementing a hash as suggested by apl, $CharCount{$characters}++, only works inside a loop structure that visits one item at a time. You are trying to combine two different techniques.
         
      • You will need to experiment with your regexes to get the matches you want. If you have a copy of Programming Perl, study the chapter on Pattern Matching, and/or check the Tutorials section of Perlmonks. Also, see the annotated code below:
      # this matches each character once
      # (note: . does not match a newline unless you add the /s modifier)
      $filecontents =~ m/./g;

      # this version also puts a copy of each match into @characters
      my @characters = $filecontents =~ m/./g;

      # this counts the number of elements in @characters
      my $CharCount = scalar @characters;

      # to verify your regex is matching correctly
      # this will print a list of each item counted:
      for (@characters){
          print "$_ \n";
      }
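      To make the difference between the two techniques concrete, here is a minimal sketch. The sub name char_frequency and the sample string are made up for illustration; the string stands in for your slurped $filecontents. The hash increment sits inside a loop and tallies each distinct character, while the list-context match gives a single total.

```perl
use strict;
use warnings;

# Hypothetical helper: tally how often each character occurs.
sub char_frequency {
    my ($text) = @_;
    my %count;
    # /s lets "." match newlines too, so every character is visited.
    for my $char ($text =~ /./gs) {
        $count{$char}++;      # one increment per loop iteration
    }
    return %count;
}

my $sample = "a b,\na";       # stand-in for the slurped file contents
my %freq   = char_frequency($sample);

# Technique 2: no loop, just count all the matches at once.
my $total = () = $sample =~ /./gs;

print "total characters: $total\n";
for my $char (sort keys %freq) {
    # Make whitespace visible in the report.
    my $label = $char eq "\n" ? '\n' : $char eq ' ' ? 'space' : $char;
    print "$label: $freq{$char}\n";
}
```

      If all you need is the totals, the second technique is enough; the hash is only worth having when you want a per-item breakdown.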

      Good luck with your assignment.

        Your code looks like it has just been cobbled together with snippets from various sources. This is OK, but don't be afraid to make changes and experiment with the code to see what happens.

        I personally believe it's far from being OK unless one

        • understands what those snippets do, or tries to and upon failure asks for clarification, or
        • can safely take a black-box approach: e.g. I can't understand a given sub's algorithm, but I have a clear idea of what input it takes and what output it spits out. (Or try to understand that and upon failure...)
        --
        If you can't understand the incipit, then please check the IPB Campaign.