in reply to Re: Statistics from a txtfile
in thread Statistics from a txtfile

Dear Sirs, I have knuckled down and made some good groundwork on the structure. I have managed to accept a given file name if certain conditions are met. I have also managed to count the sentences, but can I add the character count, paragraph count and word count within the same WHILE loop? I am also confused about how to count paragraphs. Is there a code for carriage returns?

#!/usr/bin/perl

if ($#ARGV == -1)
{
    print("Please enter filename: ");
    $filename = <STDIN>;
    chomp ($filename);
}
else
{
    $filename = $ARGV[0];
}

if ($filename -r && $filename=~m/^[_a-zA-Z]/) #if filename is readable AND matches.....
{
    open (READFILE, $filename)|| die "Failed to open $filename: $!";
}

if ($filename !~ m/\.txt$/i) #if filename does not end with .txt then add to filename
{
    $filename .= ".TXT";
}

my $filecontents;
{
    local undef $/;
    $filecontents = <READFILE>;
}
close <READFILE>; #slurps the whole file into a variable

$sentences = 0;
my @characters = $filecontents =~ m/./g; # puts a copy of each match into @characters
my($ch);
my $CharCount = scalar @characters; # this counts the number of elements in @characters
my @words = $filecontents =~m/[a-zA-Z]\s/g;# matches a char followed by a white space character globally

while ($ch = getc(READFILE))
{
    # count sentences:
    if ($ch eq "?" || $ch eq "!" || $ch eq ".") # if character is one of the three end of sentence markers
    {
        $sentences++;
    }
}

while ($ch = getc(READFILE))
{
    $CharCount { $ch }++;
}

for (@characters)
{
    print "There are $_ \n characters";
}

print "There are $sentences sentences";

# output data
# open(OUT, ">data1.txt") || die "data1.txt not open: $!";
# close(OUT);

Re^3: Statistics from a txtfile
by ww (Archbishop) on Jan 07, 2008 at 22:37 UTC

    Re your question, "Is there a code for carriage returns?"

    Yes, \n (strictly speaking that is Perl's newline; a bare carriage return is \r).

    That's pretty basic but...

    1. Whether or how a carriage return defines a paragraph, in a grammatical sense, is another question. Some definitions of a paragraph construe the combination of a period followed by a carriage return and a second <CR> in the next, otherwise empty, line as a paragraph indicator. But others might consider any line beginning with indentation (e.g., leading space(s) or tab) greater than that of the previous line as a paragraph indicator... and if you wish to stretch a bit, some plain text might invite the interpretation that any <CR> marks a paragraph end. How are you defining a paragraph?
       
    2. Similarly, your test for sentences, if ($ch eq "?" || $ch eq "!" || $ch eq "."), is incomplete because it fails to allow for the possibility that the sentence may contain an abbreviated word or words:
      "Mr. John Doe, Jr. is a Sr. programmer for E.H.I., Inc. Miss Laura J. Smith is an Analyst for ABC. "
      How many sentences are there? By inspection, I'm sure you'll agree, there are two. But your test counts every period, so it will report nine.
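
    To make the abbreviation problem in item 2 concrete, here is a minimal sketch that runs both the naive test and one possible heuristic over the sample text. The %abbrev list and the count_sentences name are made up for illustration; the list is nowhere near complete, and it deliberately leaves out "Inc" because in the sample that abbreviation also happens to end a sentence. Treat it as a starting point, not a general solution.

```perl
use strict;
use warnings;

my $text = 'Mr. John Doe, Jr. is a Sr. programmer for E.H.I., Inc. '
         . 'Miss Laura J. Smith is an Analyst for ABC. ';

# Naive test from the original post: every ?, ! or . counts as a sentence end.
my $naive = () = $text =~ /[.!?]/g;

# Hypothetical, incomplete list of abbreviations that should not end a sentence.
my %abbrev = map { $_ => 1 } qw(Mr Mrs Ms Dr Jr Sr);

sub count_sentences {
    my ($string) = @_;
    my @tokens = split ' ', $string;    # whitespace-separated tokens
    my $count  = 0;
    for my $i (0 .. $#tokens) {
        # Only tokens that end in a terminator are candidates.
        next unless $tokens[$i] =~ /^(.*?)[.!?]+$/;
        my $stem = $1;
        next if $abbrev{$stem};          # known abbreviation such as "Mr."
        next if $stem =~ /^[A-Z]$/;      # single initial such as "J."
        # If another token follows, it must start with a capital letter.
        next if $i < $#tokens && $tokens[ $i + 1 ] !~ /^[A-Z]/;
        $count++;
    }
    return $count;
}

print "naive count:     $naive\n";
print "heuristic count: ", count_sentences($text), "\n";
```

    Any rule of this kind will misfire on some input (a sentence that really does end in "Jr.", for example), which is why the definition needs to be settled before the counting code is.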
       
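    For item 1, if (and only if) you settle on the definition "a paragraph is a block of text separated from the next by one or more blank lines", the count can be sketched as below. The sample text and the count_paragraphs name are invented for the illustration; a different definition, such as indentation, would need different code.

```perl
use strict;
use warnings;

# Invented sample: three paragraphs, the last one preceded by two blank lines.
my $text = "First paragraph, line one.\nStill the first paragraph.\n\n"
         . "Second paragraph.\n\n\n"
         . "Third paragraph after two blank lines.\n";

sub count_paragraphs {
    my ($string) = @_;
    # Split wherever a newline is followed by optional whitespace and another
    # newline, i.e. on one or more blank (or whitespace-only) lines.
    my @chunks = split /\n\s*\n/, $string;
    # Ignore chunks that contain nothing but whitespace.
    return scalar grep { /\S/ } @chunks;
}

print "paragraphs: ", count_paragraphs($text), "\n";
```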

    As to the rest of your logic and syntax: Note that your code won't compile. Running perl -c yourcode before posting is a good idea :-) as is, if this is the problem, double-checking that what you've posted matches what you thought you posted. Even with all the syntactical issues fixed, I can't make your code extract the values you assert you've obtained.

    Regrettably, I've run out of time and ambition to identify/clarify/correct all of those, but they're issues of higher precedence than your hope to do all the counting in a single while loop. At this point, I have to suspect that producing this relied as much on cutting and pasting snippets from hither and yon as on study and comprehension. Note, however, that among other things, you're trying to use getc on a filehandle that isn't open, and that the use you would be making of getc if the filehandle were open would be redundant (you've already read all the chars; why not read them from $filecontents?) and... (/me trails off in dismay....)
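
    As a rough illustration of the point about reading from $filecontents instead of calling getc, here is a sketch that slurps the data once and then does the character, word and sentence counting in a single loop over the string. The in-memory $data string stands in for your file (a real script would open the file named in $filename), the %count hash and $in_word flag are names I made up, and the sentence test is still the naive one from your post.

```perl
use strict;
use warnings;

# Stand-in for the contents of the user's file.
my $data = "Is this a test? Yes! It is.\n\nA second paragraph here.\n";

open my $fh, '<', \$data or die "Failed to open in-memory data: $!";
my $filecontents = do { local $/; <$fh> };    # slurp everything in one read
close $fh;

my %count   = (characters => 0, words => 0, sentences => 0);
my $in_word = 0;

# One pass over the characters already held in $filecontents; no getc needed.
for my $ch (split //, $filecontents) {
    $count{characters}++;
    $count{sentences}++ if $ch eq '?' || $ch eq '!' || $ch eq '.';
    if ($ch =~ /\s/) {
        $in_word = 0;                 # whitespace ends the current word
    }
    elsif (!$in_word) {
        $in_word = 1;                 # first non-space character starts a word
        $count{words}++;
    }
}

print "$_: $count{$_}\n" for sort keys %count;
```

    Here a "word" is simply any run of non-whitespace characters, which is one more definition you would have to decide on for yourself.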

    Perhaps you'll get some inspiration from the Perl Cookbook (for instance chapter 8, and 8.2 especially).