Re: Statistics from a txtfile
by suaveant (Parson) on Dec 28, 2007 at 18:38 UTC
You probably want a regexp more like
/^[_a-zA-Z][^.]{1,7}\.txt$/
This matches an underscore or letter (the {1} is unnecessary; a character class always matches exactly one character, which is why you need quantifiers like * or + to make it do something different), followed by one to seven characters that are not a dot, and then a literal .txt at the end of the name.
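To see what that pattern accepts and rejects, here is a minimal sketch that runs it over a few names. The sample names are made up for illustration; only the pattern itself comes from the post.

```perl
use strict;
use warnings;

# The pattern from the post, unchanged.
my $pattern = qr/^[_a-zA-Z][^.]{1,7}\.txt$/;

# Hypothetical sample names, chosen only to exercise each part of the pattern.
my @names = ('data.txt', '_notes.txt', '1file.txt', 'a.txt',
             'verylongname.txt', 'notes.TXT');

# Record the verdict for each name and print a small report.
my %matched;
for my $name (@names) {
    $matched{$name} = $name =~ $pattern ? 1 : 0;
    printf "%-18s %s\n", $name, $matched{$name} ? 'matches' : 'rejected';
}
```

Two things worth noticing: a.txt is rejected because {1,7} demands at least one more character after the first, and notes.TXT is rejected because the pattern is case sensitive. Using {0,7} and adding the /i flag would change both, if that is what the assignment wants.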
Then you can do
while(<READFILE>) {
# something here
}
which will go through the file line by line, putting each line in $_
I would read up on chomp, split and regular expressions to learn how to work with the data in the file.
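As a rough illustration of that loop together with chomp and split, here is a small sketch. It reads from an in-memory filehandle over a made-up string so that it runs on its own, and it uses a lexical $line rather than $_; with a real file you would open the filename instead.

```perl
use strict;
use warnings;

# Stand-in for the real file: an in-memory filehandle over a sample string
# (the text is made up for illustration).
my $sample = "the quick brown fox\njumps over\n\nthe lazy dog\n";
open my $fh, '<', \$sample or die "Cannot open in-memory file: $!";

my ($line_count, $word_count) = (0, 0);
while (my $line = <$fh>) {
    chomp $line;                      # drop the trailing newline
    $line_count++;
    my @fields = split ' ', $line;    # split on runs of whitespace
    $word_count += @fields;           # an array in numeric context is its length
}
close $fh;

print "lines: $line_count, words: $word_count\n";
```

Here a "word" is simply anything between whitespace, which is an assumption; the assignment may define it differently.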
P.S. put <code> tags around your code to make it display properly in a post.
- Ant
Have read up on regex and am still confused. Am I anywhere near correct with the variables?
#!/usr/bin/perl
print("Please enter filename: ");
$filename = <STDIN>;
chomp ($filename);
if ($filename=~m/^[_a-zA-Z][^.]{1,7}\.txt$/)
{
open (READFILE, $filename)|| die "Failed to open $filename: $!";
}
my $filecontents;
{
local undef $/;
$filecontents = <READFILE>;
}
close <READFILE>; #slurps the whole file into variable
my @characters = $filecontents =~ m/./g; # puts a copy of each match into @characters
my $CharCount = scalar @characters; # this counts the number of elements in @characters
my @words = $filecontents =~m/\b\s/g; # number of words
my @paragraph =($ # number of paragraphs. what is code for new line char ie carriage return?
my @sentences = $filecontents=~m/\.$/g; # number of sentences
for (@characters){
print "$_ \n";
} # this will print a list of each item counted:
# output data
# open(OUT, ">data1.txt") || die "data1.txt not open: $!";
# close(OUT);
Re: Statistics from a txtfile
by cdarke (Prior) on Dec 28, 2007 at 19:11 UTC
A few other things:
When you read the filename from STDIN it will include a trailing newline character, so get rid of it with: chomp $filename;
When reporting an error from an open (or any other system-related function), include the system error held in $!, so you know why it failed: open (READFILE, $filename)|| die "Failed to open $filename: $!";
There is plenty more to do, but I suggest you get some basic code working, then embellish it.
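Both points can be seen in a short sketch. The filename is a made-up one that is assumed not to exist in the current directory, and the input is simulated with a string instead of <STDIN> so the example runs unattended.

```perl
use strict;
use warnings;

# Simulated user input; in the real script this would come from <STDIN>.
my $filename = "no_such_file_for_demo.txt\n";   # hypothetical name
chomp $filename;                                # remove the trailing newline

my $message;
if (open my $fh, '<', $filename) {
    $message = "Opened $filename";
    close $fh;
}
else {
    # $! holds the operating system's reason for the failure.
    $message = "Failed to open $filename: $!";
}
print "$message\n";
```

Without the chomp, open would look for a file whose name ends in a newline, and the error text from $! is usually what tells you so.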
Re: Statistics from a txtfile
by hangon (Deacon) on Dec 28, 2007 at 21:51 UTC
To get the statistics you want, it may be easier to slurp the whole file into a variable then process it through a series of regexes. Here's the basic idea, but as suaveant suggests - read up on regular expressions:
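open (READFILE, $filename)|| die "Failed to open $filename: $!";
my $filecontents;
{
local $/; # undefine the input record separator within this block only
$filecontents = <READFILE>; # so a single read returns the whole file
}
close READFILE; # close takes the filehandle itself, not <READFILE>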
my @words = $filecontents =~ / ... /g;
my $wordcount = scalar @words;
my @characters = ...
my @sentences = ...
# etc
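One way the elided regexes might be filled in is sketched below. The definitions of "word" and "sentence" are assumptions made for the example, not something the assignment or the post specifies, and the text comes from a made-up string via an in-memory filehandle so the sketch is self-contained.

```perl
use strict;
use warnings;

# Made-up sample text standing in for the file on disk.
my $sample = "Hello world. How are you?\n\nFine, thanks!\n";
open my $fh, '<', \$sample or die "Cannot open in-memory file: $!";

my $filecontents;
{
    local $/;                  # undefine the record separator inside this block
    $filecontents = <$fh>;     # so one read returns the whole file
}
close $fh;

# Assumed definitions: a word is a run of non-whitespace; a sentence ends
# with one or more of . ! ? followed by whitespace or the end of the text.
my @words     = $filecontents =~ /\S+/g;
my @sentences = $filecontents =~ /[.!?]+(?=\s|\z)/g;

my $wordcount     = scalar @words;
my $sentencecount = scalar @sentences;
my $charcount     = length $filecontents;   # includes spaces and newlines

print "words: $wordcount, sentences: $sentencecount, characters: $charcount\n";
```

A match in list context with /g returns every match, which is why assigning it to an array and taking the array's size gives a count.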
Hi There,
I have reviewed my code and have put it into some sort of structure.
The problem I was given was to:
ask for a filename
check file to see if it's in MS-DOS file format
filename should be no longer than 8 characters, should begin with an underscore or letter, and should end with .txt (not case sensitive)
if not then it should add .txt
program should check whether the file exists and is not empty
should read the file by character and get the following statistics: character count including whitespace and punctuation, number of words, paragraphs, lines and sentences.
output details to a separate .txt file.
#!/usr/bin/perl
print("Please enter filename: ");
$filename = <STDIN>;
chomp ($filename);
if ($filename=~m/^[_a-zA-Z][^.]{1,7}\.txt$/)
open (READFILE, $filename)|| die "Failed to open $filename: $!";
my $filecontents;
{
local undef $/;
$filecontents = <READFILE>;
}
while (<READFILE>)
my @characters = ($filecontents =~m/\b/g);
my @words = ($filecontents =~m/\b\s/g);
my $wordcount = scalar @words;
my @paragraph =($
my @sentences = ($filecontents=~m/\.$/);
$CharCount{ $characters }++;
$wordcount{ $wordcount)++;
etc
close <READFILE>;
open(OUT, ">data1.txt") || die "data1.txt not open: $!";
output data here
close(OUT);
This is as far as I have got. Could you please elaborate on my coding further?
Thank you kindly
# this matches each character once
$filecontents =~ m/./g;
# this version also puts a copy of each match into @characters
my @characters = $filecontents =~ m/./g;
# this counts the number of elements in @characters
my $CharCount = scalar @characters;
# to verify your regex is matching correctly
# this will print a list of each item counted:
for (@characters){
print "$_ \n";
}
Good luck with your assignment.
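One caveat when using this for a character count that must include whitespace: by default . does not match a newline, so m/./g leaves line endings out. A minimal sketch with a made-up two-line string shows the difference; the /s flag and length() are standard Perl.

```perl
use strict;
use warnings;

# Made-up two-line sample.
my $filecontents = "ab\ncd\n";

my @no_newlines = $filecontents =~ m/./g;    # . skips "\n" by default
my @everything  = $filecontents =~ m/./gs;   # /s lets . match "\n" too

print "without /s: ", scalar(@no_newlines), "\n";
print "with /s:    ", scalar(@everything),  "\n";
print "length():   ", length($filecontents), "\n";
```

If only the total is needed, length() gives it without building a list at all.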
Re: Statistics from a txtfile
by apl (Monsignor) on Dec 28, 2007 at 20:35 UTC
character count including whitespace and punctuation
For each character in a line, increment a hash keyed on that character. (i.e. $CharCount{ $ch }++; )
number of words, paragraphs, lines and sentences.
How would you determine the end of a word, a paragraph, or a sentence? Increment the appropriate counter when you hit that situation.
How do you determine that you've read a line?
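The hash idea can be sketched like this. The input lines are made up, and %CharCount follows the name used above; everything else is illustrative.

```perl
use strict;
use warnings;

# Made-up input lines standing in for lines read from the file.
my @lines = ("Hi there.\n", "Hi again!\n");

my %CharCount;        # character => number of times seen
my $total = 0;
for my $line (@lines) {
    for my $ch (split //, $line) {   # split // yields one character per element
        $CharCount{$ch}++;
        $total++;
    }
}

# Report the tallies, showing newline and space in a readable form.
for my $ch (sort keys %CharCount) {
    my $label = $ch eq "\n" ? '\n' : $ch eq ' ' ? 'space' : $ch;
    print "$label: $CharCount{$ch}\n";
}
print "total: $total\n";
```

The same loop is a natural place to bump a line counter (once per line read) or a sentence counter (when $ch is an end-of-sentence mark), which is what the questions above are hinting at.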
Re: Statistics from a txtfile
by ww (Archbishop) on Dec 28, 2007 at 22:43 UTC
smells like homework...
so, the nudge is:
- Read Learning Perl
- Read Perl Cookbook (stats answers can be found here)
- Mark homework as such when it is
Dear Sirs,
I have knuckled down and made some good groundwork with the structure. I have managed to accept a given filename if certain conditions are met. I have also managed to count the sentences, but can I add the character count, paragraph count and word count within the same while loop? I am also confused as to how to count paragraphs. Is there a code for carriage returns?
#!/usr/bin/perl
if ($#ARGV == -1)
{
print("Please enter filename: ");
$filename = <STDIN>;
chomp ($filename);
}
else
{
$filename = $ARGV[0];
}
if ($filename -r && $filename=~m/^[_a-zA-Z]/) #if filename is readable AND matches.....
{
open (READFILE, $filename)|| die "Failed to open $filename: $!";
}
if ($filename !~ m/\.txt$/i) #if filename does not end with .txt then add to filename
{
$filename .= ".TXT";
}
my $filecontents;
{
local undef $/;
$filecontents = <READFILE>;
}
close <READFILE>; #slurps the whole file into a variable
$sentences = 0;
my @characters = $filecontents =~ m/./g; # puts a copy of each match into @characters
my($ch);
my $CharCount = scalar @characters; # this counts the number of elements in @characters
my @words = $filecontents =~m/[a-zA-Z]\s/g;# matches a char followed by a white space character globally
while ($ch = getc(READFILE))
{
# count sentences:
if ($ch eq "?" || $ch eq "!" || $ch eq ".")
# if character is one of the three end of sentence markers
{
$sentences++;
}
}
while ($ch = getc(READFILE))
{
$CharCount
{ $ch }++;
}
for (@characters)
{
print "There are $_ \n characters";
}
print "There are $sentences sentences";
# output data
# open(OUT, ">data1.txt") || die "data1.txt not open: $!";
# close(OUT);
Re your question, "Is there a code for carriage returns?"
Yes, \n (strictly speaking that is the newline character; a literal carriage return is \r).
That's pretty basic but...
- Whether or how a carriage return defines a paragraph, in a grammatical sense is another question. Some definitions of a paragraph construe the combination of a period followed by a carriage return and a second <CR> in the next, otherwise empty line as a paragraph indicator. But others might consider any line beginning with indentation (eg, leading space(s) or tab) greater than that of the previous line as a paragraph indicator... and if you wish to stretch a bit, some plain text might invite the interpretation that any <CR> marks a paragraph end. How are you defining a paragraph?
- Similarly, your test for sentences,
if ($ch eq "?" || $ch eq "!" || $ch eq ".")
is incomplete because it fails to allow for the possibility that the sentence may contain an abbreviated word or words:
"Mr. John Doe, Jr. is a Sr. programmer for E.H.I., Inc. Miss Laura J. Smith is a Analyst for ABC. "
How many sentences are there? By inspection, I'm sure you'll agree, there are two. But your test for sentences will give you a much higher sentence count.
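Both points can be checked with a short sketch. The first part applies the naive end-mark test to the sample text quoted above; the second counts paragraphs under one possible definition (blocks of text separated by blank lines), which is an assumption and only one of the definitions discussed. The second sample string is made up.

```perl
use strict;
use warnings;

# The sample text from the post.
my $text = 'Mr. John Doe, Jr. is a Sr. programmer for E.H.I., Inc. '
         . 'Miss Laura J. Smith is a Analyst for ABC. ';

# Naive test: every . ! or ? counts as a sentence end.
my $naive = () = $text =~ /[.!?]/g;
print "naive sentence count: $naive\n";

# One possible paragraph definition (an assumption, not the only one):
# paragraphs are blocks of text separated by one or more blank lines.
my $doc = "First paragraph,\nsecond line.\n\nSecond paragraph.\n\n\nThird.\n";
my @paragraphs = grep { /\S/ } split /\n\s*\n/, $doc;
print "paragraphs: ", scalar(@paragraphs), "\n";
```

The naive count comes out at nine for a text that reads as two sentences, which is the size of the problem abbreviations cause.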
As to the rest of your logic and syntax: note that your code won't compile. Running perl -c yourcode is a good idea before posting :-) as is (if this is the problem) double-checking that what you've posted matches what you thought you posted. Even with all the syntactical issues fixed, I can't make your code extract the values you assert you've obtained.
Regrettably, I've run out of time and ambition to identify/clarify/correct all of those, but they're issues of higher precedence than your hope to do all the counting in a single while clause. At this point, I have to suspect that producing this relied as much on cutting and pasting snippets from hither and yon as on study and comprehension. Note, however, that among other things, you're trying to use getc on a filehandle that isn't open; that the use you would be making of getc if the filehandle were open would be redundant (you've already read all the chars; why not read them from $filecontents?); and ... /me trails off in dismay....
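To make the getc point concrete, here is a minimal sketch of walking the characters of a string that has already been read, in a single loop. The sample text is made up, and the end-mark test is the naive one from the posted code, with all its limitations.

```perl
use strict;
use warnings;

# Made-up text standing in for the already-slurped file.
my $filecontents = "Is it on? Yes!\nGood.\n";

my ($chars, $lines, $marks) = (0, 0, 0);
for my $ch (split //, $filecontents) {
    $chars++;                                   # every character, whitespace included
    $lines++ if $ch eq "\n";                    # a newline ends a line
    $marks++ if $ch eq '?' || $ch eq '!' || $ch eq '.';   # the post's naive test
}

print "characters: $chars, lines: $lines, end marks: $marks\n";
```

No filehandle is involved at this stage, so it does not matter that the file was closed after slurping.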
Perhaps you'll get some inspiration from the Perl Cookbook (for instance chapter 8, and 8.2 especially).
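On the filename side, one of the syntax problems is the file test: Perl's file test operators are written in front of the name (-r $filename), not after it. A hedged sketch of the name handling follows; it creates a throwaway file in a temporary directory so it can run anywhere, and the base name "sample" is purely illustrative.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Create a small throwaway file so the checks have something to find.
my $dir      = tempdir(CLEANUP => 1);
my $filename = "$dir/sample";                 # hypothetical name, no extension yet

# Append .txt unless the name already ends with it (case-insensitively).
$filename .= '.txt' unless $filename =~ /\.txt$/i;

open my $out, '>', $filename or die "Failed to create $filename: $!";
print {$out} "some text\n";
close $out or die "Failed to close $filename: $!";

# File test operators go in front of the name.
my $exists   = (-e $filename) ? 1 : 0;   # file exists
my $nonempty = (-s $filename) ? 1 : 0;   # file has non-zero size
my $readable = (-r $filename) ? 1 : 0;   # file is readable by this process

print "exists: $exists, non-empty: $nonempty, readable: $readable\n";
```

Note the order: the extension is added before the file is tested or opened, so the checks apply to the name that will actually be used.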