Re: How do I gather and output info about a file's contents?

aureum: I am very new to Perl and have been tasked to achieve the following using it.

Who told you this? Why Perl?

I have found solutions for every other scenario but this one

Oh, fine! We'll come to this later (I have some problems piping in both directions to a process from within an Apache2/mod_perl process, maybe we can talk about this ...)

Open a text file after first checking it exists, otherwise 'die'.

Always start your program with these two lines:

 use strict;
 use warnings;

These will guide you through your development process. Then do a sequence of:

 my $fpath = '/somewhere/in';
 my $fname = 'input.dat';
 my $fn = "$fpath/$fname";
 
 die "$fn: $!" unless -f $fn;

which is what I'd consider "somewhat idiomatic". The "test" is done by the -f
operator, which is true only if the path exists and is a plain file (this has
been advised in another post).
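
If you want the message to say why the check failed, the file tests can be split up. This is only a sketch: check_input_file is a made-up name for illustration, not something Perl ships with. It relies on the special filehandle _ to reuse the stat data from the previous file test.

```perl
use strict;
use warnings;

# Hypothetical helper (the name is an assumption, not a core function).
# Returns an error string, or undef if the path looks usable.
sub check_input_file {
    my ($path) = @_;
    return "$path: does not exist"      unless -e $path;
    return "$path: is not a plain file" unless -f _;   # _ reuses the stat from -e
    return "$path: is not readable"     unless -r _;
    return;                                            # undef means "all fine"
}
```

You would then write something like: my $err = check_input_file($fn); die "$err\n" if defined $err;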

check the file and count all characters, including white space, words, sentences, paragraphs and lines used in the file.

First, you have to read the whole file in one go (for this specific problem).
This is called "file slurping"; in Perl 5 it is done along the lines of

 my $content;
 {
    open my $fh, '<', $fn or die "$fn: $!";
    local $/;
    $content = <$fh>;
 }

After this, your file is in the "$content" buffer. For counting
I'd recommend a hash (but you can use whatever variable you want).
Now we bump into the problem of how your text is "structured" and
what you consider a "word", a "sentence" etc. My first guess would
look like this:

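 my %counts;
 # every character, whitespace included
 $counts{CHARS} =  length $content;
 # each word has two boundaries, hence the division by 2
 $counts{WORDS} =  scalar( () = $content =~ /\b/g     ) / 2;
 # a word character, a full stop, then a non-word character
 $counts{SENTN} =  scalar( () = $content =~ /\w\.\W/g );
 # newlines + 1: treats each line as a paragraph, assumes no trailing newline
 $counts{PARAS} =  scalar( () = $content =~ /\n/g     ) + 1;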

The "CHARS" thing is trivial, it's the length of the buffer
holding the characters. The other counts are resolved by regular
expressions in /g (global, i.e. match-all) mode in "list context", which is
provided by the friendly goatse operator: ... () = $content ...
Evaluating that list assignment in scalar context gives the number of matches.
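
To see the counting idiom in isolation, here is a small sketch on a made-up sample string (the text and variable names are just for illustration):

```perl
use strict;
use warnings;

# Sample text, invented for this example.
my $text = "One fish. Two fish.\nRed fish, blue fish.\n";

# In list context a /g match returns every match. Assigning that list to
# the empty list () and reading the assignment in scalar context yields
# the number of elements that were on the right-hand side.
my $newlines   = () = $text =~ /\n/g;
my $boundaries = () = $text =~ /\b/g;

# Every word contributes a start and an end boundary.
my $words = $boundaries / 2;

print "newlines=$newlines boundaries=$boundaries words=$words\n";
# prints: newlines=2 boundaries=16 words=8
```

Without the () = part, the match would run in scalar context and only tell you whether it matched at all.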

Then once it has done that it's to simply print the these returns to the screen like; characters = 'x' words = 'x' and so on.

We can now loop over the "counter" hash and print out the results:

 for my $key (sort keys %counts) {
    print "$key ==> $counts{$key}\n"
 }

This will print the counts according to the rules given
by the regular expressions above.
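
The question asks for paragraphs and lines as separate numbers, while the first guess above keeps a single newline-based count. One way to split them is sketched below, assuming that paragraphs are separated by one or more blank lines. That is my assumption about the text, and count_lines_and_paragraphs is a made-up helper name.

```perl
use strict;
use warnings;

# Hypothetical helper. Assumes paragraphs are separated by blank lines.
sub count_lines_and_paragraphs {
    my ($content) = @_;
    my %n;

    # Lines: one per newline, plus one if the last line is unterminated.
    $n{LINES} = () = $content =~ /\n/g;
    $n{LINES}++ if length($content) && $content !~ /\n\z/;

    # Paragraphs: split on blank (or whitespace-only) lines and keep only
    # the chunks that contain some non-whitespace text.
    $n{PARAS} = grep { /\S/ } split /\n\s*\n/, $content;

    return %n;
}
```

The result can be merged into the hash from before with something like %counts = (%counts, count_lines_and_paragraphs($content)); after which LINES is added and PARAS is overwritten.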
(Hope that helps to keep you going)

Regards

mwa
