A couple of critiques of your posted code:
- You use warnings but not strict; is there a reason?
- Your open test doesn't do what you think. The C-style logical or (||) has higher precedence than the comma operator, so the die binds to the file path rather than to the result of open; as long as your file path is not logically false, the die is a no-op. In addition, the || die sits inside the parentheses of the open call, so it can only ever apply to that last argument. The smallest change that will yield code that functions as you likely expect is
    open (FILE, '<insertfilepath>') || die $!;
though I personally would use something closer to
    open (my $fh, '<', '<insertfilepath>') or die "Open failed : $!";
    undef($/);
    while (<$fh>) {
        # $_ holds the whole file here, since $/ is undefined
    }
See perlopentut.
- The default behavior of split with no arguments will do what you intend: it splits $_ on runs of one or more whitespace characters, ignoring any leading whitespace. Your expression likely does not do what you intend for "Hello.  How are you?", since it creates an empty entry for the double space after the period. I'd change the line to:
    my @array = split;
or at least
    my @array = split(/\s+/,$_);
- You never use a scalar named $word but you declare one - another no-op. You likely mean my %word;. See Perl variable types in perlintro.
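- Rather than trying to enumerate every possible non-word character, you should use a predefined character class. So replace s/[\,|\.|\!|\?|\:|\;|\"|\'|\<|\>]//g; with s/\W//g. (Inside square brackets the | is a literal pipe, not alternation, so those separators are not needed.) The two are not literally identical, since \W removes everything that is not a letter, digit, or underscore, but if you are just using English-language sources without mathematical formulas you are pretty safe. See perlretut.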
- You don't account for variations in capitalization - I suspect this is the bug you are encountering. You should lowercase the result to compensate, either with $_ = lc; or tr/A-Z/a-z/;
- You also have a scoping issue with overwriting @array that you avoided through luck because you slurp the file and don't enforce strict.
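The precedence and split points above can be checked with a short script. This is only a sketch under assumptions: the path in $missing is a hypothetical one that should not exist, the three-argument open stands in for the posted call, and the single-space pattern / / is a guess at the original split expression, which is not shown here.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical path that should not exist; used only to make open fail.
my $missing = '/no/such/dir/no-such-file.txt';

# '||' binds tighter than the comma, so this parses as
# open($fh1, '<', ($missing || die ...)). $missing is true, so die never runs.
my $silent = open(my $fh1, '<', $missing || die "never reached");

# 'or' has very low precedence, so die sees the result of the whole open call.
my $loud = eval { open(my $fh2, '<', $missing) or die "Open failed : $!"; 1 };
my $died = defined $loud ? 0 : 1;

# Sample line from the split discussion, with two spaces after the period.
my $line = "Hello.  How are you?";

my @single = split(/ /, $line);     # assumed original pattern: a single space
my @runs   = split(/\s+/, $line);   # runs of whitespace
my @bare;
for ($line) { @bare = split; }      # bare split works on $_

print "open with ||: ", ($silent ? "succeeded" : "failed silently"), "\n";
print "open with or: ", ($died ? "died as intended" : "did not die"), "\n";
printf "%d fields with / /, %d with /\\s+/, %d with bare split\n",
    scalar(@single), scalar(@runs), scalar(@bare);
```

On the sample line the single-space pattern gives five fields (the second one empty), while both whitespace-run forms give four.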
With all these changes, your code might look like:
    #!/usr/bin/perl
    use strict;
    use warnings;
    open (my $fh, '<', '<insertfilepath>') or die "Open failed : $!";
    undef($/);
    my %word;
    while (<$fh>) {
        my @array = split(/\s+/, $_);
        foreach (@array) {
            print "$_\n";
        }
        for (@array){
            s/\W//g;
            tr/A-Z/a-z/;
            $word{$_}++;
        }
    }
    for (sort(keys %word)) {
        print "$_ occurred $word{$_} times\n";
    }
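To try the counting logic without a real file, here is a self-contained variant. It is a sketch, not a replacement for the code above, and it makes several assumptions: the sample text in $text is invented, the filehandle is opened on that string rather than on a path, local $/ replaces undef($/) so that slurp mode stays confined to one block, and the "next unless length" guard is an addition that skips tokens which are empty once punctuation is stripped.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample text standing in for the real file.
my $text = "Hello.  How are you?\nHello again -- you are here!\n";

# Open a read-only filehandle on the string instead of a file path.
open(my $fh, '<', \$text) or die "Open failed : $!";

my %word;
{
    local $/;                           # slurp mode, restored when this block ends
    my $content = <$fh>;
    my @tokens  = split ' ', $content;  # whitespace runs, no leading empty field
    for (@tokens) {
        s/\W//g;                        # strip non-word characters
        next unless length;             # skip punctuation-only tokens such as "--"
        $word{lc $_}++;                 # count case-insensitively
    }
}
close($fh);

for (sort keys %word) {
    print "$_ occurred $word{$_} times\n";
}
```

Without the length guard, a token like "--" would be reduced to the empty string and counted as a word of its own.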