A couple of critiques of your posted code:
- You use warnings but not strict; is there a reason?
- Your open test doesn't do what you think. The C-style logical or (||) binds more tightly than the comma operator, so || die attaches to the file path rather than to the return value of open. As long as your file path is not logically false, die is never evaluated and the check is a no-op. The die is also inside the parentheses of the open call. The smallest change that yields code that works as you likely expect is
open (FILE, '<insertfilepath>') || die $!;
though I personally would use something closer to
open (my $fh, '<', '<insertfilepath>') or die "Open failed : $!";
undef($/);
while (<$fh>) {
See perlopentut.
- The default behavior of split with no arguments will do what you intend: it splits $_ on runs of one or more whitespace characters and ignores leading whitespace. Your expression likely does not do what you intend for a line like Hello.  How are you? since it creates an empty entry for the double space after the period. I'd change the line to:
my @array = split;
or at least
my @array = split(/\s+/,$_);
- You declare a scalar named $word but never use it, which is another no-op. You likely meant my %word;. See Perl variable types in perlintro.
- Rather than trying to list every possible non-word character, use a predefined character class. Replace s/[\,|\.|\!|\?|\:|\;|\"|\'|\<|\>]//g; with s/\W//g;. This is not literally identical (\W also strips characters you did not list, such as hyphens and parentheses, and it leaves underscores alone), but if you are just using English-language sources without mathematical formulas you are pretty safe. See perlretut.
- You don't account for variations in capitalization; I suspect this is the bug you are encountering. Lowercase each word to compensate, either with $_ = lc; or tr/A-Z/a-z/;
- You also have a scoping issue: @array gets overwritten on every pass through the loop. You avoided trouble only through luck, because you slurp the file (so the loop body runs once) and because you don't enforce strict.
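To make the precedence point about open concrete, here is a minimal sketch. The path is a made-up one that I am assuming does not exist on your machine, and I use the three-argument form with lexical handles throughout; neither is taken from your code.

```perl
use strict;
use warnings;

# Hypothetical path, assumed not to exist, so that open fails.
my $path = '/no/such/dir/no-such-file.txt';

# '||' binds tighter than the comma, so this parses as
#   open($fh1, '<', ($path || die ...))
# $path is a true value, so die is never evaluated.
my $ok = open(my $fh1, '<', $path || die "never reached");
print "first open returned ", ($ok ? "true" : "false"),
      " and the script kept running\n";

# 'or' binds more loosely than the comma, so it tests open's return value.
# The eval is only here so the demo can report the error instead of exiting.
my $msg = eval { open(my $fh2, '<', $path) or die "Open failed: $!\n"; 1 }
    ? "opened\n"
    : $@;
print $msg;
```

The first form fails silently; the second reports the failure. Closing the parentheses before || (as in the smallest-change version above) fixes it just as well; or merely makes the mistake harder to repeat.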
With all these changes, your code might look like:
#!/usr/bin/perl
use strict;
use warnings;

open (my $fh, '<', '<insertfilepath>') or die "Open failed : $!";
undef($/);
my %word;
while (<$fh>) {
    my @array = split(/\s+/, $_);
    foreach (@array) {
        print "$_\n";
    }
    for (@array){
        s/\W//g;
        tr/A-Z/a-z/;
        $word{$_}++;
    }
}
for (sort(keys %word)) {
    print "$_ occurred $word{$_} times\n";
}
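If you want to check the split and cleanup behavior without a file, here is a small sketch that runs on an in-memory string. The sample text is made up, I am assuming your original pattern was a single literal space, and the check that skips empty keys is my own addition: a token made up entirely of punctuation becomes an empty string after s/\W//g and would otherwise be counted as a word.

```perl
use strict;
use warnings;

# Made-up sample text; note the double space and the lone dashes.
my $text = "Hello.  How are you? Are you - well, you know - OK?";

my @naive = split(/ /, $text);   # assumed original pattern: one literal space
my @words = split(' ', $text);   # same semantics as a bare split on $_

my %count;
for my $w (@words) {
    (my $key = lc $w) =~ s/\W//g;   # lowercase a copy, strip non-word characters
    next unless length $key;        # skip tokens that were pure punctuation (my addition)
    $count{$key}++;
}

printf "%d fields with / /, %d with bare-split semantics\n",
    scalar @naive, scalar @words;
print "$_ occurred $count{$_} times\n" for sort keys %count;
```

The single-space pattern yields one extra, empty field for the double space, and "you?", "you" and "Are" fold into the same keys as their plain lowercase forms.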