This script counts how many times each letter occurs at the end of a word in a file. The inspiration for it comes from ambrus's CUFP 779860, Final letter frequency (for french).

Update: removed "use utf8;" (which was not doing what I thought it did).

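#!/usr/bin/perl -w
use strict;
use warnings;

my %wc;    # final letter (lowercased) => count

while (<>) {
    # Each run of alphabetic characters is treated as one word.
    while (/(\p{IsAlpha}+)/g) {
        my $word = $1;
        my $last = substr $word, -1;    # last character of the word
        # print "\ncounting [$last] for word [$word] ";    # uncomment to trace each word
        $wc{lc($last)}++;
    }
}

# Print the counts, least frequent letter first.
for my $l (sort { $wc{$a} <=> $wc{$b} } keys %wc) {
    printf "\n%5d %s", $wc{$l}, $l;
}
print "\n";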
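A possible restructuring, sketched here as an assumption rather than the original script: the counting moves into a hypothetical subroutine `count_final_letters`, which takes a list of lines and returns a hash reference, so it can be tried on a literal string without a file. The equally hypothetical `report` sorts by count and then alphabetically, so letters with equal counts always print in the same order.

```perl
use strict;
use warnings;

# Hypothetical helper: count which letters end words.
# Takes a list of lines; returns a hash ref of lowercased final letter => count.
sub count_final_letters {
    my @lines = @_;    # copy, so pos() is set on writable scalars
    my %wc;
    for my $line (@lines) {
        # Each maximal run of alphabetic characters counts as one word.
        while ($line =~ /(\p{IsAlpha}+)/g) {
            $wc{ lc substr($1, -1) }++;
        }
    }
    return \%wc;
}

# Hypothetical helper: format counts ascending, ties broken alphabetically.
sub report {
    my ($wc) = @_;
    return join '',
        map  { sprintf "%5d %s\n", $wc->{$_}, $_ }
        sort { $wc->{$a} <=> $wc->{$b} or $a cmp $b } keys %$wc;
}

my $counts = count_final_letters("The cat sat on the mat.\n", "Dogs run\n");
print report($counts);
```

Wired back to real input, `print report(count_final_letters(<>));` reads from the files named on the command line or from standard input, so the same redirect or pipe invocation still works; unlike the original loop, it holds the whole file in memory at once.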

This script can be run like this:

    ./wordcount.pl < ~/dump/27827-8.txt

or

    cat ~/dump/advsh12.txt | ./wordcount.pl

This is what I see when I run it on "The Kama Sutra of Vatsyayana by Vatsyayana" (http://www.gutenberg.org/etext/27827):
    1 q
    1 ###
    1 ###
    2 ###
    2 ###
    2 ###
    3 j
    4 #####
   29 v
   33 b
   33 x
   50 z
  116 u
  154 c
  190 p
  249 i
  299 w
  378 k
 1208 m
 1231 l
 1848 h
 2232 a
 2347 g
 2782 o
 3196 y
 3510 f
 4300 t
 4557 r
 5713 n
 6324 d
 8490 s
12606 e
This is what I see for "The Adventures of Sherlock Holmes by Sir Arthur Conan Doyle" (http://www.gutenberg.org/etext/1661):
    1 j
    1 ###
    2 q
    4 v
    6 ###
   17 z
   79 x
   90 b
  139 c
  700 p
 1290 w
 1455 k
 1581 u
 1922 m
 2597 l
 2911 a
 2952 h
 3034 g
 3064 i
 3467 f
 5038 o
 6309 y
 6743 r
 8317 n
11335 s
11807 t
12068 d
21277 e

Re: Final letter frequency -- for english
by JavaFan (Canon) on Dec 28, 2009 at 13:02 UTC
    Why does your program contain use utf8;? Its manual page says:
    Do not use this pragma for anything else than telling Perl that your script is written in UTF-8.
    Boldface is not mine; the manual page uses bold. Probably because it's important.
      My bad for not reading the manual. I *assumed* it was telling the script to handle the input as UTF-8. Thanks, JavaFan.
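As the thread points out, "use utf8;" only declares that the script's own source is UTF-8; it does nothing to the input. A minimal sketch of the difference, assuming the input text is UTF-8 encoded: on the raw bytes of "café" the regex stops at the first byte of "é", while on decoded text it sees the whole letter. For the actual script, "use open qw(:std :encoding(UTF-8));" near the top (or "binmode STDIN, ':encoding(UTF-8)';") should decode standard input the same way, and the :std form also encodes the printed letters on STDOUT. The `last_letter` helper below is hypothetical and only makes the effect visible on a literal string.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: last character of the first \p{IsAlpha} run, or undef.
sub last_letter {
    my ($text) = @_;
    return $text =~ /(\p{IsAlpha}+)/ ? substr($1, -1) : undef;
}

# "café" as the raw UTF-8 bytes a file would contain (é is 0xC3 0xA9).
my $bytes = "caf\xc3\xa9";

# Undecoded: 0xC3 is seen as the letter U+00C3, 0xA9 as U+00A9 (not a letter),
# so the "word" is cut short and ends in the stray byte 0xC3.
my $raw_last = last_letter($bytes);

# Decoded: four characters, ending in U+00E9 (e with acute accent).
my $text         = decode('UTF-8', $bytes);
my $decoded_last = last_letter($text);

printf "raw: U+%04X  decoded: U+%04X\n", ord $raw_last, ord $decoded_last;
```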