Counting words

bisimen has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: Counting words by toolic (Bishop) on Nov 04, 2017 at 17:49 UTC
Your array only has a single element. I'll show how you would split each big string into smaller strings of 2 letters each using a regular expression. Each 2-letter word is stored in a hash.

```perl
use warnings;
use strict;

my $length = 2;
my $str    = "BEBEBEHUHUHUJJFAFALL";
my %cnt;
while ($str =~ /(.{$length})/g) {
    $cnt{$1}++;
}
print "Found:\n";
print "$_ $cnt{$_}\n" for sort keys %cnt;

__END__

Found:
BE 3
FA 2
HU 3
JJ 1
LL 1
```
Re^2: Counting words by bisimen (Acolyte) on Nov 04, 2017 at 18:09 UTC
This works! Bit confused about how tho... But, thanks lad.
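For anyone else wondering how the regex loop works, here is a minimal sketch that records what each pass of the `while` loop sees. The helper name `trace_chunks` is made up for illustration and is not part of toolic's post. It relies on two standard behaviours: in scalar context a `/g` match resumes where the previous match ended, and `pos` reports the offset just past the last match.

```perl
use strict;
use warnings;

# Hypothetical helper: return one [offset, chunk] pair per pass of the /g loop.
sub trace_chunks {
    my ($str, $length) = @_;
    my @steps;
    # .{$length} matches exactly $length characters; the parentheses capture them in $1.
    # In scalar context, each /g match starts where the previous one stopped.
    while ($str =~ /(.{$length})/g) {
        # pos($str) is the offset just past this match, so subtract $length
        # to get the offset at which the chunk started.
        push @steps, [ pos($str) - $length, $1 ];
    }
    return @steps;
}

printf "offset %2d: %s\n", @$_ for trace_chunks("BEBEBEHUHUHUJJFAFALL", 2);
```

The chunks start at offsets 0, 2, 4 and so on, which is why "BE" is seen three times but "EB" is never seen. The loop stops when fewer than `$length` characters remain, because the pattern can no longer match.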
Re^3: Counting words by Laurent_R (Canon) on Nov 05, 2017 at 00:17 UTC
Hi bisimen,

the solution suggested by toolic uses regular expressions to cut the string into segments of `$length` (2, in this case) letters. Regular expressions are a very powerful feature of Perl that you really need to learn at some point. However, assuming you don't know regular expressions yet, this is another way you could do it, which might be easier for you to understand:

```perl
my $str    = "BEBEBEHUHUHUJJFAFALL";
my $length = 2;
my $index  = 0;
my %cnt;    # hash to store the counters
while (1) {    # infinite loop
    # getting a substring of $length length, starting at offset $index (initially 0)
    my $substring = substr $str, $index, $length;
    # exiting the infinite loop if we are at the end of the string
    last if length($substring) < $length;
    # increasing the counter for the substring
    $cnt{$substring}++;
    # increasing the offset by $length
    $index += $length;
}
```

This creates the following counters in the %cnt hash:

```
'BE' => 3
'FA' => 2
'HU' => 3
'JJ' => 1
'LL' => 1
```

Note that this is not the way I would do it, but it is hopefully easier to understand for you, and one of Perl's favorite mottoes is: TIMTOWTDI, i.e. there is more than one way to do it.

Update: Using unpack would most probably be more efficient. Here I only wanted to show a possible process step by step.
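The efficiency remark can be checked rather than guessed. The following is a rough sketch using the core `Benchmark` module. The three sub names and the repeated test string are assumptions made for this example, and the ranking may differ between Perl versions and input sizes.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Regex version: capture $length characters per /g match.
sub count_regex {
    my ($str, $length) = @_;
    my %cnt;
    $cnt{$1}++ while $str =~ /(.{$length})/g;
    return \%cnt;
}

# substr version: step through the string $length characters at a time.
sub count_substr {
    my ($str, $length) = @_;
    my %cnt;
    for (my $i = 0; $i + $length <= length($str); $i += $length) {
        $cnt{ substr($str, $i, $length) }++;
    }
    return \%cnt;
}

# unpack version: split into chunks, ignoring a short leftover chunk.
sub count_unpack {
    my ($str, $length) = @_;
    my %cnt;
    $cnt{$_}++ for grep { length($_) == $length } unpack "(a$length)*", $str;
    return \%cnt;
}

# A longer input (the thread's string repeated) so each call does measurable work.
my $long = "BEBEBEHUHUHUJJFAFALL" x 50;

# Run each candidate for at least one CPU second and print a comparison table.
cmpthese(-1, {
    regex  => sub { count_regex($long, 2) },
    substr => sub { count_substr($long, 2) },
    unpack => sub { count_unpack($long, 2) },
});
```

All three subs return the same counts for the same input, so the table compares speed only.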
Re: Counting words by johngg (Canon) on Nov 05, 2017 at 11:40 UTC
As long as you are not allowing overlaps, an alternative to toolic's regex solution would be to use unpack.

```
johngg@shiraz:~/perl/Monks > perl -Mstrict -Mwarnings -E '
my $str = q{BEBEBEHUHUHUJJFAFALL};
my %cnt;
$cnt{ $_ } ++ for unpack q{(a2)*}, $str;
say qq{$_ -> $cnt{ $_ }} for sort keys %cnt;'
BE -> 3
FA -> 2
HU -> 3
JJ -> 1
LL -> 1
```

I hope this is of interest.

Update: A bit of a lash-up allowing for overlaps.

Cheers,
JohnGG
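As a small illustration of the unpack approach, the sketch below uses a template of the form `(aN)*`: `aN` reads N characters and the group with `*` repeats until the string is exhausted. The helper name `chunks` and the decision to drop a short trailing piece are assumptions for this example, not something stated in the thread.

```perl
use strict;
use warnings;

# Hypothetical helper: split $str into consecutive chunks of $length characters.
sub chunks {
    my ($str, $length) = @_;
    # "(a2)*" when $length is 2: repeat "read 2 characters" to the end of the string.
    my @parts = unpack "(a$length)*", $str;
    # If the string length is not a multiple of $length, the last element is
    # shorter than the rest; drop it so only full-size chunks are returned.
    pop @parts if @parts && length($parts[-1]) < $length;
    return @parts;
}

print join(" ", chunks("BEBEBEHUHUHUJJFAFALL", 2)), "\n";
print join(" ", chunks("ABCDE", 2)), "\n";    # the trailing "E" is dropped
```

Dropping the leftover keeps the result consistent with the regex and substr versions, which also ignore a final partial chunk.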
Re: Counting words by davido (Cardinal) on Nov 04, 2017 at 18:39 UTC
Is it intentional that you don't accommodate overlaps? In other words, why is "BE" (starting at offset 0) a repeated word, but "EB" (starting at offset 1) not? Just want to make sure that's not an overlooked concern.

Dave
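If overlaps were wanted, one possible approach is a sliding window that advances one character at a time instead of `$length` characters. This is only a sketch of that idea. The name `count_overlapping` is hypothetical and this is not the code posted elsewhere in the thread.

```perl
use strict;
use warnings;

# Hypothetical helper: count every $length-character window in $str,
# moving the start offset forward by one character each time.
sub count_overlapping {
    my ($str, $length) = @_;
    my %cnt;
    # The last valid start offset is length($str) - $length; if the string is
    # shorter than $length the range is empty and nothing is counted.
    for my $offset (0 .. length($str) - $length) {
        $cnt{ substr($str, $offset, $length) }++;
    }
    return %cnt;
}

my %cnt = count_overlapping("BEBEBEHUHUHUJJFAFALL", 2);
print "$_ $cnt{$_}\n" for sort keys %cnt;
```

With the thread's sample string this still counts "BE" three times, but it also counts "EB" twice, along with boundary pairs such as "EH" and "UJ" that the non-overlapping versions never see.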
Re: Counting words by 1nickt (Canon) on Nov 04, 2017 at 18:27 UTC
Hi, welcome,

"I'm lost and I don't know what to search for"
"This works! ... Bit confused about how tho"

Remain not in ignorance!

perlrequick: "the very basics"
perlretut: "a basic tutorial"
perlre: "the syntax"

(Also ... it's not "a code" -- it's a program, or alternatively a script, written in Perl (could be any computer language). The program in its written state is referred to as the "source code" or "the code.")

The way forward always starts with a minimal test.
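In the spirit of starting with a minimal test, here is one way such a test could look, using the core `Test::More` module. The `count_chunks` sub is a hypothetical wrapper around the regex approach from earlier in the thread, and the expected counts for the first case are the ones posted there.

```perl
use strict;
use warnings;
use Test::More;

# Hypothetical helper: count non-overlapping chunks of $length characters.
sub count_chunks {
    my ($str, $length) = @_;
    my %cnt;
    $cnt{$1}++ while $str =~ /(.{$length})/g;
    return \%cnt;
}

# The sample string and counts posted in the thread.
is_deeply(
    count_chunks("BEBEBEHUHUHUJJFAFALL", 2),
    { BE => 3, FA => 2, HU => 3, JJ => 1, LL => 1 },
    'counts match the output shown in the thread',
);

# Edge cases worth pinning down before trusting the sub on real data.
is_deeply(count_chunks("", 2),    {},          'empty string gives no counts');
is_deeply(count_chunks("ABC", 2), { AB => 1 }, 'leftover character is ignored');

done_testing();
```

Saving this to a file and running it with `perl` or `prove` reports each case as ok or not ok, which makes it easy to see whether a later change breaks the counting.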