PerlMonks  

Counting words

by bisimen (Acolyte)
on Nov 04, 2017 at 17:23 UTC

bisimen has asked for the wisdom of the Perl Monks concerning the following question:

Say I have an array with a string of random characters

$length = 2;
@array = "BEBEBEHUHUHUJJFAFALL";

Then I basically want a code that counts words. It can be words with a length of 2, 3 etc.

So, for the above code, I look for words with a 2 character length, and I'd like to get an output like this

Found:
BE 3
HU 3
JJ 1
FA 2
LL 1

I know you can use hashes, and they have the words be the key or something, then count from there... but I'm lost and I don't know what to search for...

Replies are listed 'Best First'.
Re: Counting words
by toolic (Bishop) on Nov 04, 2017 at 17:49 UTC
    Your array only has a single element. I'll show how you would split each big string into smaller strings of 2 letters each using a regular expression. Each 2-letter word is stored in a hash.
    use warnings;
    use strict;

    my $length = 2;
    my $str = "BEBEBEHUHUHUJJFAFALL";
    my %cnt;
    while ($str =~ /(.{$length})/g) {
        $cnt{$1}++;
    }
    print "Found:\n";
    print "$_ $cnt{$_}\n" for sort keys %cnt;

    __END__

    Found:
    BE 3
    FA 2
    HU 3
    JJ 1
    LL 1

      This works!

      Bit confused about how tho... But, thanks lad.
        Hi bisimen,

        the solution suggested by toolic uses regular expressions to cut the string into segments of $length (2, in this case) letters. Regular expressions are a very powerful feature of Perl that you really need to learn at some point.
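
To make the "how" a little more visible, here is a minimal sketch that does the same job in two separate steps: first cut the string into pieces, then tally them. The helper names `pieces` and `tally` are made up for this illustration and are not from toolic's post.

```perl
use strict;
use warnings;

# Hypothetical helper: split $str into consecutive pieces of $length characters.
sub pieces {
    my ($str, $length) = @_;
    # In list context, a /g match returns every captured group at once.
    # A trailing fragment shorter than $length cannot match and is dropped.
    return $str =~ /(.{$length})/g;
}

# Hypothetical helper: tally how often each piece occurs.
sub tally {
    my %cnt;
    $cnt{$_}++ for @_;    # each piece is a hash key, its value is the count
    return %cnt;
}

my @pieces = pieces("BEBEBEHUHUHUJJFAFALL", 2);
print "@pieces\n";        # BE BE BE HU HU HU JJ FA FA LL

my %cnt = tally(@pieces);
print "$_ $cnt{$_}\n" for sort keys %cnt;
```

toolic's while loop does both steps at once: in scalar context each /g match resumes where the previous one stopped, so $1 holds the next piece on every pass.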

        However, assuming you don't know regular expressions yet, this is another way you could do it, which might be easier for you to understand:

        my $str = "BEBEBEHUHUHUJJFAFALL";
        my $length = 2;
        my $index = 0;
        my %cnt;                                           # hash to store the counters
        while (1) {                                        # infinite loop
            my $substring = substr $str, $index, $length;  # getting a substring of $length length, starting at offset $index (initially 0)
            last if length($substring) < $length;          # exiting the infinite loop if we are at the end of the string
            $cnt{$substring}++;                            # increasing the counter for the substring
            $index += $length;                             # increasing the offset by $length
        }
        This creates the following counters in the %cnt hash:
        'BE' => 3
        'FA' => 2
        'HU' => 3
        'JJ' => 1
        'LL' => 1
        Note that this is not the way I would do it, but it is hopefully easier to understand for you, and one of Perl's favorite mottoes is: TIMTOWTDI, i.e. there is more than one way to do it.

        Update: Using unpack would most probably be more efficient. Here I only wanted to show a possible process step by step.

Re: Counting words
by johngg (Canon) on Nov 05, 2017 at 11:40 UTC

    As long as you are not allowing overlaps, an alternative to toolic's regex solution would be to use unpack.

    johngg@shiraz:~/perl/Monks > perl -Mstrict -Mwarnings -E '
    my $str = q{BEBEBEHUHUHUJJFAFALL};
    my %cnt;
    $cnt{ $_ } ++ for unpack q{(a2)*}, $str;
    say qq{$_ -> $cnt{ $_ }} for sort keys %cnt;'
    BE -> 3
    FA -> 2
    HU -> 3
    JJ -> 1
    LL -> 1
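
The one-liner above hard-codes a width of 2. As a rough sketch, the same unpack idea could be wrapped in a sub that takes the width as a parameter. The name `count_unpack` is invented here, and skipping a short trailing chunk is an assumption about what the original poster wants, not something stated in the thread.

```perl
use strict;
use warnings;

# Hypothetical helper: count non-overlapping chunks of $length chars via unpack.
sub count_unpack {
    my ($str, $length) = @_;
    my %cnt;
    # For $length == 2 the template is "(a2)*": repeat "a2" until the
    # string is used up.
    for my $chunk (unpack "(a$length)*", $str) {
        # unpack also hands back a shorter leftover chunk at the end;
        # skip it so only full-width words are counted (an assumption).
        next if length($chunk) < $length;
        $cnt{$chunk}++;
    }
    return %cnt;
}

my %cnt = count_unpack("BEBEBEHUHUHUJJFAFALL", 2);
print "$_ -> $cnt{$_}\n" for sort keys %cnt;
```

For the sample string this prints the same five counts as the one-liner. The length check only matters when the string length is not a multiple of the width.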

    I hope this is of interest.

    Update: A bit of a lash-up allowing for overlaps.

    Cheers,

    JohnGG

Re: Counting words
by davido (Cardinal) on Nov 04, 2017 at 18:39 UTC

    Is it intentional that you don't accommodate overlaps?

    In other words, why is "BE" (starting at offset 0) a repeated word, but "EB" (starting at offset 1) not? Just want to make sure that's not an overlooked concern.
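
If overlaps should count, one possible sketch is to slide the start offset one character at a time instead of jumping by the word length. The sub name `count_overlapping` is hypothetical and this is not code posted by anyone in the thread.

```perl
use strict;
use warnings;

# Hypothetical helper: count every window of $length chars, overlaps included.
sub count_overlapping {
    my ($str, $length) = @_;
    my %cnt;
    # The last valid start offset is length($str) - $length; if the string
    # is shorter than $length the range is empty and nothing is counted.
    for my $offset (0 .. length($str) - $length) {
        $cnt{ substr($str, $offset, $length) }++;
    }
    return %cnt;
}

my %cnt = count_overlapping("BEBEBEHUHUHUJJFAFALL", 2);
print "$_ $cnt{$_}\n" for sort keys %cnt;
```

With the sample string this also picks up words such as "EB" (2 times) and "UH" (2 times) that the non-overlapping approaches never see.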


    Dave

Re: Counting words
by 1nickt (Canon) on Nov 04, 2017 at 18:27 UTC

    Hi, welcome,

    "I'm lost and I don't know what to search for"

    "This works! ... Bit confused about how tho"

    Remain not in ignorance!

    1. perlrequick: "the very basics"
    2. perlretut: "a basic tutorial"
    3. perlre: "the syntax"

    (Also ... it's not "a code" -- it's a program, or alternatively a script, written in Perl (could be any computer language). The program in its written state is referred to as the "source code" or "the code.")


    The way forward always starts with a minimal test.