PerlMonks  

Frequency occurrence of same words in a file

by geek12 (Novice)
on Mar 24, 2022 at 22:04 UTC ( [id://11142391]=perlquestion )

geek12 has asked for the wisdom of the Perl Monks concerning the following question:

Hi,
Input file:
    0011 Sally
    1122 Brandon
    2233 George
    0011 Roy
    1122 Simson
Output:
    0011 ---> 2 Sally Roy
    1122 ---> 2 Brandon Simson
    2233 ---> 1 George
I want the output shown above: each value in the left column is printed with the number of times it occurs, followed by the right-hand words that map to it. I have the following code:
    my %count;
    my $file = "Input file";
    open my $fh, '<', $file or die "Could not open '$file' $!";
    while (my $line = <$fh>) {
        foreach my $str ($line) {
            $count{$str}++;
        }
    }
    foreach my $str (sort keys %count) {
        printf "%-31s %s\n", $str, $count{$str};
    }
But the above code does not handle the right-hand column. Can you please help me with it? Thank you.

Replies are listed 'Best First'.
Re: Frequency occurrence of same words in a file
by hippo (Bishop) on Mar 24, 2022 at 22:32 UTC

    You need to split each input line or otherwise extract the 2 items from it. Here is an SSCCE:

    #!/usr/bin/env perl
    use strict;
    use warnings;

    my %count;
    while (my $line = <DATA>) {
        my ($key, $field) = split / +/, $line;
        push @{$count{$key}}, $field;
    }
    for my $key (sort keys %count) {
        print "$key ---> " . scalar @{$count{$key}} . "\n";
        print $_ for @{$count{$key}};
        print "\n";
    }
    __DATA__
    0011 Sally
    1122 Brandon
    2233 George
    0011 Roy
    1122 Simson

    🦛

      Hi, this is not giving the output I want. The output I want is like below:
          0011 ------> 2 Sally Roy
          1122 ------> 2 Brandon Simson
          2233 -------->2 George
        Sorry, I tried printing just the line below:
            print "@{$count{$key}}\n";
        and it gives what I am looking for. Thank you for your help.
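For later readers, here is a minimal sketch of how the two pieces above might fit together. It assumes the names should appear space-separated on the line after the count, which the thread does not pin down exactly. The helper name group_lines, the chomp, and the inline sample list (standing in for the __DATA__ section) are additions for illustration, not part of the posts above.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# group_lines is a hypothetical helper name, not from the thread.
# It takes a list of "key value" lines and returns a hash ref that
# maps each key to an array ref of its values, in input order.
sub group_lines {
    my @lines = @_;
    my %names_for;
    for my $line (@lines) {
        chomp $line;                        # assumption: drop the trailing newline
        my ($key, $field) = split ' ', $line;
        next unless defined $field;         # skip blank or malformed lines
        push @{ $names_for{$key} }, $field;
    }
    return \%names_for;
}

# Sample input taken from the original question.
my @input = (
    "0011 Sally\n",
    "1122 Brandon\n",
    "2233 George\n",
    "0011 Roy\n",
    "1122 Simson\n",
);

my $names_for = group_lines(@input);
for my $key (sort keys %$names_for) {
    my @names = @{ $names_for->{$key} };
    print "$key ---> ", scalar @names, "\n";   # key and how often it occurred
    print "@names\n";                          # names joined by single spaces
}
```

Chomping each line first means the interpolated "@names" is joined by plain spaces rather than by embedded newlines.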
Re: Frequency occurrence of same words in a file
by davido (Cardinal) on Mar 25, 2022 at 06:15 UTC

    Would 0011 Sally ever appear more than once? And if so, would the output contain two listings of Sally, or one?

    0011 Sally
    0011 Sally
    0011 Sally

    # This?
    -------------
    0011 ---> 3 Sally Sally Sally

    # Or this?
    ----------
    0011 ---> 3 Sally

    So the first question is what to do with collisions of the same name within the same prefix.

    Also, what is the min and max for the numeric range? How many distinct numeric prefixes are there, and does the maximum prefix change over time?

    I'm just wondering, as the solution may be different if we know more about the input data.
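To make the second alternative concrete, here is a rough sketch under the assumption that the count should still reflect every input line while each name is listed only once per prefix. The helper name summarize and the one-line-per-prefix output are illustrative choices, not something specified in this thread.

```perl
use strict;
use warnings;

# summarize is a hypothetical helper, not from the thread.
# It counts every line per prefix but records each name only once.
sub summarize {
    my @lines = @_;
    my (%count, %seen, %names_for);
    for my $line (@lines) {
        my ($key, $name) = split ' ', $line;
        next unless defined $name;           # skip blank or malformed lines
        $count{$key}++;                      # every line counts toward the total
        push @{ $names_for{$key} }, $name
            unless $seen{$key}{$name}++;     # keep only the first sighting of a name
    }
    # One summary string per prefix, in sorted prefix order.
    return map { "$_ ---> $count{$_} @{ $names_for{$_} }" } sort keys %count;
}

print "$_\n" for summarize("0011 Sally", "0011 Sally", "0011 Sally");
```

Dropping the %seen check gives the first alternative, where a repeated name is listed once for each time it appears.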


    Dave

      Hi, 0011 Sally will occur only once. Only the left column can repeat; the values in the right column will be unique. I was able to get the first solution working. Thank you for the help.
Re: Frequency occurrence of same words in a file -- oneliner
by Discipulus (Canon) on Mar 25, 2022 at 08:28 UTC
    Hello geek12 and welcome to the monastery and to the wonderful world of perl!

    As you have already received sane and wise answers, I propose a oneliner solution (change " to ' around the code to run it on Linux):

    perl -lane "push @{$H{$F[0]}},$F[1]}{print map{$_,qq( ---> ),scalar @{$H{$_}},$/,(join $/,@{$H{$_}}),$/,$/}keys %H" data1.txt
    0011 ---> 2
    Sally
    Roy

    1122 ---> 2
    Brandon
    Simson

    2233 ---> 1
    George

    You can use -MO=Deparse to have some clue on how to read it:

    perl -MO=Deparse -lane "push @{$H{$F[0]}},$F[1]}{print map{ $_,qq( ---> ),scalar @{$H{$_}},$/,(join $/,@{$H{$_}}),$/,$/ }keys %H"
    BEGIN { $/ = "\n"; $\ = "\n"; }
    LINE: while (defined($_ = readline ARGV)) {
        chomp $_;
        our @F = split(' ', $_, 0);
        push @{$H{$F[0]};}, $F[1];
    }
    {
        print map({$_, ' ---> ', scalar @{$H{$_};}, $/, join($/, @{$H{$_};}), $/, $/;} keys %H);
    }
    -e syntax OK

    See perlrun for perl command line switches, but basically -l takes care of line endings (you are not chomp-ing lines!), -a autosplits each line into the @F array, and -n wraps the code in a while loop without printing (-p does the same but prints each line). The braces in the part ..$F[1]}{print.. are a trick named Eskimo Greeting: the } closes the implicit while loop and the { opens a block that runs once after all input has been read.

    See perlvar for $/ and @F

    You can explore these switches by deparsing them one at a time, as below (-e executes perl code, and -e 1 is just a null program):

    perl -MO=Deparse -l -e 1
    BEGIN { $/ = "\n"; $\ = "\n"; }
    '???';
    -e syntax OK

    perl -MO=Deparse -a -e 1
    LINE: while (defined($_ = readline ARGV)) {
        our @F = split(' ', $_, 0);
        '???';
    }
    -e syntax OK

    perl -MO=Deparse -n -e 1
    LINE: while (defined($_ = readline ARGV)) {
        '???';
    }
    -e syntax OK

    perl -MO=Deparse -p -e 1
    LINE: while (defined($_ = readline ARGV)) {
        '???';
    }
    continue {
        die "-p destination: $!\n" unless print $_;
    }
    -e syntax OK
    L*

    There are no rules, there are no thumbs..
    Reinvent the wheel, then learn The Wheel; may be one day you reinvent one of THE WHEELS.
      Hi, I will try this out. The first solution worked for me. I will also go through the code you suggested, try it out, and work to understand it. Thank you for the help.
Re: Frequency occurrence of same words in a file
by LanX (Saint) on Mar 24, 2022 at 22:12 UTC
    Hi

    Welcome to the monastery! :)

    Could you please correct the <code> tags around your input and source?

    There is an Edit button just for that.

    Cheers Rolf
    (addicted to the Perl Programming Language :)
    Wikisyntax for the Monastery

      done. thank you :)

Node Type: perlquestion [id://11142391]
Approved by LanX