Frequency occurence of same words in a file

geek12 has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: Frequency occurence of same words in a file by hippo (Bishop) on Mar 24, 2022 at 22:32 UTC
You need to split each input line or otherwise extract the 2 items from it. Here is an SSCCE:

```
#!/usr/bin/env perl
use strict;
use warnings;

my %count;
while (my $line = <DATA>) {
    my ($key, $field) = split / +/, $line;
    push @{$count{$key}}, $field;
}
for my $key (sort keys %count) {
    print "$key ---> " . scalar @{$count{$key}} . "\n";
    print $_ for @{$count{$key}};
    print "\n";
}
__DATA__
0011 Sally
1122 Brandon
2233 George
0011 Roy
1122 Simson
```

🦛
Re^2: Frequency occurence of same words in a file by geek12 (Novice) on Mar 24, 2022 at 22:52 UTC
Hi, this is not giving the output I want. The output I want is like below: `0011 ------> 2 Sally Roy 1122 ------> 2 Brandon Simson 2233 -------->2 George`
Re^3: Frequency occurence of same words in a file by geek12 (Novice) on Mar 24, 2022 at 22:56 UTC
Sorry, I tried printing just the below: `print "@{$count{$key}}\n";` and it gives what I am looking for. Thank you for your help.
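For later readers, here is a minimal sketch of how the two pieces above might fit together: the grouping loop plus the interpolated-array print. It assumes whitespace-separated two-column input; the sub names `group_by_key` and `format_report` are made up for illustration and do not appear in the thread.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: group the second column by the first column.
sub group_by_key {
    my @lines = @_;
    my %names_for;
    for my $line (@lines) {
        # split ' ' discards leading and trailing whitespace, newline included
        my ($key, $name) = split ' ', $line;
        next unless defined $name;    # skip blank or malformed lines
        push @{ $names_for{$key} }, $name;
    }
    return \%names_for;
}

# Hypothetical helper: one "key ---> count" line, then the names on one line.
sub format_report {
    my ($names_for) = @_;
    my $out = '';
    for my $key (sort keys %$names_for) {
        my @names = @{ $names_for->{$key} };
        $out .= "$key ---> " . scalar(@names) . "\n";
        $out .= "@names\n";           # interpolation joins with a space
    }
    return $out;
}

my @input = ("0011 Sally\n", "1122 Brandon\n", "2233 George\n",
             "0011 Roy\n", "1122 Simson\n");
print format_report(group_by_key(@input));
```

Using `split ' '` instead of `split / +/` means the names no longer carry a trailing newline, so the spacing of the output is decided only in `format_report`.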
Re: Frequency occurence of same words in a file by davido (Cardinal) on Mar 25, 2022 at 06:15 UTC
Would `0011 Sally` ever appear more than once? And if so, would the output contain two listings of Sally, or one?

```
0011 Sally
0011 Sally
0011 Sally

# This?
-------------
0011 ---> 3
Sally
Sally
Sally

# Or this?
----------
0011 ---> 3
Sally
```

So the first question is what to do with collisions of the same name within the same prefix.

Also, what is the min and max for the numeric range? How many distinct numeric prefixes are there, and does the maximum prefix change over time? I'm just wondering, as the solution may be different if we know more about the input data.

Dave
Re^2: Frequency occurence of same words in a file by geek12 (Novice) on Mar 25, 2022 at 17:13 UTC
Hi, 0011 Sally will be the only occurrence. Only the column on the left can appear multiple times; the right column will be unique. I was able to get the first solution working. Thank you for the help.
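The data in this thread never repeats a name within a prefix, so the following is not needed here. If repeats could occur and the second layout in davido's question were wanted (every line counted, each name listed once), a rough sketch might look like this. The sub name `tally_unique` is hypothetical, and keeping names in first-seen order is an assumption.

```perl
use strict;
use warnings;

# Hypothetical helper: count every line per prefix, but keep each name
# only once per prefix, in first-seen order.
sub tally_unique {
    my @lines = @_;
    my (%count, %seen, %names_for);
    for my $line (@lines) {
        my ($key, $name) = split ' ', $line;
        next unless defined $name;    # skip blank or malformed lines
        $count{$key}++;
        # post-increment returns the old value, so the push happens once per name
        push @{ $names_for{$key} }, $name unless $seen{$key}{$name}++;
    }
    return (\%count, \%names_for);
}

my ($count, $names_for) = tally_unique("0011 Sally", "0011 Sally", "0011 Sally");
for my $key (sort keys %$count) {
    print "$key ---> $count->{$key}\n";
    print "$_\n" for @{ $names_for->{$key} };
}
```

Dropping the `%seen` check gives the first layout instead, with every repeated name listed again.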
Re: Frequency occurence of same words in a file -- oneliner by Discipulus (Canon) on Mar 25, 2022 at 08:28 UTC
Hello geek12 and welcome to the monastery and to the wonderful world of perl!

As you have already got sane and wise answers, I propose a oneliner solution (change " to ' around the code to run it on Linux):

```
perl -lane "push @{$H{$F[0]}},$F[1]}{print map{$_,qq( ---> ),scalar @{$H{$_}},$/,(join $/,@{$H{$_}}),$/,$/}keys %H" data1.txt
0011 ---> 2
Sally
Roy

1122 ---> 2
Brandon
Simson

2233 ---> 1
George
```

You can use `-MO=Deparse` to get some clue on how to read it:

```
perl -MO=Deparse -lane "push @{$H{$F[0]}},$F[1]}{print map{ $_,qq( ---> ),scalar @{$H{$_}},$/,(join $/,@{$H{$_}}),$/,$/ }keys %H"
BEGIN { $/ = "\n"; $\ = "\n"; }
LINE: while (defined($_ = readline ARGV)) {
    chomp $_;
    our @F = split(' ', $_, 0);
    push @{$H{$F[0]};}, $F[1];
}
{
    print map({$_, ' ---> ', scalar @{$H{$_};}, $/, join($/, @{$H{$_};}), $/, $/;} keys %H);
}
-e syntax OK
```

See `perlrun` for perl command line switches, but basically `-l` takes care of line endings (you are not `chomp`-ing lines!), `-a` does autosplit, populating the `@F` array, and `-n` wraps the code in a while loop without printing (`-p` does the same but prints each line). The braces in the part `..$F[1]}{print..` are a trick named Eskimo Greeting. See `perlvar` for `$/` and `@F`.

You can explore these switches by deparsing them one at a time, as below (`-e` is for executing perl code, and `-e 1` is just a null program):

```
perl -MO=Deparse -l -e 1
BEGIN { $/ = "\n"; $\ = "\n"; }
'???';
-e syntax OK

perl -MO=Deparse -a -e 1
LINE: while (defined($_ = readline ARGV)) {
    our @F = split(' ', $_, 0);
    '???';
}
-e syntax OK

perl -MO=Deparse -n -e 1
LINE: while (defined($_ = readline ARGV)) {
    '???';
}
-e syntax OK

perl -MO=Deparse -p -e 1
LINE: while (defined($_ = readline ARGV)) {
    '???';
}
continue {
    die "-p destination: $!\n" unless print $_;
}
-e syntax OK
```

L*

There are no rules, there are no thumbs.. Reinvent the wheel, then learn The Wheel; maybe one day you reinvent one of THE WHEELS.
Re^2: Frequency occurence of same words in a file -- oneliner by geek12 (Novice) on Mar 25, 2022 at 17:15 UTC
Hi, I will try this out. The first solution worked for me. I will go through the code you suggested as well, try it out and understand it. Thank you for the help.
Re: Frequency occurence of same words in a file by LanX (Saint) on Mar 24, 2022 at 22:12 UTC
Hi

Welcome to the monastery! :)

Could you please correct the <code> tags around your input and source? There is an Edit button just for that.

Cheers Rolf

(addicted to the Perl Programming Language :)
Wikisyntax for the Monastery
Re^2: Frequency occurence of same words in a file by geek12 (Novice) on Mar 24, 2022 at 22:15 UTC
Done. Thank you :)

