jaypal has asked for the wisdom of the Perl Monks concerning the following question:
Hello Perl Monks, I have the following data set and would like to print the four most populous cities for each country. The data is not sorted, so I have to sort it first and then take the top four cities per country. The first column is the population, the second is the country, the third is the city, and the fourth is the continent.
20470:ZM:Samfya:Africa
20149:ZM:Sesheke:Africa
18638:ZM:Siavonga:Africa
26459:ZW:Beitbridge:Africa
37423:ZW:Bindura:Africa
699385:ZW:Bulawayo:Africa
47294:ZW:Chegutu:Africa
61739:ZW:Chinhoyi:Africa
18860:ZW:Chipinge:Africa
28205:ZW:Chiredzi:Africa
So my output from the above data set would be:
20470:ZM:Samfya:Africa
20149:ZM:Sesheke:Africa
18638:ZM:Siavonga:Africa
699385:ZW:Bulawayo:Africa
61739:ZW:Chinhoyi:Africa
47294:ZW:Chegutu:Africa
37423:ZW:Bindura:Africa
I was able to write a perl script to get me my desired output. Here is my attempt at the perl script.
#!/usr/local/bin/perl
use strict;
use warnings;
use Data::Dumper;

my (%HoA, %lines);

while (my $line = <DATA>) {
    my ($value, $key) = split /:/, $line, 3;
    push @{$HoA{$key}}, $value;
    $lines{"$key $value"} = $line    # This could be done better
}

for my $country (keys %HoA) {
    my @list = sort { $b <=> $a } @{$HoA{$country}};    # This could be done better
    for my $ind (0 .. 3) {    # This could be done better
        my $popu = $list[$ind] or next;
        print $lines{"$country $popu"};
    }
}

__DATA__
20470:ZM:Samfya:Africa
20149:ZM:Sesheke:Africa
18638:ZM:Siavonga:Africa
26459:ZW:Beitbridge:Africa
37423:ZW:Bindura:Africa
699385:ZW:Bulawayo:Africa
47294:ZW:Chegutu:Africa
61739:ZW:Chinhoyi:Africa
18860:ZW:Chipinge:Africa
28205:ZW:Chiredzi:Africa
My question is based on my attempt to write a one-liner equivalent of the above script.
perl -F":" -lane '
    BEGIN { $"=":" }
    push @{$h{$F[1]}}, $F[0];
    $line{$F[1],$F[0]} = "@F";
    }{
    for $k (keys %h) {
        print @$_ for map [ $line{$k,$_} ], sort { $b <=> $a } @{$h{$k}}
    }' file

20470:ZM:Samfya:Africa
20149:ZM:Sesheke:Africa
18638:ZM:Siavonga:Africa
699385:ZW:Bulawayo:Africa
61739:ZW:Chinhoyi:Africa
47294:ZW:Chegutu:Africa
37423:ZW:Bindura:Africa
28205:ZW:Chiredzi:Africa
26459:ZW:Beitbridge:Africa
18860:ZW:Chipinge:Africa
I am stuck on printing just the top 4 entries using the map function. The above prints every record, sorted by population within each country, instead of stopping after four.
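The limiting step can be separated from the printing step. The following is only a minimal sketch of that idea, not a drop-in replacement for the one-liner: `top_n` is a hypothetical helper name, and it assumes plain numeric values. It sorts in descending order and then takes an array slice whose upper bound is clamped, so a country with fewer than four cities does not produce undefined entries.

```perl
use strict;
use warnings;

# Hypothetical helper: return at most $n of the largest values in a list.
sub top_n {
    my ($n, @values) = @_;
    my @sorted = sort { $b <=> $a } @values;
    # Clamp the last index so short lists do not yield undef elements.
    my $last = $#sorted < $n - 1 ? $#sorted : $n - 1;
    return @sorted[0 .. $last];
}

# ZW populations from the sample data (seven cities, four kept).
print join(",", top_n(4, 26459, 37423, 699385, 47294, 61739, 18860, 28205)), "\n";
# prints 699385,61739,47294,37423

# ZM populations from the sample data (only three cities, all kept).
print join(",", top_n(4, 20470, 20149, 18638)), "\n";
# prints 20470,20149,18638
```

A slice like this leaves the sorted array untouched, which matters if the full ordering is needed again later.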
I was hoping to get some advice from the monks here on both my Perl script (anything I could have done better) and on solving the problem in my one-liner attempt. I have added comments where I felt I could have written it more idiomatically.
The above data was taken from a question posted on StackOverflow. I know one-liners are not the best way to approach a problem, but I am still learning Perl and feel that writing one-liners helps me understand Perl functions better. Also, since I have written a lot of awk one-liners, I feel Perl should be able to do this as well.
Looking forward to your comments and suggestions.
Regards
Jaypal
Update: I was able to get the desired output using a splice.
perl -F":" -lane '
    BEGIN { $"=":" }
    push @{$h{$F[1]}}, $F[0];
    $line{$F[1],$F[0]} = "@F";
    }{
    for $k (keys %h) {
        print $line{$k,$_} for splice [sort { $b <=> $a } @{$h{$k}}], 0, 4
    }' file

20470:ZM:Samfya:Africa
20149:ZM:Sesheke:Africa
18638:ZM:Siavonga:Africa
699385:ZW:Bulawayo:Africa
61739:ZW:Chinhoyi:Africa
47294:ZW:Chegutu:Africa
37423:ZW:Bindura:Africa
However, I would appreciate it if anyone can suggest a better approach.
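One possible direction, offered as a sketch under assumptions rather than a definitive answer: store each full record next to its population so the second lookup hash is not needed, and truncate a real array with `splice` instead of an anonymous array reference (calling `splice` on a reference was only an experimental feature in some Perl versions). The helper name `top_cities` is hypothetical, and the sketch assumes well-formed `population:country:city:continent` records. Keeping the whole record also avoids the case where two cities in one country share a population and overwrite each other in a hash keyed on country and population.

```perl
use strict;
use warnings;

# Hypothetical helper: return the $n most populous records per country,
# with countries in alphabetical order for reproducible output.
sub top_cities {
    my ($n, @records) = @_;
    my %by_country;
    for my $rec (@records) {
        chomp $rec;
        my ($pop, $country) = split /:/, $rec;
        # Store [population, full record] pairs grouped by country code.
        push @{ $by_country{$country} }, [ $pop, $rec ];
    }
    my @out;
    for my $country (sort keys %by_country) {
        my @sorted = sort { $b->[0] <=> $a->[0] } @{ $by_country{$country} };
        splice @sorted, $n if @sorted > $n;    # drop everything past the first $n
        push @out, map { $_->[1] } @sorted;
    }
    return @out;
}

# Sample records copied from the data set in the question.
my @data = qw(
    20470:ZM:Samfya:Africa   20149:ZM:Sesheke:Africa    18638:ZM:Siavonga:Africa
    26459:ZW:Beitbridge:Africa 37423:ZW:Bindura:Africa  699385:ZW:Bulawayo:Africa
    47294:ZW:Chegutu:Africa  61739:ZW:Chinhoyi:Africa   18860:ZW:Chipinge:Africa
    28205:ZW:Chiredzi:Africa
);

print "$_\n" for top_cities(4, @data);
```

Sorting the country keys is a design choice made here for stable output; iterating over `keys` directly, as in the original script, returns countries in no guaranteed order.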