I hope you don't mind deviating from the "one-liner" requirement. You mentioned you're doing this to learn something along the way, so I wanted to make another suggestion, to that end.

Your current implementation has to sort the entire city list for each country just to retrieve the top four items. When you need the top n of anything in sorted order, it's rather unfortunate that the simplest approach is usually to sort the entire list. What you could get away with instead is a "partial sort": one that partitions the input into the part you want and the part you don't want, and then sorts and returns just the part you want.

It turns out there's a module on CPAN that does this, called Sort::Key::Top. Its interface is a little complicated to learn at first, but once you do, it works fairly well. Here is an example:

use strict;
use warnings;
use Sort::Key::Top 'rnkeytopsort';

my %countries;

# Group the input lines by their two-letter country code.
while( <DATA> ) {
    my( $country ) = m/:([^:]{2}):/;
    push @{$countries{$country}}, $_;
}

# For each country, keep only the four most populous cities, in
# descending numeric order by the leading population field.
print map {
    rnkeytopsort { /^(\d+):/; $1; } 4 => @{$countries{$_}}
} keys %countries;

__DATA__
20470:ZM:Samfya:Africa
20149:ZM:Sesheke:Africa
18638:ZM:Siavonga:Africa
26459:ZW:Beitbridge:Africa
37423:ZW:Bindura:Africa
699385:ZW:Bulawayo:Africa
47294:ZW:Chegutu:Africa
61739:ZW:Chinhoyi:Africa
18860:ZW:Chipinge:Africa
28205:ZW:Chiredzi:Africa

The way this works is it takes your original data set, and divides it into smaller sets, each set representing a country. Then it does a "top-n" partial sort within each country, and prints out the result.
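To make the "top-n" step a bit more concrete, here is a rough pure-Perl sketch of how such a selection could be done by hand. This is not how Sort::Key::Top is implemented (I haven't looked, and I'd expect the module to be faster); top_n_by is a hypothetical helper name I made up for illustration. It keeps a small buffer of the best n items seen so far, so the full list never gets sorted.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Sort::Key::Top): return the $n items
# with the largest numeric keys, in descending key order.
# $key_of is a code ref that extracts the numeric key from one item.
sub top_n_by {
    my ( $n, $key_of, @items ) = @_;
    return if $n < 1;

    my @best;    # [ key, item ] pairs, descending by key, never longer than $n
    for my $item (@items) {
        my $key = $key_of->($item);

        # Once the buffer is full, most items are rejected with one comparison.
        next if @best == $n && $key <= $best[-1][0];

        # Find the slot that keeps @best in descending order (ties stay in input order).
        my $i = 0;
        $i++ while $i < @best && $best[$i][0] >= $key;

        splice @best, $i, 0, [ $key, $item ];
        pop @best if @best > $n;    # drop the item that fell off the end
    }
    return map { $_->[1] } @best;
}

my @zw = (
    "26459:ZW:Beitbridge:Africa", "37423:ZW:Bindura:Africa",
    "699385:ZW:Bulawayo:Africa",  "47294:ZW:Chegutu:Africa",
    "61739:ZW:Chinhoyi:Africa",   "18860:ZW:Chipinge:Africa",
    "28205:ZW:Chiredzi:Africa",
);

# Key extractor: the leading population field.
print "$_\n" for top_n_by( 4, sub { ( $_[0] =~ /^(\d+):/ )[0] }, @zw );
# 699385:ZW:Bulawayo:Africa
# 61739:ZW:Chinhoyi:Africa
# 47294:ZW:Chegutu:Africa
# 37423:ZW:Bindura:Africa
```

For a small n like four, the buffer stays tiny, so the insertion scan is cheap. For a large n a heap would probably be the better choice.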

I first went looking for a module like this one a while ago, after using C++'s std::partition and std::partial_sort algorithms in a C++ project I was working on at the time. The concepts are pretty simple, but sometimes it takes seeing them in use somewhere else (in this case in a different language) to "discover" their usefulness.

Update:

After preaching about the wasted cycles caused by sorting the entire list of cities just to pick the top four, I went ahead and implemented a version that does just that. Why? It was one of those times where after walking away from the keyboard an idea came along that seemed like it would be fun to explore. Here it is:

print do {
    my($c,$n) = ('',0);
    map  { $_->[0] }
    grep { ($c,$n) = ($_->[2],0) if $_->[2] ne $c; $n++ < 4 }
    sort { $a->[2] cmp $b->[2] || $b->[1] <=> $a->[1] }
    map  { [ $_, /^(\d+):([^:]{2}):/ ] }
    <DATA>;
};

__DATA__
20470:ZM:Samfya:Africa
20149:ZM:Sesheke:Africa
18638:ZM:Siavonga:Africa
26459:ZW:Beitbridge:Africa
37423:ZW:Bindura:Africa
699385:ZW:Bulawayo:Africa
47294:ZW:Chegutu:Africa
61739:ZW:Chinhoyi:Africa
18860:ZW:Chipinge:Africa
28205:ZW:Chiredzi:Africa

Read this one from the bottom up:

  1. Create an anonymous array for each line in the input file. The first element is the line itself, followed by the population, and finally the country code. This is shaping up to look like a Schwartzian Transform.
  2. Sort on two criteria: first the country code, and second the population. The result is that all cities within a given country are grouped together, in descending order by population, and all countries are in ascending order by country code. This is still a typical Schwartzian Transform, with a compound sort key.
  3. Grep the sorted list, keeping only the first four cities for each country. By keeping track of the last country seen, and running a counter that we increment on each iteration but reset whenever a new country code is spotted, we can identify when we've reached the maximum wanted per country. This is a deviation from the basic Schwartzian Transform.
  4. Drop the computed keys, and keep only the original lines that survived the 'grep' filter, still in sorted order.
  5. The do{...} block just creates a nice compact lexical scope with a return value (the list resulting from the outer map), which we feed into print. I like this because it means the lexical variables I declare are very narrowly scoped.
  6. Print the result.
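If the single expression is hard to follow, the steps above can be spelled out one statement at a time. This is only a sketch of the same pipeline: top_per_country and its $limit parameter are names I invented for illustration, and it assumes every input line matches the population:country: layout of the sample data.

```perl
use strict;
use warnings;

# Hypothetical helper: the same pipeline as the one-expression version,
# written as one statement per step. Returns at most $limit lines per country.
sub top_per_country {
    my ( $limit, @lines ) = @_;

    # Step 1: decorate each line as [ line, population, country code ].
    my @decorated = map { [ $_, /^(\d+):([^:]{2}):/ ] } @lines;

    # Step 2: country code ascending, then population descending.
    my @sorted = sort { $a->[2] cmp $b->[2] || $b->[1] <=> $a->[1] } @decorated;

    # Step 3: keep the first $limit entries of each country, resetting
    # the counter whenever the country code changes.
    my ( $current, $seen ) = ( '', 0 );
    my @kept = grep {
        ( $current, $seen ) = ( $_->[2], 0 ) if $_->[2] ne $current;
        $seen++ < $limit;
    } @sorted;

    # Step 4: undecorate, returning only the original lines.
    return map { $_->[0] } @kept;
}

# Steps 5 and 6: the sub body replaces the do{...} scope; then print.
print top_per_country( 2,
    "20470:ZM:Samfya:Africa\n",    "20149:ZM:Sesheke:Africa\n",
    "18638:ZM:Siavonga:Africa\n",  "26459:ZW:Beitbridge:Africa\n",
    "699385:ZW:Bulawayo:Africa\n", "47294:ZW:Chegutu:Africa\n",
);
# 20470:ZM:Samfya:Africa
# 20149:ZM:Sesheke:Africa
# 699385:ZW:Bulawayo:Africa
# 47294:ZW:Chegutu:Africa
```

Wrapping it in a sub gives the same narrow scoping as the do{...} block, with the added benefit that the limit is no longer hard-coded.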

It seemed like a cool approach to me, even if it gives back a little efficiency by sorting the entire list. I would probably favor the partition/partial sort strategy posted at the top of my answer though; it's fairly clear what it does, and should be efficient.


Dave


In reply to Re: Using map function to print few elements of list returned by sort function by davido
in thread Using map function to print few elements of list returned by sort function by jaypal
