tdevil76 has asked for the wisdom of the Perl Monks concerning the following question:

My first post, so please excuse any sins I may commit ....

I have an array whose values consist of comma separated lists of 3 items(country,city,state).
I want to split each of these values, and put the results into 3 separate arrays (@countries,@cities,@states), and then make the values in those arrays unique.

Here is my (obvious) solution:
for my $location (@locations){ next if ($location eq 'HEADING'); my ($country,$state,$city) = split /,/, $location; push @countries, $country; push @states, $state; push @cities, $city; } @countries = uniq @countries; @states = uniq @states; @cities = uniq @cities;


I'm looking for a cleaner, more readable approach to the task. Thanks!

Replies are listed 'Best First'.
Re: cleaner code ....
by ikegami (Patriarch) on Dec 06, 2006 at 23:25 UTC

    It looks clean to me.

    uniq could be replaced with a hash if you don't care about order, but one could argue that would make it less clean.

    for my $location (@locations) { next if $location eq 'HEADING'; my ($country, $state, $city) = split /,/, $location; ++$countries{$country}; ++$states{$state}; ++$cities{$city}; } @countries = keys %countries; @states = keys %states; @cities = keys %cities;
Re: cleaner code ....
by Joost (Canon) on Dec 06, 2006 at 23:31 UTC
    ikegami's right: that's about as clean as you'll get.

    If you want to preserve order, I'd use a hash lookup to determine wether to add the elements instead of using unique; that should be a tiny bit faster but not as good looking. If you don't care about the order, ikegami's solution is better.

    my (%country_seen,%state_seen,%city_seen); for my $location (@locations){ next if ($location eq 'HEADING'); my ($country,$state,$city) = split /,/, $location; push @countries, $country unless $country_seen{$country}++; push @states, $state unless $state_seen{$state}++; push @cities, $city unless $city_seen{$city}++; }
Re: cleaner code ....
by jdporter (Paladin) on Dec 06, 2006 at 23:51 UTC

    Maybe a little "cleaner", but probably seems obfuscated for the noobier classes of perl slinger:

    my @hashes = \( my %countries, my %states, my %cities, ); for ( grep !/HEADING/, @locations ) { my @v = split /,/; $hashes[$_]{$v[$_]}++ for 0 .. $#v; } my @countries = sort keys %countries; my @states = sort keys %states; my @cities = sort keys %cities;
    We're building the house of the future together.
Re: cleaner code ....
by Fletch (Bishop) on Dec 07, 2006 at 00:00 UTC

    Not directly germane to cleanliness, but you also should be aware that you might want to look at a module such as Text::CSV_XS rather than just split at some point; you may be parsing "just comma separated" now, but down the road some manager's going to want to mungle this in his "database" (aka Excel) and you'll be glad you know where to look.

Re: cleaner code ....
by throop (Chaplain) on Dec 07, 2006 at 04:35 UTC
    Yes, what you have is already pretty clean. So here's some thoughts and suggestions, anyways.

    I suggest you take this code and shove it down into a subroutine. See Ovid's post Linear programming is bad for why — mainly that it allows you (and maintainers who follow you) to see the toplevel flow without sweating the details. You'll have a decision to make. You can either implement a general &split_record_into_arrays function or a more specific &split_into_city_county_state. The former is cleaner and more general. The latter gives more opportunity for validating your data. E.g., making sure that every state really is a legal state.

    Perhaps this is just a class assignment and the point is to learn how to manipulate data structures. But if this is a real app, why to you want to apply 'uniq' to a list of cities? Why would you want to have just one city-entry for Springfield, Maine and Springfield, Ohio? And then why wouldn't you want to combine Akron and AKRON?. Right now, if you have a record with only two fields, you put an undef in the last list; is that OK? And you don't detect records with four fields.

    throop

Re: cleaner code ....
by jwkrahn (Abbot) on Dec 07, 2006 at 11:01 UTC
    The simple answer is to use a hash:
    my %unique; for my $location ( @locations ) { next if $location eq 'HEADING'; unless ( $unique{ $location }++ ) { my ( $country, $state, $city ) = split /,/, $location; push @countries, $country; push @states, $state; push @cities, $city; } }
Re: cleaner code ....
by GammaRay Rob (Friar) on Dec 07, 2006 at 15:53 UTC
    OK, slightly off-topic, but:
    Where do you find 'uniq' in perl?
    I have found an implementation in CPAN, but thought it would be a nice-to-have within perl itself!

    - GRR

      uniq is in List::MoreUtils. I think, like the functions in List::MoreUtils and List::Util, it's just (to quote the docs) "List::Util contains a selection of subroutines that people have expressed would be nice to have in the perl core, but the usage would not really be high enough to warrant the use of a keyword, and the size so small such that being individual extensions would be wasteful."

      emc

      At that time [1909] the chief engineer was almost always the chief test pilot as well. That had the fortunate result of eliminating poor engineering early in aviation.

      —Igor Sikorsky, reported in AOPA Pilot magazine February 2003.
Re: cleaner code ....
by tdevil76 (Beadle) on Dec 07, 2006 at 22:40 UTC
    Thanks all for your input/suggestions...