Reason for Creation
In a current project we needed a way to access data by state, grouped by
the regions and sub-regions (referred to as divisions) defined by the
Census Bureau. A quick search on CPAN turned up no region-specific module,
though I did find several state-based ones, and this module leverages one
of them.
Initial Solution Requirements
- Region breakdown of states as outlined by the US Census
- Divisions within the Regions as outlined by the US Census
- Ability to return each of the above in a list (order not important)
- Ability to find all states within each of the above
- Have the state list returned in alphabetical order
- Have the state list returned in either full-name or abbreviated form
("South Dakota" or "SD")
Module Code:
package Region;

=pod

The purpose of this module is to provide access to US regions, divisions,
and states, grouped either by the above or individually. See the inline
comments for more information.

=cut
use strict;
use Geography::States;

my $debug = 0;
my %regions;
my $region;
my $division;
my $gs = Geography::States->new('USA');

# Parse the __DATA__ section once, when the module is loaded:
#   line starting and ending with an uppercase letter -> region
#   line starting uppercase and ending lowercase      -> division
#   indented line                                      -> state
while (<DATA>) {
    next if $_ !~ m/[a-zA-Z]/ || $_ =~ /^#/;
    if (m/^[A-Z]/ && m/[A-Z]$/) {
        s/\s+$//;
        print $_, "\n" if $debug;
        $region = ucfirst(lc($_));
    }
    elsif (m/^[A-Z]/ && m/[a-z]$/) {
        s/\s+$//;
        print "\t$_\n" if $debug;
        $division = $_;
    }
    elsif (m/^\s+\w/) {
        s/^\s+|\s+$//g;
        my $code = $gs->state($_);    # full name -> two-letter code
        push @{ $regions{$region}{$division} }, { full => $_, code => $code };
    }
}
sub new {
    my $class = shift;
    # Every object shares the same %regions built when the module loaded.
    my $self = \%regions;
    return bless $self, $class;
}
=pod

The regions method returns a list of regions in alphabetical order.

=cut

sub regions {
    my $self = shift;
    return sort keys %{$self};
}
=pod

The divisions method returns a list of divisions. It accepts an optional
list of region names (case-insensitive); when a list is passed, only the
divisions within those regions are returned. With no arguments, the
divisions of every region are returned.

=cut

sub divisions {
    my ($self, @reg) = @_;
    my @list;
    @reg = sort keys %{$self} unless @reg;
    @reg = map { ucfirst(lc($_)) } @reg;
    foreach my $region (@reg) {
        # Skip unknown names instead of autovivifying them into the shared data.
        next unless exists $self->{$region};
        foreach my $division (sort keys %{ $self->{$region} }) {
            push @list, $division;
        }
    }
    return @list;
}
=pod

The state method returns a list of states whose contents are determined
by the arguments (a hash) passed to the method.

If no options are passed, it returns all state codes in alphabetical order.

Full state names are returned if the key 'name' has the value 'full'.

States for a region are returned if the option 'region' is set to one of
the available regions.

States for a division are returned if the option 'division' is set to one
of the available divisions.

The name type (full or code) can be combined with either a region or a
division. Passing both a region and a division only returns results if the
division lies within the region.

=cut

sub state {
    my ($self, %args) = @_;
    my $verbiage = ($args{name} || '') eq 'full' ? 'full' : 'code';
    # undef means "no filter"; comparisons are case-insensitive.
    my $region_   = defined $args{region}   ? lc $args{region}   : undef;
    my $division_ = defined $args{division} ? lc $args{division} : undef;
    my @list;
    foreach my $region (keys %{$self}) {
        next if defined $region_ && lc($region) ne $region_;
        foreach my $division (keys %{ $self->{$region} }) {
            next if defined $division_ && lc($division) ne $division_;
            foreach my $state (@{ $self->{$region}{$division} }) {
                push @list, $state->{$verbiage};
            }
        }
    }
    return sort @list;
}
1;    # a module loaded with use/require must return a true value

__DATA__
NORTHEAST
Middle Atlantic
    New Jersey
    New York
    Pennsylvania
New England
    Connecticut
    Maine
    Massachusetts
    New Hampshire
    Rhode Island
    Vermont
MIDWEST
East North Central
    Illinois
    Indiana
    Michigan
    Ohio
    Wisconsin
West North Central
    Iowa
    Kansas
    Minnesota
    Missouri
    Nebraska
    North Dakota
    South Dakota
SOUTH
East South Central
    Alabama
    Kentucky
    Mississippi
    Tennessee
South Atlantic
    Delaware
    District of Columbia
    Florida
    Georgia
    Maryland
    North Carolina
    South Carolina
    Virginia
    West Virginia
West South Central
    Arkansas
    Louisiana
    Oklahoma
    Texas
WEST
Mountain
    Arizona
    Colorado
    Idaho
    Montana
    Nevada
    New Mexico
    Utah
    Wyoming
Pacific
    Alaska
    California
    Hawaii
    Oregon
    Washington
#POSSESSIONS
#
# Puerto Rico
# Virgin Islands
# Pacific Islands
#
# Pacific Islands Includes: Canton, Guam, Mariana, Marshall, Samoa, Wake
Informal Test Code
#!/usr/bin/perl
use strict;
use Region;

my $regions = Region->new();
print join("\n", $regions->regions);
print "\n\n";
print join("\n", $regions->divisions);
print "\n\n";
print join("\n", $regions->divisions('west'));
print "\n\n";
print join("\n", $regions->state(name => 'full', region => 'west'));
print "\n\n";
print join("\n", $regions->state(name => 'full', division => 'East North Central'));
print "\n\n";
print join("\n", $regions->state(name => 'code', region => 'South', division => 'South Atlantic'));
print "\n\n";
Possible Module Names
Geography::US::Census::Regions
Geography::US::Regions::Census
Locale::US::Census::Regions
???
Interest/Comments
Is there any interest in adding this module to CPAN, or is there an existing
module that I overlooked that already fills this space?
General comments on the design and method interfaces would be appreciated,
even if you don't need or want the module.
UPDATED: Moved the while loop outside of new() to avoid problems when a script creates more than one object.
Removed the 'use Data::Dumper' that was left over from initial testing.
Re: RFC US Region Module
by Juerd (Abbot) on Jul 06, 2003 at 08:22 UTC
If you're putting this on CPAN, please fix the POD. When the POD is converted to any other format, the code in between is lost, and you are left with one big run of paragraphs that currently has no headings to make the structure clear.
Please read "POD in 5 minutes" and find out that POD is more than multi-line comments.
Juerd
# { site => 'juerd.nl', plp_site => 'plp.juerd.nl', do_not_use => 'spamtrap' }
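A minimal sketch of POD with headings, assuming the common CPAN section layout (NAME, SYNOPSIS, METHODS); those section names are convention rather than anything the module requires. Only the regions method is shown, and its region list is hard-coded here for illustration instead of coming from __DATA__.

```perl
use strict;
use warnings;

package Region;

=head1 NAME

Region - US Census Bureau regions, divisions and states

=head1 SYNOPSIS

  use Region;
  my $regions = Region->new;
  print join("\n", $regions->regions), "\n";

=head1 METHODS

=head2 regions

Returns the list of region names in alphabetical order.

=cut

# Hard-coded for illustration; perl skips the POD above up to =cut.
sub regions {
    return sort qw(Northeast Midwest South West);
}

package main;

print join(', ', Region->regions), "\n";    # Midwest, Northeast, South, West
```

Run through perldoc or pod2text, each =head2 becomes a heading next to the method it documents, so the structure survives conversion.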
Re: RFC US Region Module
by Aristotle (Chancellor) on Jul 06, 2003 at 13:06 UTC
I don't see why you made this module object oriented; you're not making any use of $self. Also, you're reading DATA in your new() method. Why? The second and subsequent instantiations of an "object" in your class will have nothing left to initialize from. Pull that while loop out of the method and get rid of new (as well as $self in the other functions). Of course, the functions are somewhat unfortunately named for those who'd want to import them; maybe prefix them with us_ or some such.
As far as the name is concerned I'd definitely go with Geography::US::Census::Regions. Locale is the wrong namespace for this module. A module's "innermost" name portion should express what that module deals with; here, that is regions. It deals with those according to the census, hence Census::Regions. Regions::Census would mean it deals with the census according to the regions, which makes no sense.
Makeshifts last the longest.
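A minimal sketch of the function-based interface suggested above, assuming Exporter with @EXPORT_OK (so nothing is imported unless the caller asks) and the us_ prefix. The package name comes from the naming discussion; us_regions, us_divisions, us_states and the two-region data subset are hypothetical and for illustration only.

```perl
use strict;
use warnings;

package Geography::US::Census::Regions;
use Exporter 'import';
our @EXPORT_OK = qw(us_regions us_divisions us_states);

# Hypothetical subset: region => division => [ [full name, code], ... ]
my %REGIONS = (
    Northeast => {
        'New England' => [ [ 'Maine', 'ME' ], [ 'Vermont', 'VT' ] ],
    },
    West => {
        Mountain => [ [ 'Arizona', 'AZ' ], [ 'Utah',   'UT' ] ],
        Pacific  => [ [ 'Oregon',  'OR' ], [ 'Hawaii', 'HI' ] ],
    },
);

# Region names in alphabetical order.
sub us_regions { return sort keys %REGIONS }

# Divisions of the named regions (all regions if none are named).
sub us_divisions {
    my @regions = @_ ? (map { ucfirst(lc($_)) } @_) : (keys %REGIONS);
    my @divs    = map { keys %{ $REGIONS{$_} || {} } } @regions;
    return sort @divs;
}

# States for an optional region and/or division, as codes or full names.
sub us_states {
    my %args   = @_;
    my $field  = ($args{name} || 'code') eq 'full' ? 0 : 1;
    my $region = defined $args{region}   ? lc $args{region}   : undef;
    my $div    = defined $args{division} ? lc $args{division} : undef;
    my @list;
    for my $r (keys %REGIONS) {
        next if defined $region && lc($r) ne $region;
        for my $d (keys %{ $REGIONS{$r} }) {
            next if defined $div && lc($d) ne $div;
            push @list, map { $_->[$field] } @{ $REGIONS{$r}{$d} };
        }
    }
    return sort @list;
}

package main;

# Both packages share one file here, so import is called directly
# instead of via "use Geography::US::Census::Regions qw(...)".
Geography::US::Census::Regions->import(qw(us_regions us_divisions us_states));

print join(', ', us_regions()), "\n";                   # Northeast, West
print join(', ', us_states(region => 'west')), "\n";    # AZ, HI, OR, UT
```

Callers who prefer not to import anything can still use fully qualified names such as Geography::US::Census::Regions::us_states().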
If I leave it OO, don't I avoid the function-name collision you mention? While I agree there is no current or compelling reason for this module to be OO, it does seem to work well enough to provide the functionality I need. Would I see benefits from making it non-OO?
The DATA read has been moved outside of new(), since that was the wrong spot for it: subsequent object creations would have found DATA already exhausted and assigned an empty hash ref to $self.
Of the names, I felt the Geography namespace would be best, but I am still struggling with the Census::Regions / Regions::Census question, for these reasons:
- Geography::US::Regions would leave the namespace open for additional region-breakdown modules, each named for its source (in this case, Census)
- Geography::US::Census seems limiting to me, because there seem to be few geography-based concepts that would deal with the census.
Thanks for the feedback.
Re: RFC US Region Module
by PodMaster (Abbot) on Jul 06, 2003 at 13:27 UTC
In addition to what's already been said, I don't see you using any of Data::Dumper's functionality.
Also, I think you should just inline the data structures you're generating in sub "new".
I see no point in reading from __DATA__ more than once if the data isn't changing.
MJD says "you can't just make shit up and expect the computer to know what you mean, retardo!" | I run a Win32 PPM repository for perl 5.6.x and 5.8.x -- I take requests (README). | ** The third rule of perl club is a statement of fact: pod is sexy. |
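A minimal sketch of what inlining could look like, assuming the module keeps its OO interface. %REGIONS is a hypothetical name, only two divisions are shown, and the state codes are typed in by hand, which also removes the load-time dependency on Geography::States.

```perl
use strict;
use warnings;

package Region;

# Same shape the __DATA__ loop builds:
#   $REGIONS{Region}{Division} = [ { full => ..., code => ... }, ... ]
my %REGIONS = (
    Northeast => {
        'Middle Atlantic' => [
            { full => 'New Jersey',   code => 'NJ' },
            { full => 'New York',     code => 'NY' },
            { full => 'Pennsylvania', code => 'PA' },
        ],
    },
    West => {
        Pacific => [
            { full => 'Alaska',     code => 'AK' },
            { full => 'California', code => 'CA' },
            { full => 'Hawaii',     code => 'HI' },
            { full => 'Oregon',     code => 'OR' },
            { full => 'Washington', code => 'WA' },
        ],
    },
);

# As in the posted module, every object shares the one data structure.
sub new { my $class = shift; return bless \%REGIONS, $class }

sub regions { my $self = shift; return sort keys %$self }

package main;

my $r = Region->new;
print join(', ', $r->regions), "\n";            # Northeast, West
print scalar @{ $r->{West}{Pacific} }, "\n";    # 5
```

The cost is readability: the literal is noisier to scan and edit than the indented plain-text list.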
The Data::Dumper was left over from initial debugging; I have removed it.
Agreed on the __DATA__ issue. In fact, you can't reread from __DATA__ without certain precautions. I have moved the while loop outside of the new method to avoid this issue. I do, however, prefer to keep the data in plain text rather than in a data structure, for readability. Thanks for the feedback.
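The precaution usually meant here is that DATA starts out positioned just after the __DATA__ line, so seek(DATA, 0, 0) rewinds to the top of the source file rather than to the data. Saving tell() before the first read and seeking back to that offset avoids this. A minimal sketch, using an in-memory filehandle as a stand-in for DATA so it runs on its own; read_lines_rewind is a hypothetical helper name.

```perl
use strict;
use warnings;
use Fcntl qw(:seek);

# Reads the non-blank lines from $fh, then rewinds $fh to where it
# started, so a later call sees the same data again. In Region.pm the
# handle would be \*DATA; here an in-memory handle stands in for it.
sub read_lines_rewind {
    my ($fh) = @_;
    my $start = tell($fh);    # offset where the data begins
    my @lines;
    while (my $line = <$fh>) {
        chomp $line;
        push @lines, $line if $line =~ /\S/;
    }
    seek($fh, $start, SEEK_SET) or die "seek failed: $!";
    return @lines;
}

my $text = "WEST\nPacific\n    Oregon\n";
open my $fh, '<', \$text or die "open failed: $!";
my @first  = read_lines_rewind($fh);
my @second = read_lines_rewind($fh);
print scalar(@first), " ", scalar(@second), "\n";    # 3 3
```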
Re: RFC US Region Module
by Abigail-II (Bishop) on Jul 06, 2003 at 22:39 UTC
Interesting module but, IMO, a sucky interface. Objects
aren't really useful; it's not as if there are multiple
subdivisions of the USA. I strongly suggest a tied hash
(your implementation can still benefit from an OO
approach that way), which makes it easier for the user, especially
if (s)he wants to interpolate the results in a string.
Abigail
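A minimal sketch of the tied-hash interface, assuming read-only access, case-insensitive keys that may name either a region or a division, and values that are array refs of state codes. Region::Tied, %DIVISIONS, %REGION_OF and the West-only data are hypothetical.

```perl
use strict;
use warnings;

package Region::Tied;
use Carp qw(croak);

# Hypothetical subset: division => state codes, and each division's region.
my %DIVISIONS = (
    Mountain => [qw(AZ CO ID MT NM NV UT WY)],
    Pacific  => [qw(AK CA HI OR WA)],
);
my %REGION_OF = (Mountain => 'West', Pacific => 'West');

sub TIEHASH {
    my ($class) = @_;
    my (%data, %name);
    for my $div (keys %DIVISIONS) {
        my $region = $REGION_OF{$div};
        # Lookups use lowercased keys; %name keeps the display form.
        $name{ lc $div }    = $div;
        $name{ lc $region } = $region;
        push @{ $data{ lc $div } },    @{ $DIVISIONS{$div} };
        push @{ $data{ lc $region } }, @{ $DIVISIONS{$div} };
    }
    @$_ = sort @$_ for values %data;    # alphabetical state codes
    return bless { data => \%data, name => \%name, iter => [] }, $class;
}

sub FETCH {
    my ($self, $key) = @_;
    my $list = $self->{data}{ lc $key } or return;
    return [@$list];    # a copy, so callers cannot change the table
}

sub EXISTS { my ($self, $key) = @_; return exists $self->{data}{ lc $key } }

sub FIRSTKEY {
    my ($self) = @_;
    $self->{iter} = [ sort values %{ $self->{name} } ];
    return shift @{ $self->{iter} };
}

sub NEXTKEY { my ($self) = @_; return shift @{ $self->{iter} } }

sub STORE  { croak 'Region::Tied is read-only' }
sub DELETE { croak 'Region::Tied is read-only' }
sub CLEAR  { croak 'Region::Tied is read-only' }

package main;

tie my %Census, 'Region::Tied';
print "Pacific: @{ $Census{pacific} }\n";    # Pacific: AK CA HI OR WA
print join(', ', keys %Census), "\n";        # Mountain, Pacific, West
```

The payoff is interpolation: "@{ $Census{pacific} }" expands inside a string without a method call.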
Re: RFC US Region Module
by chunlou (Curate) on Jul 06, 2003 at 19:45 UTC
A potential submodule could be Foo::Bar::GIS for exchanging data or even interfacing with other geo info systems, such as ArcInfo or GRASS. It would be nice to get spatial statistics from other specialized software rather than, say, implementing your own variogram in Perl. (Well, one (useless) use of this could be to see whether the XP of monks has to do with where they live, since we already have XP and location stats for many monks; but of course you could just as well spot the pattern more qualitatively by staring at a map... or you could use it in epidemiology.)
It could be a bummer if you could group data by nice geographical names but were unable to easily compute certain statistics (which could happen with a general-purpose language); or if you could compute statistics grouped by zip code but were unable to easily convert all the zip codes to more human-friendly regional names (which could happen with statistical software).
Re: RFC US Region Module
by IlyaM (Parson) on Jul 07, 2003 at 18:05 UTC