Beefy Boxes and Bandwidth Generously Provided by pair Networks
Pathologically Eclectic Rubbish Lister
 
PerlMonks  

Multiple file handles

by MoodyDreams999 (Beadle)
on Feb 02, 2023 at 16:00 UTC ( [id://11150099]=perlquestion: print w/replies, xml ) Need Help??

MoodyDreams999 has asked for the wisdom of the Perl Monks concerning the following question:

Hey, Monks, I've never done this before, besides referencing from a database in sql to perl. I have a file that will have missing data that I am parsing, I want to populate the State based off the zip codes, so I created spreadsheet with like 80000 zip codes and they state abbr. Could you please give me some guidance on using one file to reference zip codes and based on those to update the state on another external file.

Replies are listed 'Best First'.
Re: Multiple file handles
by hippo (Bishop) on Feb 02, 2023 at 16:56 UTC
    80000 zip codes and they state abbr

    That's not a vast amount of data in the grand scheme of things. I would read that into a hash keyed on whichever part you know and with the values as the other part. Very fast to lookup and should easily fit in RAM.

    so I created spreadsheet

    That is an interesting choice. If there were enough data to warrant an external application then I would be looking to use a database.


    🦛

      If there were enough data to warrant an external application then I would be looking to use a database.

      I agree. It's easy enough. It's Friday, i got my lunchbreak, so why not give the OP an example (using PostgreSQL, since that's what i have installed). Since OP didn't give us example files, let's just create some data out of thin air (zipcodes and census data) for our lovely tropical island. The censusdata is missing the zipcodes (because our friend Penultimo doesn't understand them) but it has valid city names.

      #!/usr/bin/env perl use strict; use warnings; use Crypt::Digest::SHA256 qw[sha256_hex]; use Carp; # Make a fake zipcode list print "Writing ZIP codes\n"; open(my $zipfh, '>', 'zipcodes.txt') or croak($!); for(my $i = 10_000; $i <= 99_999; $i++) { # Fake a city name ;-) my $cityname = sha256_hex($i); print $zipfh "$i;$cityname\n"; } close $zipfh; # Make a fake data file print "Writing data\n"; open(my $datafh, '>', 'census.txt') or croak($!); for(my $i = 0; $i < 1_000_000; $i++) { my $zip = int(rand(89_999)) + 10_000; my $cityname = sha256_hex($zip); my $population = int(rand(1_000_000)); print $datafh "$cityname;$population\n"; } close $datafh; print "Done\n";

      Next step is to log in to our database, create some tables and insert the data from the files. The last step is to add the missing zipcodes into the census data table using the city names.

      #!/usr/bin/env perl use strict; use warnings; use Carp; use DBI; my $dbh = DBI->connect("dbi:Pg:dbname=TropicoDB;host=/var/run/postgres +ql", 'ElPresidente', 'WorldDomination', {AutoCommit => 0}) or croak($!); my @stmts = ( "CREATE TABLE IF NOT EXISTS zipcodes ( zipcode text NOT NULL, cityname text NOT NULL, CONSTRAINT zipcodes_pk PRIMARY KEY(zipcode) )", "CREATE TABLE IF NOT EXISTS censusdata ( census_id bigserial, zipcode text, cityname text NOT NULL, peoplecount bigint NOT NULL, CONSTRAINT census_pk PRIMARY KEY(census_id) )", # Empty tables if they already have data "TRUNCATE zipcodes", "TRUNCATE censusdata", ); foreach my $stmt (@stmts) { print "Running $stmt...\n"; $dbh->do($stmt) or croak($dbh->errstr); } print "Loading zipcodes into database...\n"; my $zipsth = $dbh->prepare("INSERT INTO zipcodes (zipcode, cityname) V +ALUES (?, ?)") or croak($dbh->errstr); open(my $zipfh, '<', 'zipcodes.txt') or croak($!); while((my $line = <$zipfh>)) { chomp $line; my ($zipcode, $cityname) = split/\;/, $line; if(!$zipsth->execute($zipcode, $cityname)) { croak($dbh->errstr); } } close $zipfh; print "Loading census data into database...\n"; my $censussth = $dbh->prepare("INSERT INTO censusdata (cityname, peopl +ecount) VALUES (?, ?)") or croak($dbh->errstr); open(my $censusfh, '<', 'census.txt') or croak($!); while((my $line = <$censusfh>)) { chomp $line; my ($cityname, $population) = split/\;/, $line; if(!$censussth->execute($cityname, $population)) { croak($dbh->errstr); } } close $censusfh; print "Adding missing zipcodes to census data...\n"; $dbh->do("UPDATE censusdata c SET zipcode = z.zipcode FROM zipcodes z +WHERE c.cityname = z.cityname") or croak($dbh->errstr); $dbh->commit; print "Done\n";

      Now that we have the data in the database, it's very simple to run some reports. El Presidente needs to know which areas of his island are overpopulated and where to move those people. He also wants to know if any areas are still missing census reports.

      #!/usr/bin/env perl use strict; use warnings; use Carp; use DBI; my $dbh = DBI->connect("dbi:Pg:dbname=TropicoDB;host=/var/run/postgres +ql", 'ElPresidente', 'WorldDomination', {AutoCommit => 0}) or croak($!); generateReport($dbh, "Most populated zip codes", "SELECT zipcode, sum( +peoplecount) FROM censusdata GROUP BY zipcode ORDER BY sum(peoplec +ount) desc LIMIT 10"); generateReport($dbh, "Least populated zip codes", "SELECT zipcode, sum +(peoplecount) FROM censusdata GROUP BY zipcode ORDER BY sum(peoplec +ount) LIMIT 10"); generateReport($dbh, "Zip codes without census", "SELECT z.zipcode, z. +cityname FROM zipcodes z WHERE NOT EXISTS ( SELECT 1 FROM cens +usdata c WHERE c.zipcode = +z.zipcode )"); $dbh->commit; sub generateReport { my ($dbh, $title, $stmt) = @_; print $title, "\n"; print "=" x length($title), "\n"; my $sth = $dbh->prepare($stmt) or croak($dbh->errstr); $sth->execute or croak($dbh->errstr); while((my @line = $sth->fetchrow_array)) { print join(';', @line), "\n"; } $sth->finish; print "\n\n"; }

      Results:

      Most populated zip codes ======================== 15864;16121159 50274;15581665 90418;15213432 47146;14784133 79867;14738692 66935;14603419 25969;14572689 29868;14531580 63780;14452431 95050;14350914 Least populated zip codes ========================= 26940;110605 10822;158808 11776;170321 43384;187501 61853;211522 34028;227782 21339;228705 61363;319850 22065;322939 42693;329595 Zip codes without census ======================== 99999;fd5f56b40a79a385708428e7b32ab996a681080a166a2206e750eb4819186145

      PerlMonks XP is useless? Not anymore: XPD - Do more with your PerlMonks XP
        Wow, thats a great way to do it, definitely gives me some ideas and you connecting to the DB helps for reference because ive been struggling with that due to my company's current script using a bunch of regex off of a config file to connect. I'll have to get better at SSCCE so I don't end up bogging you guys down or giving you too little info. Thank you so much!
      Yeah honestly I think thats the best way to go, i'll try to connect to mysql locally to reference a table, thank you.
Re: Multiple file handles
by AnomalousMonk (Archbishop) on Feb 02, 2023 at 19:56 UTC
    ... 80000 zip codes and they state abbr.

    You'll have to double-check this, but I believe that the first three digits of a ZIP code uniquely define a state. The first three digits designate a sectional center facility (SCF), and IIUC such a facility does not service multiple states. E.g., the geographical area of 90210 (Beverly Hills, CA) lies entirely in California and the 902 SCF can only map to CA. (Note, however, that there are both civil and military SCFs.)

    Thus, the mapping for all states reduces from 80000 to 1000 SCFs max, an even more manageable hash size. :)


    Give a man a fish:  <%-{-{-{-<

      According to the on-line ZIP code file I found, there are four "first three digits" ZIP codes that map to more than one state.

      { 63 => { CT => 39, NY => 1 }, 205 => { DC => 49, VA => 1 }, 739 => { OK => 14, TX => 1 }, 834 => { ID => 44, WY => 1 }, }

      UPDATE: I got the zip codes file from

      https://www.downloadexcelfiles.com/sites/default/files/docs/usazipcode +-1512j.csv

        That's a pretty cool way to do that, I shall play around with that method because honestly the less data I have to reference the better. Thank you!
Re: Multiple file handles
by NERDVANA (Deacon) on Feb 05, 2023 at 05:49 UTC
    If the reason you used a spreadsheet was because it was the easiest way for you to edit the data, and your actual question here is "what is the easiest way to pull data from a spreadsheet into Perl", I will suggest my Data::TableReader which can easily extract some columns of a spreadsheet into perl hashrefs (one per row) which you can then use in any algorithm you like, such as suggested by others here.
      OMG that module would of made my life so much easier months ago!!! Thank you!
        Be aware that even five-digit zip codes do not always tell what state a person lives in. I once had a boss who lived in New Jersey. (Paid state and local taxes in NJ. Had a NJ drivers license) But, he had a New York mailing address (both the zip cope and the "NY"). His zip code physically included a small part of NJ.
        Bill
Re: Multiple file handles
by exilepanda (Friar) on Feb 03, 2023 at 06:02 UTC
    using one file to reference zip codes
    Instead of Spreadsheet, INI file maybe a lite weight alternative. The content may be look like:
    code1 = state 1 code2 = state 2 ...
    Then your Perl script:
    use strict; use Config::Tiny; my $Config = Config::Tiny->read( 'ZipCodes.ini' ); sub check { my $code = shift; return $Config->{_}{$code} or die "No such code '$code' "; } my $st_name = check ( $code ) ;
    Code not tested, but the idea is like that. CPAN Config::Tiny

      If I were to use a file as opposed to a DB, I'd most definitely use JSON. Cross-platform, cross-language, and web API ready.

      codes.json

      { "91280": "CA", "61682": "NY", "23823": "WA" }

      Perl script:

      use warnings; use strict; use JSON; my $file = 'codes.json'; my $data; { local $/; open my $fh, '<', $file or die $!; my $json = <$fh>; $data = decode_json $json; } print "$_: $data->{$_}\n" for sort keys %$data;

      Output:

      23823: WA 61682: NY 91280: CA
        Thats actually a really good idea, We do use alot json stuff here, this might be a simpler way for me to do this, because I'd like to create 2 versions where I can run one on my machine, then use another to reference the database, but still not sure if I could reference mysql database on my local machine. i'll post more data once I get that connection working. This is what I have so far though.
        my $database = "asterisk"; my $user = "root"; my $host = 192.168.8.50; my $password = 123; my $dbhA = DBI->connect("DBI:mysql:$database,$host","$password") o +r die "Couldn't connect to database: ".DBI->errstr;
        using DBI What would you recommed DBI vs DBD::mysql?
      oo thats a nice module, I'm going to try playing around with that idea maybe theres a tiny module I can use to help reference or make a log file too. I've decided to take another crack at connecting to mysql database this time though, but ill definitely use that idea. Thank you

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: perlquestion [id://11150099]
Approved by Corion
Front-paged by Corion
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others cooling their heels in the Monastery: (2)
As of 2024-04-24 14:27 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found