Multiple file handles

MoodyDreams999 has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: Multiple file handles by hippo (Bishop) on Feb 02, 2023 at 16:56 UTC
80000 zip codes and they state abbr That's not a vast amount of data in the grand scheme of things. I would read that into a hash keyed on whichever part you know and with the values as the other part. Very fast to lookup and should easily fit in RAM. so I created spreadsheet That is an interesting choice. If there were enough data to warrant an external application then I would be looking to use a database. 🦛	[reply]
Re^2: Multiple file handles by cavac (Parson) on Feb 03, 2023 at 11:19 UTC
If there were enough data to warrant an external application then I would be looking to use a database. I agree. It's easy enough. It's Friday, i got my lunchbreak, so why not give the OP an example (using PostgreSQL, since that's what i have installed). Since OP didn't give us example files, let's just create some data out of thin air (zipcodes and census data) for our lovely tropical island. The censusdata is missing the zipcodes (because our friend Penultimo doesn't understand them) but it has valid city names. #!/usr/bin/env perl use strict; use warnings; use Crypt::Digest::SHA256 qw[sha256_hex]; use Carp; # Make a fake zipcode list print "Writing ZIP codes\n"; open(my $zipfh, '>', 'zipcodes.txt') or croak($!); for(my $i = 10_000; $i <= 99_999; $i++) { # Fake a city name ;-) my $cityname = sha256_hex($i); print $zipfh "$i;$cityname\n"; } close $zipfh; # Make a fake data file print "Writing data\n"; open(my $datafh, '>', 'census.txt') or croak($!); for(my $i = 0; $i < 1_000_000; $i++) { my $zip = int(rand(89_999)) + 10_000; my $cityname = sha256_hex($zip); my $population = int(rand(1_000_000)); print $datafh "$cityname;$population\n"; } close $datafh; print "Done\n"; [download] Next step is to log in to our database, create some tables and insert the data from the files. The last step is to add the missing zipcodes into the census data table using the city names. #!/usr/bin/env perl use strict; use warnings; use Carp; use DBI; my $dbh = DBI->connect("dbi:Pg:dbname=TropicoDB;host=/var/run/postgres +ql", 'ElPresidente', 'WorldDomination', {AutoCommit => 0}) or croak($!); my @stmts = ( "CREATE TABLE IF NOT EXISTS zipcodes ( zipcode text NOT NULL, cityname text NOT NULL, CONSTRAINT zipcodes_pk PRIMARY KEY(zipcode) )", "CREATE TABLE IF NOT EXISTS censusdata ( census_id bigserial, zipcode text, cityname text NOT NULL, peoplecount bigint NOT NULL, CONSTRAINT census_pk PRIMARY KEY(census_id) )", # Empty tables if they already have data "TRUNCATE zipcodes", "TRUNCATE censusdata", ); foreach my $stmt (@stmts) { print "Running $stmt...\n"; $dbh->do($stmt) or croak($dbh->errstr); } print "Loading zipcodes into database...\n"; my $zipsth = $dbh->prepare("INSERT INTO zipcodes (zipcode, cityname) V +ALUES (?, ?)") or croak($dbh->errstr); open(my $zipfh, '<', 'zipcodes.txt') or croak($!); while((my $line = <$zipfh>)) { chomp $line; my ($zipcode, $cityname) = split/\;/, $line; if(!$zipsth->execute($zipcode, $cityname)) { croak($dbh->errstr); } } close $zipfh; print "Loading census data into database...\n"; my $censussth = $dbh->prepare("INSERT INTO censusdata (cityname, peopl +ecount) VALUES (?, ?)") or croak($dbh->errstr); open(my $censusfh, '<', 'census.txt') or croak($!); while((my $line = <$censusfh>)) { chomp $line; my ($cityname, $population) = split/\;/, $line; if(!$censussth->execute($cityname, $population)) { croak($dbh->errstr); } } close $censusfh; print "Adding missing zipcodes to census data...\n"; $dbh->do("UPDATE censusdata c SET zipcode = z.zipcode FROM zipcodes z +WHERE c.cityname = z.cityname") or croak($dbh->errstr); $dbh->commit; print "Done\n"; [download] Now that we have the data in the database, it's very simple to run some reports. El Presidente needs to know which areas of his island are overpopulated and where to move those people. He also wants to know if any areas are still missing census reports. #!/usr/bin/env perl use strict; use warnings; use Carp; use DBI; my $dbh = DBI->connect("dbi:Pg:dbname=TropicoDB;host=/var/run/postgres +ql", 'ElPresidente', 'WorldDomination', {AutoCommit => 0}) or croak($!); generateReport($dbh, "Most populated zip codes", "SELECT zipcode, sum( +peoplecount) FROM censusdata GROUP BY zipcode ORDER BY sum(peoplec +ount) desc LIMIT 10"); generateReport($dbh, "Least populated zip codes", "SELECT zipcode, sum +(peoplecount) FROM censusdata GROUP BY zipcode ORDER BY sum(peoplec +ount) LIMIT 10"); generateReport($dbh, "Zip codes without census", "SELECT z.zipcode, z. +cityname FROM zipcodes z WHERE NOT EXISTS ( SELECT 1 FROM cens +usdata c WHERE c.zipcode = +z.zipcode )"); $dbh->commit; sub generateReport { my ($dbh, $title, $stmt) = @_; print $title, "\n"; print "=" x length($title), "\n"; my $sth = $dbh->prepare($stmt) or croak($dbh->errstr); $sth->execute or croak($dbh->errstr); while((my @line = $sth->fetchrow_array)) { print join(';', @line), "\n"; } $sth->finish; print "\n\n"; } [download] Results: Most populated zip codes ======================== 15864;16121159 50274;15581665 90418;15213432 47146;14784133 79867;14738692 66935;14603419 25969;14572689 29868;14531580 63780;14452431 95050;14350914 Least populated zip codes ========================= 26940;110605 10822;158808 11776;170321 43384;187501 61853;211522 34028;227782 21339;228705 61363;319850 22065;322939 42693;329595 Zip codes without census ======================== 99999;fd5f56b40a79a385708428e7b32ab996a681080a166a2206e750eb4819186145 [download] PerlMonks XP is useless? Not anymore: XPD - Do more with your PerlMonks XP	[reply] [d/l] [select]
Re^3: Multiple file handles by MoodyDreams999 (Beadle) on Apr 07, 2023 at 21:26 UTC
Wow, thats a great way to do it, definitely gives me some ideas and you connecting to the DB helps for reference because ive been struggling with that due to my company's current script using a bunch of regex off of a config file to connect. I'll have to get better at SSCCE so I don't end up bogging you guys down or giving you too little info. Thank you so much!	[reply]
Re^2: Multiple file handles by MoodyDreams999 (Beadle) on Apr 07, 2023 at 21:28 UTC
Yeah honestly I think thats the best way to go, i'll try to connect to mysql locally to reference a table, thank you.	[reply]
Re: Multiple file handles by AnomalousMonk (Archbishop) on Feb 02, 2023 at 19:56 UTC
... 80000 zip codes and they state abbr. You'll have to double-check this, but I believe that the first three digits of a ZIP code uniquely define a state. The first three digits designate a sectional center facility (SCF), and IIUC such a facility does not service multiple states. E.g., the geographical area of 90210 (Beverly Hills, CA) lies entirely in California and the 902 SCF can only map to CA. (Note, however, that there are both civil and military SCFs.) Thus, the mapping for all states reduces from 80000 to 1000 SCFs max, an even more manageable hash size. :) Give a man a fish: `<%-{-{-{-<`	[reply] [d/l]
Re^2: Multiple file handles by tybalt89 (Monsignor) on Feb 06, 2023 at 19:57 UTC
According to the on-line ZIP code file I found, there are four "first three digits" ZIP codes that map to more than one state. `{ 63 => { CT => 39, NY => 1 }, 205 => { DC => 49, VA => 1 }, 739 => { OK => 14, TX => 1 }, 834 => { ID => 44, WY => 1 }, }` [download] UPDATE: I got the zip codes file from `https://www.downloadexcelfiles.com/sites/default/files/docs/usazipcode +-1512j.csv` [download]	[reply] [d/l] [select]
Re^3: Multiple file handles by MoodyDreams999 (Beadle) on Apr 07, 2023 at 21:31 UTC
That's a pretty cool way to do that, I shall play around with that method because honestly the less data I have to reference the better. Thank you!	[reply]
Re: Multiple file handles by NERDVANA (Deacon) on Feb 05, 2023 at 05:49 UTC
If the reason you used a spreadsheet was because it was the easiest way for you to edit the data, and your actual question here is "what is the easiest way to pull data from a spreadsheet into Perl", I will suggest my Data::TableReader which can easily extract some columns of a spreadsheet into perl hashrefs (one per row) which you can then use in any algorithm you like, such as suggested by others here.	[reply]
Re^2: Multiple file handles by MoodyDreams999 (Beadle) on Apr 07, 2023 at 21:33 UTC
OMG that module would of made my life so much easier months ago!!! Thank you!	[reply]
Re^3: Multiple file handles by BillKSmith (Monsignor) on Apr 08, 2023 at 15:44 UTC
Be aware that even five-digit zip codes do not always tell what state a person lives in. I once had a boss who lived in New Jersey. (Paid state and local taxes in NJ. Had a NJ drivers license) But, he had a New York mailing address (both the zip cope and the "NY"). His zip code physically included a small part of NJ. Bill	[reply]
Re^4: Multiple file handles by LanX (Saint) on Apr 08, 2023 at 15:57 UTC
Re^4: Multiple file handles by AnomalousMonk (Archbishop) on Apr 08, 2023 at 23:54 UTC
Re: Multiple file handles by exilepanda (Friar) on Feb 03, 2023 at 06:02 UTC
using one file to reference zip codes Instead of Spreadsheet, INI file maybe a lite weight alternative. The content may be look like: `code1 = state 1 code2 = state 2 ...` [download] Then your Perl script: `use strict; use Config::Tiny; my $Config = Config::Tiny->read( 'ZipCodes.ini' ); sub check { my $code = shift; return $Config->{_}{$code} or die "No such code '$code' "; } my $st_name = check ( $code ) ;` [download] Code not tested, but the idea is like that. CPAN Config::Tiny	[reply] [d/l] [select]
Re^2: Multiple file handles by stevieb (Canon) on Feb 03, 2023 at 17:09 UTC
If I were to use a file as opposed to a DB, I'd most definitely use JSON. Cross-platform, cross-language, and web API ready. codes.json `{ "91280": "CA", "61682": "NY", "23823": "WA" }` [download] Perl script: `use warnings; use strict; use JSON; my $file = 'codes.json'; my $data; { local $/; open my $fh, '<', $file or die $!; my $json = <$fh>; $data = decode_json $json; } print "$_: $data->{$_}\n" for sort keys %$data;` [download] Output: `23823: WA 61682: NY 91280: CA` [download]	[reply] [d/l] [select]
Re^3: Multiple file handles by MoodyDreams999 (Beadle) on Apr 07, 2023 at 21:44 UTC
Thats actually a really good idea, We do use alot json stuff here, this might be a simpler way for me to do this, because I'd like to create 2 versions where I can run one on my machine, then use another to reference the database, but still not sure if I could reference mysql database on my local machine. i'll post more data once I get that connection working. This is what I have so far though. `my $database = "asterisk"; my $user = "root"; my $host = 192.168.8.50; my $password = 123; my $dbhA = DBI->connect("DBI:mysql:$database,$host","$password") o +r die "Couldn't connect to database: ".DBI->errstr;` [download] using DBI What would you recommed DBI vs DBD::mysql?	[reply] [d/l]
Re^4: Multiple file handles by 1nickt (Canon) on Apr 08, 2023 at 10:41 UTC
Re^5: Multiple file handles by MoodyDreams999 (Beadle) on Apr 13, 2023 at 17:26 UTC
Re^2: Multiple file handles by MoodyDreams999 (Beadle) on Apr 07, 2023 at 21:37 UTC
oo thats a nice module, I'm going to try playing around with that idea maybe theres a tiny module I can use to help reference or make a log file too. I've decided to take another crack at connecting to mysql database this time though, but ill definitely use that idea. Thank you	[reply]


Pathologically Eclectic Rubbish Lister
	PerlMonks