Beefy Boxes and Bandwidth Generously Provided by pair Networks
Just another Perl shrine
 
PerlMonks  

Re: designing a program - your wisdom needed

by Marshall (Canon)
on Jan 20, 2022 at 09:00 UTC ( [id://11140632]=note: print w/replies, xml ) Need Help??


in reply to designing a program - your wisdom needed

As pme asked, what SQL DB are you using? Do you have a choice? If you have a choice, I would recommend SQLite. This is my "go to" DB and it doesn't require any installation or admin of a DB Server - the "care and feeding" of a DB server can get involved. With SQLite, there are no "accounts" or "passwords", if you have permissions to access to the DB file, you can use it. One of your hassles just goes away.

I am working on an SQL application right now. While I am experimenting, my version of first.pl deletes the existing DB if it exists and starts completely anew. An alternative is the SQL: $dbh->do("DROP TABLE IF EXISTS main"); which I use in some other processing phases. This code is to show how easy it is to "get started" and create a brand new DB.

use strict; use warnings; use DBI; $|=1; ################# ## Delete the old DB (if it exits) ## and create a brand new DB ## my $dbfilename = "MASTER.sqlite"; if (-e $dbfilename) { unlink $dbfilename or die "delete of existing $dbfilename failed $!"; } # This is the basic "open DB voodoo" my %attr = ( RaiseError => 1); #auto die with error printout my $dbh = DBI->connect("dbi:SQLite:dbname=$dbfilename","","",\%attr) or die "Couldn't connect to database $dbfilename: " . DBI->errstr; ## Ready for Business now!! #################################### create_table($dbh); #RAW_DATA table my %yr2dir =( 2018 => "../some directory", 2019 => "../some directory", ); foreach my $year (sort keys %yr2dir) { $dbh->begin_work; print "working on $year in directory $yr2dir{$year}...\n"; populate_table($dbh,$year,$yr2dir{$year}); $dbh->commit; print "finished $year\n"; }
Of course if you need to know, help can be provided about the CREATE TABLE statements and how to do the inserts. My code is not relevant to your application. My DB is a "small DB", only about 500K 1.3M records in the main table and takes about 30 seconds to completely "start over".

I think a temp table would only persist while your DB connection that created it is open. I see no need to mess with that. Just make a regular table which will persist across any disconnect/reconnect. If you want to "clean up after yourself", then just DROP the table.

So, How can I declare the database connection parameters just once? With SQLite, you just need the DB name. For my app, I just put the name "MASTER.sqlite" as a constant in all of the programs (and yes the actual name is bit more descriptive). I could make a config file with that data in it or even account/password info if I was using a different DB.

I don't see why you would need to split the code up into 4 programs? I mean things like "third.pl will join the two temp tables from first and second and load it into third". That is one SQL statement! CREATE TABLE AS.... A separate program is a lot of overhead of one SQL statement. But the normal way to do this would be to have a shell batch file that runs each of your programs and then exits if X.pl returns a non-zero error code. Just like any other program, Perl will return 0 if program exits successfully, error code if an abend. You can force error exit with exit(476); or whatever.

Update:

I mentioned that your "create 3rd table from 2 other tables" is a single SQL statement.
Here is an example where I create a table which is actually a much, much smaller set of a larger table and it is sorted in a way that I like it for easy viewing with my GUI tool.
When you create a table from other tables, SQL already knows the data types, field lengths etc from the table(s) you are using as the "source". As I did here, you can rename the columns during that process. The SQL syntax for using 2 source tables is more complex, but this is a demo of from some actual working code:

$dbh->do("DROP TABLE IF EXISTS sent_info"); #re-calculate this critter +! $dbh->do("CREATE TABLE sent_info AS SELECT year, sent_call AS call,sent_prec AS prec, sent_ck AS + ck, sent_sect AS sect FROM raw_qsos GROUP BY sent_call, year ORDER BY sent_call, year DESC");
This is so fast that I haven't bothered to measure it.

Update2:

I can see situations where processing should be broken up into multiple phases. But I don't see the need that for the OP's question.

In my current app, I am trying to figure out given your complete USPS address: street address (not a PO box), zip code and state, what state do you live in? It turns out that is a very complicated question to answer! Much to my surprise! With the DB's I have, I will be able to come close to a predicted state of residence, but there will be uncertainty in that prediction. Right now I have different "phases" of DB creation while I am developing so that I can see intermediate analysis of results while I am fine tuning the heuristics (rules of thumb). I don't think the OP is trying to answer any kind of question like "what state do you live in?".

Replies are listed 'Best First'.
Re^2: designing a program - your wisdom needed
by SpaceCowboy (Acolyte) on Jan 20, 2022 at 17:23 UTC
    Thank you, I am using Oracle and do not have a choice. Interested to know more about the challenges you face with your state detection program. I'd imagine using a regex to match various inputs for the given state. may be there is an api that would obtain state from zip code? anyways thank you for your answer.
      This is off topic. But since you are curious, I'll tell you more. I have an address that I get from a government DB. A club has a competition and their "divisions" are defined by county and state boundaries. I wanted to use the address that I have to figure out which of the various "divisions" a person is actually living within.

      As it turns out. zip codes are for the convenience of the USPS. They can cross county and even state boundaries! I was shocked to find that out, but it is true. I was even more surprised when I found out that the state in your mailing address may not actually be where that particular street address is! You can have a Florida mailing address, but your house is actually over the border in Mississippi. Yep, a registered voter in MS can have a FL address! And they can't vote for the Governor of Florida!

      The USPS does have a DB listing a single county and a single state for each zip code. In this DB there could be (and is) a zip code that I found which lists a WV county but PA as the state. We don't know for sure, but this guy probably actually lives in WV even though his mailing address shows PA as the state. That same zip code can be used for addresses both in PA and in WV.

      Be that as it may, I decided to proceed further to see "how close" with some heuristics I could come to the "correct answer". I wound up doing pretty well on that. Meaning from an address, I can predict with high probability (not certainty), but high probability which one of this club's "divisions" this guy lives in. Then after much data crunching, I found a source of errors that dwarfs any uncertainty in my heuristics. Some of these guys have multiple residences and they may not actually live at their mailing address! So at this point, at least this small part of my project has fallen apart.

      Sometimes you have to crunch a bunch of data to find out what you don't know! By cross checking between other DB's, I can come up with a rough approximation of the accuracy of my code so far. At a cost of 2 orders of magnitude in complexity, I could improve upon my zip code DB. But that effort wouldn't matter because the address on file may be a 2nd home or Mom's or Dad's, etc.!

      I have more pieces of my puzzle to solve and more data crunching is going on...this zip code to county/state of residence was just one question of many. In the meantime, my SQL skills are improving.

      For development, I just run the Perl program from my programming text editor where the output is captured in another editor window. Some of these "pre-production" things spew out 50K lines of analysis so that I can see why it is making the decisions that it is. Then I tweak the Perl code to change a heuristic and run it again to apply human brain power to decide how well or not well it is working. At the same time I have my SQLite GUI up and running. So that I can look at output tables and run ad hoc SQL commands.

      At the end, I will create a couple of pseudo CSV files. I've been yelled at more than once for claiming that such things exist in the real world. They do. I will probably make a .bat file like I suggested for you in another post. That .bat file will run the correct sequence of "filters" to generate my final result. However, I may just continue to run each "phase" manually so that I can apply human judgement and "sanity determination". This thing is far from a "turn key end user" program (and it will never be).

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: note [id://11140632]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others taking refuge in the Monastery: (4)
As of 2024-04-25 04:59 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found