SpaceCowboy has asked for the wisdom of the Perl Monks concerning the following question:
I would like your wisdom and advice on designing on following. I am looking to extract tables from a database and start executing SQL statements in a certain order.
so far, I have four perl programs, primary.pl, first.pl, second.pl, third.pl. primary will be the main program in which i am calling the other three scripts.
primary.pl will call the scripts, first.pl and second.pl will create temp table schema and each will run a huge query that loads data into respective temp table schema. third.pl will join the two temp tables from first and second and load it into third.
my question is,
How can I declare the database connection parameters just once?
when I call the scripts, the first/second/third wont execute without the database connection parameters mentioned in each one of them
how can I serialize the execution? such that subsequent scripts execute if and only if the first one executes successfully.
Is it possible to write multiple statements like dbh->do("create temp schema", "insert table");
Any general wisdom here? am I doing something outlandish?
primary.pl
use dbi; use warnings; use strict; ***import database credentials here usename password database host driver dbh $dbh->do("alter session set current_schema = the current schema”); my $jobs = `perl C:/first.pl`; print “first script executed”; my $jobs = `perl C:/second.pl`; print “second script executed”; my $jobs = `perl C:/third.pl`; print “third script executed”; *** code to export the final temp table to csv.
first.pl
use dbi; use warnings; use strict; ***import database credentials here usename password database host driver dbh dbh->do(“create temp table”) dbh->do(“insert into ttemp table”); *** exit the database
second.pl
use dbi; use warnings; use strict; ***import database credentials here usename password database host driver dbh dbh->do(“create second temp table”) dbh->do(“insert into second temp table”); *** exit the database
third.pl
use dbi; use warnings; use strict; ***import database credentials here usename password database host driver dbh dbh->do(“create third temp table”) dbh->do(“insert into third temp table where you join first and second” +); *** exit the database
thank you for your time
|
---|
Replies are listed 'Best First'. | |
---|---|
Re: designing a program - your wisdom needed
by Corion (Patriarch) on Jan 20, 2022 at 07:12 UTC | |
If all that's different between your three "workhorse" programs is the SQL, why not read/run the SQL in the main program?
... or read the SQL from three files and run that SQL:
... or, if you prefer a canned solution, take a look at DBIx::RunSQL, which runs SQL from strings or files, and has some (rough) logic to execute multiple SQL statements from a single file. | [reply] [Watch: Dir/Any] [d/l] [select] |
Re: designing a program - your wisdom needed
by hippo (Bishop) on Jan 20, 2022 at 10:27 UTC | |
How can I declare the database connection parameters just once? In general, you can share such data between different scripts by: Note that here the module option is perhaps best because the module can handle not just shared data but also shared operations such as creating the database connection, handling errors/exceptions, serialising/encoding data, etc. Or you could combine two of them - have a module to handle all this and then that module could read the config file or environment variable, etc. how can I serialize the execution? such that subsequent scripts execute if and only if the first one executes successfully. You can do this in any number of ways, but I wouldn't. It is almost never a smart move to call a Perl script directly from another Perl script. Call it an antipattern if you like. Far better to take the second script and put all of its operations in a module instead. The second script then just becomes a wrapper around that module should you ever want to run it independently. The code in the module can be run from any script (or module) which uses it. Any general wisdom here? General rule: don't shell out from Perl to run Perl. 🦛 | [reply] [Watch: Dir/Any] |
by Marshall (Canon) on Jan 20, 2022 at 12:39 UTC | |
first.pl and second.pl will create temp table schema and each will run a huge query that loads data into respective temp table schema. third.pl will join the two temp tables from first and second and load it into third. It could very well be that the OP just needs to log onto a network SQL DB and run just 3 SQL commands!? I am not sure at all there needs to be 4 Perl programs for this! It appears that a single Perl program of a couple of pages long will do what the OP is requesting. If not, then I want to know why not? (that's a question for the OP). | [reply] [Watch: Dir/Any] |
by SpaceCowboy (Acolyte) on Jan 20, 2022 at 21:40 UTC | |
| [reply] [Watch: Dir/Any] |
by Marshall (Canon) on Jan 21, 2022 at 20:16 UTC | |
Re: designing a program - your wisdom needed
by Discipulus (Canon) on Jan 20, 2022 at 08:09 UTC | |
the first tool to code reuse are subroutines; see perlsub then if you can generalize a behaviour you can create your own module. As Corion++ already suggested you can have all sql statements in the main program and I'd use sub also for each steps: this help a lot having the main flow of the program short and clean.
L*
There are no rules, there are no thumbs.. Reinvent the wheel, then learn The Wheel; may be one day you reinvent one of THE WHEELS. | [reply] [Watch: Dir/Any] [d/l] |
Re: designing a program - your wisdom needed
by cavac (Parson) on Jan 20, 2022 at 08:59 UTC | |
Hmm, i can so you are only using do() to execute the SQL statements. So i'm assuming you are "blindly" executing stuff, without the need to read back data from the database. You might not need multiple programs or even subroutines to do this one. Let's assume you have a list of SQL statements you want executed. You can just put them into text files, one per line, then do something like this:
If you do more complex stuff that needs logic, you can split the program into perl modules. This would still use a single database handle, but allows you to do complicated interactive stuff. Let's start with eagly.pl, the main program:
Ok, let's take a look at PDI.pm (each module would look similar, depending on what it does)
Hope that helps a bit in designing your program.
perl -e 'use Crypt::Digest::SHA256 qw[sha256_hex]; print substr(sha256_hex("the Answer To Life, The Universe And Everything"), 6, 2), "\n";'
| [reply] [Watch: Dir/Any] [d/l] [select] |
Re: designing a program - your wisdom needed
by kcott (Archbishop) on Jan 20, 2022 at 12:36 UTC | |
G'day SpaceCowboy, You have a lot of mistakes in your code. Here's a selection: We all make typos but, with this many, you're quite likely to spend more time debugging than coding. Pay more attention to the code you're writing: don't make a rod for your back. "Any general wisdom here?" The following is intended to be very generic. There's not enough up-front information for me to provide working code. Adapt what I have here for your specific needs. I'd probably start with a front-end script that might look something like:
I don't know what you had in mind for "***import database credentials here"; perhaps Getopt::Long, a config file, or something else. I'll leave how you populate $database_credentials up to you. In a separate file (.../SpaceCowboy/Database/Module.pm), code SpaceCowboy::Database::Module which does all the work. Here's a rough example of what that might look like:
I've included a very basic new() method; however, you'll probably find one of the many frameworks available are easier to work with — Moo might be a good one to use here. See also: "perlintro: OO Perl" and perlootut. The database handle, $DBH, is lexically scoped such that only dbh() can access it directly. It is only ever set once. You'll need to look at the DBD::whatever documentation, for whatever database you intend to use, to find out the correct way to call DBI->connect(...). "how can I serialize the execution?" You could use flags for each of primary(), first(), etc. Set these to indicate which parts have completed successfully; do not rerun the flagged parts. You'll possibly need a reset() method, to turn these off, so that a fresh run from the start is possible. "Is it possible to write multiple statements like dbh->do("create temp schema", "insert table");" That may be dependent on the database you're using. You could write:
Be aware that might be problematic if you're also trying to flag the success of individual statements. Ask yourself why you think you'd need this. If you can't come up with a good reason, don't do it. — Ken | [reply] [Watch: Dir/Any] [d/l] [select] |
by eyepopslikeamosquito (Archbishop) on Jan 21, 2022 at 05:08 UTC | |
Excellent response from accomplished Perl programmer kcott 👌🏻🛏️🫖 of Melbourne (who is currently kicking back with a beer at the Australian Open Tennis). SpaceCowboy 🌌🐄👦, though you're primary job is presumably herding cattle in outer space, your Perl seems to be at a similar level to Bod from (working-class) Coventry UK - so you should learn a lot from the many useful responses Bod received when he recently asked about Procedural vs OOP modules. Updated: added emojis from Emojis for Perl Monk names for kcott and SpaceCowboy | [reply] [Watch: Dir/Any] |
by kcott (Archbishop) on Jan 21, 2022 at 06:38 UTC | |
G'day eyepopslikeamosquito, ➕➕ Many thanks for your high praise. I've actually just knocked off work for the week, so I'm kicking back with a beer at the Monastery. As it's my birthday tomorrow, I may have kicked back all the way to a horizontal orientation by midnight. Possibly herding galactic space goats across the firmament by that time 👨🏻🚀 🐐 🪐 or at least until 🍺 🧟 🧠 Avagoodweegend! 😈 — Ken | [reply] [Watch: Dir/Any] |
Re: designing a program - your wisdom needed
by Marshall (Canon) on Jan 20, 2022 at 09:00 UTC | |
I am working on an SQL application right now. While I am experimenting, my version of first.pl deletes the existing DB if it exists and starts completely anew. An alternative is the SQL: $dbh->do("DROP TABLE IF EXISTS main"); which I use in some other processing phases. This code is to show how easy it is to "get started" and create a brand new DB.
Of course if you need to know, help can be provided about the CREATE TABLE statements and how to do the inserts. My code is not relevant to your application. My DB is a "small DB", only about I think a temp table would only persist while your DB connection that created it is open. I see no need to mess with that. Just make a regular table which will persist across any disconnect/reconnect. If you want to "clean up after yourself", then just DROP the table. So, How can I declare the database connection parameters just once? With SQLite, you just need the DB name. For my app, I just put the name "MASTER.sqlite" as a constant in all of the programs (and yes the actual name is bit more descriptive). I could make a config file with that data in it or even account/password info if I was using a different DB. I don't see why you would need to split the code up into 4 programs? I mean things like "third.pl will join the two temp tables from first and second and load it into third". That is one SQL statement! CREATE TABLE AS.... A separate program is a lot of overhead of one SQL statement. But the normal way to do this would be to have a shell batch file that runs each of your programs and then exits if X.pl returns a non-zero error code. Just like any other program, Perl will return 0 if program exits successfully, error code if an abend. You can force error exit with exit(476); or whatever. Update:
I mentioned that your "create 3rd table from 2 other tables" is a single SQL statement.
This is so fast that I haven't bothered to measure it. Update2: I can see situations where processing should be broken up into multiple phases. But I don't see the need that for the OP's question. In my current app, I am trying to figure out given your complete USPS address: street address (not a PO box), zip code and state, what state do you live in? It turns out that is a very complicated question to answer! Much to my surprise! With the DB's I have, I will be able to come close to a predicted state of residence, but there will be uncertainty in that prediction. Right now I have different "phases" of DB creation while I am developing so that I can see intermediate analysis of results while I am fine tuning the heuristics (rules of thumb). I don't think the OP is trying to answer any kind of question like "what state do you live in?". | [reply] [Watch: Dir/Any] [d/l] [select] |
by SpaceCowboy (Acolyte) on Jan 20, 2022 at 17:23 UTC | |
| [reply] [Watch: Dir/Any] |
by Marshall (Canon) on Jan 21, 2022 at 21:59 UTC | |
As it turns out. zip codes are for the convenience of the USPS. They can cross county and even state boundaries! I was shocked to find that out, but it is true. I was even more surprised when I found out that the state in your mailing address may not actually be where that particular street address is! You can have a Florida mailing address, but your house is actually over the border in Mississippi. Yep, a registered voter in MS can have a FL address! And they can't vote for the Governor of Florida! The USPS does have a DB listing a single county and a single state for each zip code. In this DB there could be (and is) a zip code that I found which lists a WV county but PA as the state. We don't know for sure, but this guy probably actually lives in WV even though his mailing address shows PA as the state. That same zip code can be used for addresses both in PA and in WV. Be that as it may, I decided to proceed further to see "how close" with some heuristics I could come to the "correct answer". I wound up doing pretty well on that. Meaning from an address, I can predict with high probability (not certainty), but high probability which one of this club's "divisions" this guy lives in. Then after much data crunching, I found a source of errors that dwarfs any uncertainty in my heuristics. Some of these guys have multiple residences and they may not actually live at their mailing address! So at this point, at least this small part of my project has fallen apart. Sometimes you have to crunch a bunch of data to find out what you don't know! By cross checking between other DB's, I can come up with a rough approximation of the accuracy of my code so far. At a cost of 2 orders of magnitude in complexity, I could improve upon my zip code DB. But that effort wouldn't matter because the address on file may be a 2nd home or Mom's or Dad's, etc.! I have more pieces of my puzzle to solve and more data crunching is going on...this zip code to county/state of residence was just one question of many. In the meantime, my SQL skills are improving. For development, I just run the Perl program from my programming text editor where the output is captured in another editor window. Some of these "pre-production" things spew out 50K lines of analysis so that I can see why it is making the decisions that it is. Then I tweak the Perl code to change a heuristic and run it again to apply human brain power to decide how well or not well it is working. At the same time I have my SQLite GUI up and running. So that I can look at output tables and run ad hoc SQL commands. At the end, I will create a couple of pseudo CSV files. I've been yelled at more than once for claiming that such things exist in the real world. They do. I will probably make a .bat file like I suggested for you in another post. That .bat file will run the correct sequence of "filters" to generate my final result. However, I may just continue to run each "phase" manually so that I can apply human judgement and "sanity determination". This thing is far from a "turn key end user" program (and it will never be). | [reply] [Watch: Dir/Any] |
Re: designing a program - your wisdom needed
by bliako (Monsignor) on Jan 20, 2022 at 18:58 UTC | |
I would also go with what other Monks suggested: to use a sub for each of your proposed scripts' logic. One of their input parameters should be the DB handle. Once you have the three subs, you can still create 3 scripts, as per your original proposal, which reads in DB credentials, create a DB handle and call the appropriate sub. So you have both worlds (for whatever reason). (Edit: or call the three subs from one script which at the beginning asks or reads DB credentials once.) One point is unclear to me, you mention "join". Does it refer to SQL JOIN or to concatenate? Regarding your proposed "temporary table". It may not be necessary to create temporary tables if all you need is to pull results from DB, filter or concatenate and save. Perhaps you can save these results in Perl variables and do the transformations in Perl, but only if you are operating from a single Perl script calling the 3 subs (and data is small, though SQL can do it better in the DB, if one can write it that is ;) ). I don't know if this will be preferable than using the DB as temporary storage, it depends on your SQL knowledge and size of results to be transformed. Also, I have recently used Redis which is a nosql (temporary) data store, which more-or-less acts like a Perl Hash, but accessible from many programs within the same machine (or remote). I used it to share temp data between scripts (like your proposed 3 scripts) when I was too lazy to write SQL and do it within the DB. That data was like a string blob, JSON string etc. But Redis did not care about any structure. Which was very convenient in storing it and retrieving it, no questions asked regarding structure. If you find this idea interesting but you are not allowed to install Redis, here is a very lame implementation in pure Perl: Simple data-store with Perl . Note that you can store Perl data structures, e.g. nested hashes, into files or DB blobs by using Data::Serializer or Storable | [reply] [Watch: Dir/Any] |
by SpaceCowboy (Acolyte) on Jan 20, 2022 at 21:59 UTC | |
So, you are using Redis as staging? I belive Redis is memory hungry and may take a lot of physical memory. I was unable to understand to fully understand the "simple data store wth perl" link, what I gather is that it connects wtih a remote server and loops into some kind of regex expression? pardon my ignorance, I would love to learn your intention here and how this applies to relational tables. | [reply] [Watch: Dir/Any] |
by bliako (Monsignor) on Jan 21, 2022 at 08:54 UTC | |
so, it's better to do that within the DB at least between scripts 1 and 2. (and to be clear on my above writing, doing things into DB is usually much better than retrieving data and transforming it in your Perl script *provided you write the correct SQL* - which I find almost impenetrable, that's why I am always looking for alternative, albeit roundabout and perhaps inefficient, ways. Bottomline: sticking with the DB is better, usually.) I belive Redis is memory hungry and may take a lot of physical memory.I have not noticed anything upnormal there. You can always try it out and see. It was quite fast for me. Can't say anything about memory usage. The "simple data store", I linked, acts as a server which clients (let's say your scripts) contact in order to either store a string (which can be some JSON or XML or with minor modification to be a binary serialised perl hash or array, possibly nested) by key, or retrieve a string by key. The regex you refer to implements the simplest API to do that. That is, it checks if client gave it something like KEY1=VALUE1, in which case it stores it. Alternatively, client can send a KEY1=, in which case it looks in its private hash-store if KEY1 exists and sends back to client the value stored. Warning: all checks for storing and retrieving data done in DBs is absent, e.g. handle race conditions, etc. Also, there is no encryption or password protection, all scripts can see all data provided they know the key. The "simple data store" does not apply to relational tables directly, it just offers a way for separate, independent scripts or programs (in various languages) to share some temporary data between them. It does not replace a database. It just offers a way to avoid using the DB as a temp data store. One example of use: script1 does data scraping and processing at irregular intervals. It places raw results in the data store. script2 run as a "CGI" checks if results exist in data store and converts them to HTML for viewing. For me script1 was in C, and script2 in Perl. It saved me a lot of trouble to do that (edit: ) instead of (end edit) with temp DB tables (ok, sqlite is a bit easier). That's my experience which is not industrial. Others here have industry experience. | [reply] [Watch: Dir/Any] [d/l] [select] |
Re: designing a program - your wisdom needed
by pme (Monsignor) on Jan 20, 2022 at 06:31 UTC | |
| [reply] [Watch: Dir/Any] |
by SpaceCowboy (Acolyte) on Jan 20, 2022 at 17:18 UTC | |
| [reply] [Watch: Dir/Any] |
by pme (Monsignor) on Jan 21, 2022 at 09:51 UTC | |
Oracle has a great feature, called global temporary table. For details see https://www.oracletutorial.com/oracle-basics/oracle-global-temporary-table/. You can create your temporary table(s) only once, they will be there for good. These tables can be populated with INSERT INTO SELECT statements (see https://www.oracletutorial.com/oracle-basics/oracle-insert-into-select/) in the beginning of the transaction. Then they can be modified by UPDATE and/or MERGE (see https://www.oracletutorial.com/oracle-basics/oracle-merge/). When data looks okay in the global temp table, it can be written into its final location by INSERT or MERGE. Simple and straightforward. Based on these features you probably can put together single perl script using module DBI (DBD::Oracle) and all the steps can be done with only a few $dbh->do(sqlstatement). Regards | [reply] [Watch: Dir/Any] |