http://qs1969.pair.com?node_id=11140625

SpaceCowboy has asked for the wisdom of the Perl Monks concerning the following question:

Dear Monks,

I would like your wisdom and advice on designing the following. I am looking to extract tables from a database and execute SQL statements in a certain order.

So far, I have four Perl programs: primary.pl, first.pl, second.pl, and third.pl. primary.pl will be the main program, from which I call the other three scripts.

primary.pl will call the scripts. first.pl and second.pl will each create a temp table schema and run a huge query that loads data into its respective temp table. third.pl will join the two temp tables from first and second and load the result into a third.

My questions are:

How can I declare the database connection parameters just once?
When I call the scripts, first/second/third won't execute without the database connection parameters being declared in each one of them.

How can I serialize the execution, such that each subsequent script executes if and only if the previous one executed successfully?

Is it possible to write multiple statements like dbh->do("create temp schema", "insert table");

Any general wisdom here? Am I doing something outlandish?

primary.pl
use dbi;
use warnings;
use strict;

***import database credentials here usename password database host driver dbh

$dbh->do("alter session set current_schema = the current schema”);

my $jobs = `perl C:/first.pl`;
print “first script executed”;

my $jobs = `perl C:/second.pl`;
print “second script executed”;

my $jobs = `perl C:/third.pl`;
print “third script executed”;

*** code to export the final temp table to csv.


first.pl
use dbi;
use warnings;
use strict;

***import database credentials here usename password database host driver dbh

dbh->do(“create temp table”)
dbh->do(“insert into ttemp table”);

*** exit the database


second.pl
use dbi;
use warnings;
use strict;

***import database credentials here usename password database host driver dbh

dbh->do(“create second temp table”)
dbh->do(“insert into second temp table”);

*** exit the database


third.pl
use dbi;
use warnings;
use strict;

***import database credentials here usename password database host driver dbh

dbh->do(“create third temp table”)
dbh->do(“insert into third temp table where you join first and second”);

*** exit the database


Thank you for your time.

Re: designing a program - your wisdom needed
by Corion (Patriarch) on Jan 20, 2022 at 07:12 UTC

    If all that's different between your three "workhorse" programs is the SQL, why not read/run the SQL in the main program?

    $dbh->do(<<SQL);
    -- create first temp table
    SQL

    $dbh->do(<<SQL);
    -- create second temp table
    SQL

    $dbh->do(<<SQL);
    -- create third temp table
    SQL

    ... or read the SQL from three files and run that SQL:

    for my $file ("first.sql", "second.sql", "third.sql") { open my $fh, '<', $file or die "$file: $!"; local $/; my $sql = <$fh>; $dbh->do( $sql ); }

    ... or, if you prefer a canned solution, take a look at DBIx::RunSQL, which runs SQL from strings or files, and has some (rough) logic to execute multiple SQL statements from a single file.
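    For illustration only (a rough sketch - check the DBIx::RunSQL documentation for the exact interface, and the connection values here are placeholders), running the three SQL files through it might look roughly like this:

    use strict;
    use warnings;
    use DBI;
    use DBIx::RunSQL;

    # placeholder credentials - substitute your real DSN/user/password
    my $dbh = DBI->connect('dbi:Oracle:MYDB', 'user', 'password', { RaiseError => 1 });

    for my $file ('first.sql', 'second.sql', 'third.sql') {
        # run_sql_file splits the file into individual statements and runs each one
        DBIx::RunSQL->run_sql_file(
            dbh     => $dbh,
            sql     => $file,
            verbose => 1,
        );
    }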

Re: designing a program - your wisdom needed
by hippo (Bishop) on Jan 20, 2022 at 10:27 UTC
    How can I declare the database connection parameters just once?

    In general, you can share such data between different scripts by:

    • Storing them in a configuration file and loading that in each script
    • Storing them in an environment variable (or several) and reading from that in each script
    • Storing them as data in a Perl module and using that in each script
    • (Not applicable here because catch-22) Storing them in a database/LDAP directory/other storage facility and reading from that in each script

    Note that here the module option is perhaps best because the module can handle not just shared data but also shared operations such as creating the database connection, handling errors/exceptions, serialising/encoding data, etc. Or you could combine two of them - have a module to handle all this and then that module could read the config file or environment variable, etc.
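    As a minimal sketch of that combined approach (the module name, config file path, and key names below are invented purely for illustration), such a module might look like:

    package MyApp::DB;    # hypothetical name

    use strict;
    use warnings;
    use DBI;

    # Read connection parameters from a simple key=value config file
    # (path and format are placeholders - use whatever suits your site).
    sub _config {
        my %cfg;
        open my $fh, '<', '/etc/myapp/db.conf' or die "db.conf: $!";
        while (my $line = <$fh>) {
            chomp $line;
            next if $line =~ /^\s*(#|$)/;
            my ($k, $v) = split /\s*=\s*/, $line, 2;
            $cfg{$k} = $v;
        }
        return \%cfg;
    }

    # Every script calls this instead of repeating the credentials.
    sub dbh {
        my $cfg = _config();
        return DBI->connect($cfg->{dsn}, $cfg->{user}, $cfg->{password},
                            { RaiseError => 1, AutoCommit => 1 });
    }

    1;

    Each script then only needs: use MyApp::DB; my $dbh = MyApp::DB::dbh();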

    how can I serialize the execution? such that subsequent scripts execute if and only if the first one executes successfully.

    You can do this in any number of ways, but I wouldn't. It is almost never a smart move to call a Perl script directly from another Perl script. Call it an antipattern if you like. Far better to take the second script and put all of its operations in a module instead. The second script then just becomes a wrapper around that module should you ever want to run it independently. The code in the module can be run from any script (or module) which uses it.
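    To make that concrete (again only a sketch, reusing the hypothetical MyApp::DB above plus an equally hypothetical MyApp::Steps module that would hold the real work), the standalone second.pl shrinks to a thin wrapper:

    #!/usr/bin/env perl
    use strict;
    use warnings;

    use MyApp::DB;       # shared credentials/connection (hypothetical)
    use MyApp::Steps;    # the module holding the actual operations (hypothetical)

    # Everything interesting lives in the module; this script only wires it up.
    my $dbh = MyApp::DB::dbh();
    MyApp::Steps::second($dbh);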

    Any general wisdom here?

    General rule: don't shell out from Perl to run Perl.


    🦛

      I think the OP and us may be way overthinking the OP's problem.
      first.pl and second.pl will create temp table schema and each will run a huge query that loads data into respective temp table schema. third.pl will join the two temp tables from first and second and load it into third.

      It could very well be that the OP just needs to log onto a network SQL DB and run just 3 SQL commands!?

      I am not sure at all there needs to be 4 Perl programs for this!

      It appears that a single Perl program of a couple of pages long will do what the OP is requesting. If not, then I want to know why not? (that's a question for the OP).

        To have some readability, as the queries are very long. But you are right, it can be done with one program. Serial execution will remain a concern though - executing a step only after the previous step has completed successfully.
Re: designing a program - your wisdom needed
by Discipulus (Canon) on Jan 20, 2022 at 08:09 UTC
    Hello SpaceCowboy,

    The first tool for code reuse is the subroutine; see perlsub. Then, if you can generalize a behaviour, you can create your own module. As Corion++ already suggested, you can have all the SQL statements in the main program, and I'd use a sub for each step as well: this helps a lot in keeping the main flow of the program short and clean.

    use strict;
    use warnings;
    use DBI;
    use Term::ReadKey;   # use mode 2 'noecho' to ask for the connection pwd, then revert to normal

    my $dbh = init_DB_connection();

    my $first_step_result = do_first_step();

    # https://metacpan.org/pod/DBI#do returns NRows, undef or -1
    if    ( !defined $first_step_result ) { ... }   # report errors and exit
    elsif ( -1 == $first_step_result    ) { ... }   # perhaps this is a success in case of create table, dunno; execute second step
    elsif ( $first_step_result > 0      ) { ... }   # $first_step_result rows affected
    else                                  { ... }   # got unexpected $first_step_result

    sub init_DB_connection { ... }   # return $dbh on success, exit if errors
    sub do_first_step      { ... }
    ...

    L*

    There are no rules, there are no thumbs..
    Reinvent the wheel, then learn The Wheel; may be one day you reinvent one of THE WHEELS.
Re: designing a program - your wisdom needed
by cavac (Parson) on Jan 20, 2022 at 08:59 UTC

    Hmm, I can see you are only using do() to execute the SQL statements. So I'm assuming you are "blindly" executing stuff, without the need to read back data from the database. You might not need multiple programs or even subroutines to do this one.

    Let's assume you have a list of SQL statements you want executed. You can just put them into text files, one per line, then do something like this:

    #!/usr/bin/env perl
    use strict;
    use warnings;

    use DBI;
    use Carp;
    use English;

    my $dbh = DBI->connect('dburl', 'dbuser', 'dbpassword', {AutoCommit => 0, RaiseError => 0})
        or croak($EVAL_ERROR);

    foreach my $fname (qw[pdi pitchover contactlight engineshutdown]) {
        print "Executing file $fname\n";
        open(my $ifh, '<', $fname) or croak($ERRNO);
        while((my $line = <$ifh>)) {
            chomp $line; # Remove linebreak character

            # Ignore empty lines and comments
            next if($line eq '' || $line =~ /^\#/);

            if(!$dbh->do($line)) {
                # It's a 1202 program alarm, abort the landing
                $dbh->rollback;
                croak($dbh->errstr);
            }
        }
        close $ifh;
    }

    # Tranquility base here, the Eagle has landed
    $dbh->commit;

    If you do more complex stuff that needs logic, you can split the program into perl modules. This would still use a single database handle, but allows you to do complicated interactive stuff. Let's start with eagly.pl, the main program:

    #!/usr/bin/env perl
    use strict;
    use warnings;

    use DBI;
    use Carp;
    use English;

    # Load the .pm files from the current directory. Not really recommended, but it's fine for testing and debugging
    BEGIN {
        unshift @INC, '.';
    };

    use PDI;
    use Pitchover;
    use ContactLight;
    use EngineShutdown;

    my $dbh = DBI->connect('dburl', 'dbuser', 'dbpassword', {AutoCommit => 0, RaiseError => 0})
        or croak($EVAL_ERROR);

    my @modules;
    push @modules, PDI->new(dbh => $dbh);
    push @modules, Pitchover->new(dbh => $dbh);
    push @modules, ContactLight->new(dbh => $dbh);
    push @modules, EngineShutdown->new(dbh => $dbh);

    foreach my $module (@modules) {
        if(!$module->executeProgram()) {
            $dbh->rollback;
            croak("Abort!");
        }
    }

    # Tranquility base here, the Eagle has landed
    $dbh->commit;

    Ok, let's take a look at PDI.pm (each module would look similar, depending on what it does)

    package PDI;
    use strict;
    use warnings;

    use English;

    sub new {
        my ($proto, %config) = @_;

        my $class = ref($proto) || $proto;
        my $self = bless \%config, $class;

        return $self;
    }

    sub executeProgram {
        my ($self) = @_;

        # Slew to PDI attitude
        if(!$self->{dbh}->do("firststatement")) {
            return 0;
        }

        # Calculate burn time
        if(!$self->{dbh}->do("secondstatement")) {
            return 0;
        }

        # do some more complex stuff, like reading data from the database and using that to
        # decide if we need to open the helium valves
        my $selsth = $self->{dbh}->prepare_cached("SELECT valvename, valvestate FROM descentstage")
            or return 0;
        if(!$selsth->execute) {
            return 0;
        }
        while((my $line = $selsth->fetchrow_hashref)) {
            # complex logic here
        }
        $selsth->finish;

        return 1;
    }

    1;

    Hope that helps a bit in designing your program.

    perl -e 'use Crypt::Digest::SHA256 qw[sha256_hex]; print substr(sha256_hex("the Answer To Life, The Universe And Everything"), 6, 2), "\n";'
Re: designing a program - your wisdom needed
by kcott (Archbishop) on Jan 20, 2022 at 12:36 UTC

    G'day SpaceCowboy,

    You have a lot of mistakes in your code. Here's a selection:

    • There is no dbi module on CPAN. You probably meant DBI; if not, provide a link to this dbi module.
    • Don't use smart quotes. You'll run into all sorts of problems; e.g.
      $ perl -MO=Deparse -e '$dbh->do("alter session set current_schema = the current schema”);'
      Can't find string terminator '"' anywhere before EOF at -e line 1.

      $ perl -MO=Deparse -e 'print “third script executed”;'
      Unrecognized character \xE2; marked by <-- HERE after print <-- HERE near column 7 at -e line 1.
    • You used $dbh once; then left out the sigil in a multitude of places thereafter.
    • You have ttemp in one place; that should probably be temp.

    We all make typos but, with this many, you're quite likely to spend more time debugging than coding. Pay more attention to the code you're writing: don't make a rod for your back.

    "Any general wisdom here?"

    The following is intended to be very generic. There's not enough up-front information for me to provide working code. Adapt what I have here for your specific needs.

    I'd probably start with a front-end script that might look something like:

    #!/usr/bin/env perl
    use strict;
    use warnings;

    use SpaceCowboy::Database::Module;

    my $database_credentials = { ... };
    my $sdm = SpaceCowboy::Database::Module::->new($database_credentials);

    $sdm->$_() for qw{primary first second third};

    I don't know what you had in mind for "***import database credentials here"; perhaps Getopt::Long, a config file, or something else. I'll leave how you populate $database_credentials up to you.

    In a separate file (.../SpaceCowboy/Database/Module.pm), code SpaceCowboy::Database::Module which does all the work. Here's a rough example of what that might look like:

    package SpaceCowboy::Database::Module;

    use 5.010;
    use strict;
    use warnings;

    use DBI;

    sub new {
        my ($class, $params) = @_;
        return bless $params => $class;
    }

    {
        my $DBH;

        sub dbh {
            my ($self) = @_;
            $DBH //= $self->_connect();
        }
    }

    sub _connect {
        my ($self) = @_;

        # Use credentials in DBI->connect(...);
        # e.g. $self->host(), $self->username(), etc.
        return DBI->connect(...);
    }

    sub primary {
        my ($self) = @_;
        $self->dbh()->do("alter session ...");
        return;
    }

    sub first {
        my ($self) = @_;
        $self->dbh()->do("create ...");
        $self->dbh()->do("insert ...");
        return;
    }

    # ditto for second() and third()

    1;

    I've included a very basic new() method; however, you'll probably find one of the many frameworks available are easier to work with — Moo might be a good one to use here. See also: "perlintro: OO Perl" and perlootut.

    The database handle, $DBH, is lexically scoped such that only dbh() can access it directly. It is only ever set once.

    You'll need to look at the DBD::whatever documentation, for whatever database you intend to use, to find out the correct way to call DBI->connect(...).

    "how can I serialize the execution?"

    You could use flags for each of primary(), first(), etc. Set these to indicate which parts have completed successfully; do not rerun the flagged parts. You'll possibly need a reset() method, to turn these off, so that a fresh run from the start is possible.
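    As a rough sketch of that idea (the run_step()/reset() names and the {done} hash are invented here, and it assumes the connection was made with RaiseError so a failed step dies rather than carrying on):

    # inside SpaceCowboy::Database::Module (sketch only)
    sub run_step {
        my ($self, $step) = @_;            # $step is 'primary', 'first', 'second' or 'third'
        return if $self->{done}{$step};    # already completed successfully - don't rerun
        $self->$step();                    # assumed to die on failure (e.g. via RaiseError)
        $self->{done}{$step} = 1;          # flag success only after the step finished
        return;
    }

    sub reset {
        my ($self) = @_;
        $self->{done} = {};                # allow a fresh run from the start
        return;
    }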

    "Is it possible to write multiple statements like dbh->do("create temp schema", "insert table");"

    That may be dependent on the database you're using. You could write:

    $self->dbh()->do($_) for @statements;

    Be aware that might be problematic if you're also trying to flag the success of individual statements.

    Ask yourself why you think you'd need this. If you can't come up with a good reason, don't do it.

    — Ken

      Excellent response from accomplished Perl programmer kcott 👌🏻🛏️🫖 of Melbourne (who is currently kicking back with a beer at the Australian Open Tennis).

      SpaceCowboy 🌌🐄👦, though your primary job is presumably herding cattle in outer space, your Perl seems to be at a similar level to Bod from (working-class) Coventry UK - so you should learn a lot from the many useful responses Bod received when he recently asked about Procedural vs OOP modules.

      Updated: added emojis from Emojis for Perl Monk names for kcott and SpaceCowboy

        G'day eyepopslikeamosquito,

        ➕➕ Many thanks for your high praise.

        I've actually just knocked off work for the week, so I'm kicking back with a beer at the Monastery.

        As it's my birthday tomorrow, I may have kicked back all the way to a horizontal orientation by midnight. Possibly herding galactic space goats across the firmament by that time

        👨🏻‍🚀 🐐 🪐

        or at least until

        🍺 🧟 🧠

        Avagoodweegend! 😈

        — Ken

Re: designing a program - your wisdom needed
by Marshall (Canon) on Jan 20, 2022 at 09:00 UTC
    As pme asked, what SQL DB are you using? Do you have a choice? If you have a choice, I would recommend SQLite. This is my "go to" DB and it doesn't require any installation or admin of a DB server - the "care and feeding" of a DB server can get involved. With SQLite, there are no "accounts" or "passwords"; if you have permission to access the DB file, you can use it. One of your hassles just goes away.

    I am working on an SQL application right now. While I am experimenting, my version of first.pl deletes the existing DB if it exists and starts completely anew. An alternative is the SQL: $dbh->do("DROP TABLE IF EXISTS main"); which I use in some other processing phases. This code is to show how easy it is to "get started" and create a brand new DB.

    use strict;
    use warnings;
    use DBI;
    $|=1;

    #################
    ## Delete the old DB (if it exists)
    ## and create a brand new DB
    ##
    my $dbfilename = "MASTER.sqlite";
    if (-e $dbfilename)
    {
        unlink $dbfilename or die "delete of existing $dbfilename failed $!";
    }

    # This is the basic "open DB voodoo"
    my %attr = ( RaiseError => 1 );  # auto die with error printout
    my $dbh = DBI->connect("dbi:SQLite:dbname=$dbfilename", "", "", \%attr)
        or die "Couldn't connect to database $dbfilename: " . DBI->errstr;

    ## Ready for Business now!!
    ####################################

    create_table($dbh);   # RAW_DATA table

    my %yr2dir = (
        2018 => "../some directory",
        2019 => "../some directory",
    );

    foreach my $year (sort keys %yr2dir)
    {
        $dbh->begin_work;
        print "working on $year in directory $yr2dir{$year}...\n";
        populate_table($dbh, $year, $yr2dir{$year});
        $dbh->commit;
        print "finished $year\n";
    }
    Of course, if you need to know, help can be provided about the CREATE TABLE statements and how to do the inserts. My code is not relevant to your application. My DB is a "small DB", only about 1.3M records in the main table, and it takes about 30 seconds to completely "start over".

    I think a temp table would only persist while your DB connection that created it is open. I see no need to mess with that. Just make a regular table which will persist across any disconnect/reconnect. If you want to "clean up after yourself", then just DROP the table.

    So, How can I declare the database connection parameters just once? With SQLite, you just need the DB name. For my app, I just put the name "MASTER.sqlite" as a constant in all of the programs (and yes, the actual name is a bit more descriptive). I could make a config file with that data in it, or even account/password info if I was using a different DB.

    I don't see why you would need to split the code up into 4 programs. I mean, things like "third.pl will join the two temp tables from first and second and load it into third" - that is one SQL statement! CREATE TABLE AS.... A separate program is a lot of overhead for one SQL statement. But the normal way to do this would be to have a shell batch file that runs each of your programs and exits if X.pl returns a non-zero error code. Just like any other program, Perl will return 0 if the program exits successfully, or an error code if it abends. You can force an error exit with exit(476); or whatever.
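    To illustrate the exit-status idea (just a sketch - whether you drive the steps from a .bat file or from a small Perl driver like this, the check is the same; the script names are the OP's):

    use strict;
    use warnings;

    # Run the steps in order, stopping at the first one that fails.
    for my $script ('first.pl', 'second.pl', 'third.pl') {
        system($^X, $script);              # $^X is the perl binary currently running
        if ($? != 0) {
            die "$script failed (exit status ", ($? >> 8), "), stopping here\n";
        }
        print "$script completed OK\n";
    }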

    Update:

    I mentioned that your "create 3rd table from 2 other tables" is a single SQL statement.
    Here is an example where I create a table which is actually a much, much smaller set of a larger table and it is sorted in a way that I like it for easy viewing with my GUI tool.
    When you create a table from other tables, SQL already knows the data types, field lengths etc from the table(s) you are using as the "source". As I did here, you can rename the columns during that process. The SQL syntax for using 2 source tables is more complex, but this is a demo of from some actual working code:

    $dbh->do("DROP TABLE IF EXISTS sent_info"); #re-calculate this critter +! $dbh->do("CREATE TABLE sent_info AS SELECT year, sent_call AS call,sent_prec AS prec, sent_ck AS + ck, sent_sect AS sect FROM raw_qsos GROUP BY sent_call, year ORDER BY sent_call, year DESC");
    This is so fast that I haven't bothered to measure it.

    Update2:

    I can see situations where processing should be broken up into multiple phases. But I don't see the need that for the OP's question.

    In my current app, I am trying to figure out given your complete USPS address: street address (not a PO box), zip code and state, what state do you live in? It turns out that is a very complicated question to answer! Much to my surprise! With the DB's I have, I will be able to come close to a predicted state of residence, but there will be uncertainty in that prediction. Right now I have different "phases" of DB creation while I am developing so that I can see intermediate analysis of results while I am fine tuning the heuristics (rules of thumb). I don't think the OP is trying to answer any kind of question like "what state do you live in?".

      Thank you, I am using Oracle and do not have a choice. I am interested to know more about the challenges you face with your state detection program. I'd imagine using a regex to match various inputs for the given state. Maybe there is an API that would obtain the state from the zip code? Anyway, thank you for your answer.
        This is off topic. But since you are curious, I'll tell you more. I have an address that I get from a government DB. A club has a competition and their "divisions" are defined by county and state boundaries. I wanted to use the address that I have to figure out which of the various "divisions" a person is actually living within.

        As it turns out, zip codes are for the convenience of the USPS. They can cross county and even state boundaries! I was shocked to find that out, but it is true. I was even more surprised when I found out that the state in your mailing address may not actually be where that particular street address is! You can have a Florida mailing address, but your house is actually over the border in Mississippi. Yep, a registered voter in MS can have a FL address! And they can't vote for the Governor of Florida!

        The USPS does have a DB listing a single county and a single state for each zip code. In this DB there could be (and is) a zip code that I found which lists a WV county but PA as the state. We don't know for sure, but this guy probably actually lives in WV even though his mailing address shows PA as the state. That same zip code can be used for addresses both in PA and in WV.

        Be that as it may, I decided to proceed further to see "how close" with some heuristics I could come to the "correct answer". I wound up doing pretty well on that. Meaning from an address, I can predict with high probability (not certainty), but high probability which one of this club's "divisions" this guy lives in. Then after much data crunching, I found a source of errors that dwarfs any uncertainty in my heuristics. Some of these guys have multiple residences and they may not actually live at their mailing address! So at this point, at least this small part of my project has fallen apart.

        Sometimes you have to crunch a bunch of data to find out what you don't know! By cross checking between other DB's, I can come up with a rough approximation of the accuracy of my code so far. At a cost of 2 orders of magnitude in complexity, I could improve upon my zip code DB. But that effort wouldn't matter because the address on file may be a 2nd home or Mom's or Dad's, etc.!

        I have more pieces of my puzzle to solve and more data crunching is going on...this zip code to county/state of residence was just one question of many. In the meantime, my SQL skills are improving.

        For development, I just run the Perl program from my programming text editor where the output is captured in another editor window. Some of these "pre-production" things spew out 50K lines of analysis so that I can see why it is making the decisions that it is. Then I tweak the Perl code to change a heuristic and run it again to apply human brain power to decide how well or not well it is working. At the same time I have my SQLite GUI up and running. So that I can look at output tables and run ad hoc SQL commands.

        At the end, I will create a couple of pseudo CSV files. I've been yelled at more than once for claiming that such things exist in the real world. They do. I will probably make a .bat file like I suggested for you in another post. That .bat file will run the correct sequence of "filters" to generate my final result. However, I may just continue to run each "phase" manually so that I can apply human judgement and "sanity determination". This thing is far from a "turn key end user" program (and it will never be).

Re: designing a program - your wisdom needed
by bliako (Monsignor) on Jan 20, 2022 at 18:58 UTC

    I would also go with what other Monks suggested: use a sub for each of your proposed scripts' logic. One of their input parameters should be the DB handle. Once you have the three subs, you can still create 3 scripts, as per your original proposal, each of which reads in DB credentials, creates a DB handle and calls the appropriate sub. So you have both worlds (for whatever reason). (Edit: or call the three subs from one script which asks for or reads the DB credentials once at the beginning.)

    One point is unclear to me: you mention "join". Does it refer to an SQL JOIN or to concatenation?

    Regarding your proposed "temporary table": it may not be necessary to create temporary tables if all you need is to pull results from the DB, filter or concatenate, and save. Perhaps you can save these results in Perl variables and do the transformations in Perl, but only if you are operating from a single Perl script calling the 3 subs (and the data is small - though SQL can do it better in the DB, if one can write it, that is ;) ). I don't know if this will be preferable to using the DB as temporary storage; it depends on your SQL knowledge and the size of the results to be transformed.

    Also, I have recently used Redis which is a nosql (temporary) data store, which more-or-less acts like a Perl Hash, but accessible from many programs within the same machine (or remote). I used it to share temp data between scripts (like your proposed 3 scripts) when I was too lazy to write SQL and do it within the DB. That data was like a string blob, JSON string etc. But Redis did not care about any structure. Which was very convenient in storing it and retrieving it, no questions asked regarding structure. If you find this idea interesting but you are not allowed to install Redis, here is a very lame implementation in pure Perl: Simple data-store with Perl . Note that you can store Perl data structures, e.g. nested hashes, into files or DB blobs by using Data::Serializer or Storable
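    For the Storable route, the core idea is just this (a tiny sketch; the file name and data are arbitrary):

    use strict;
    use warnings;
    use Storable qw(store retrieve);

    # script A: save an arbitrary (possibly nested) Perl structure to disk
    my %results = ( rows_loaded => 12345, loaded_at => time() );
    store \%results, '/tmp/stage1.stor';

    # script B (run later): read the structure back
    my $results = retrieve('/tmp/stage1.stor');
    print "stage 1 loaded $results->{rows_loaded} rows\n";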

      Thank you for your message. I meant an SQL join. As other monks and yourself have suggested, I will look at creating subroutines and get back to the thread. The tables are a few million rows.
      So, you are using Redis as staging? I believe Redis is memory hungry and may take a lot of physical memory.
      I was unable to fully understand the "simple data store with perl" link; what I gather is that it connects with a remote server and loops over some kind of regex expression? Pardon my ignorance, I would love to understand your intention here and how this applies to relational tables.
        I meant an sql join.

        So, it's better to do that within the DB, at least between scripts 1 and 2. (And to be clear about my writing above: doing things in the DB is usually much better than retrieving the data and transforming it in your Perl script, *provided you write the correct SQL* - which I find almost impenetrable; that's why I am always looking for alternative, albeit roundabout and perhaps inefficient, ways. Bottom line: sticking with the DB is better, usually.)

        I belive Redis is memory hungry and may take a lot of physical memory.

        I have not noticed anything abnormal there. You can always try it out and see. It was quite fast for me. Can't say anything about memory usage.

        The "simple data store", I linked, acts as a server which clients (let's say your scripts) contact in order to either store a string (which can be some JSON or XML or with minor modification to be a binary serialised perl hash or array, possibly nested) by key, or retrieve a string by key. The regex you refer to implements the simplest API to do that. That is, it checks if client gave it something like KEY1=VALUE1, in which case it stores it. Alternatively, client can send a KEY1=, in which case it looks in its private hash-store if KEY1 exists and sends back to client the value stored. Warning: all checks for storing and retrieving data done in DBs is absent, e.g. handle race conditions, etc. Also, there is no encryption or password protection, all scripts can see all data provided they know the key.

        The "simple data store" does not apply to relational tables directly, it just offers a way for separate, independent scripts or programs (in various languages) to share some temporary data between them. It does not replace a database. It just offers a way to avoid using the DB as a temp data store. One example of use: script1 does data scraping and processing at irregular intervals. It places raw results in the data store. script2 run as a "CGI" checks if results exist in data store and converts them to HTML for viewing. For me script1 was in C, and script2 in Perl. It saved me a lot of trouble to do that (edit: ) instead of (end edit) with temp DB tables (ok, sqlite is a bit easier).

        That's my experience which is not industrial. Others here have industry experience.

Re: designing a program - your wisdom needed
by pme (Monsignor) on Jan 20, 2022 at 06:31 UTC
    What kind of SQL database do you use?
      Oracle