MySQL Table Creation using DBI

by blink (Scribe)
on Sep 04, 2002 at 21:45 UTC

blink has asked for the wisdom of the Perl Monks concerning the following question:

Hi,

I've started a project which is more of a proof of concept than anything. It involves pushing stats from a backup server into a MySQL database. The database will then be used to create certain reports.

I've settled on a schema that makes sense and I am ready to begin creating the database. This is my first stab at any kind of database programming, so my question might seem a bit newbie-ish.

My question is, is there a way that I can programmatically create my tables from hash keys? I hope so, since doing this manually would take several hours!

Here's example content of one of the hashes that I will use:

$VAR435 = 273118;
$VAR436 = {
    'try_1_server' => 'netbackup2',
    'jobtype' => 'immediate',
    'try' => 1,
    'group' => '',
    'subtype' => 1,
    'try_1_ended' => '1031010812',
    'try_1_statuscount' => 12,
    'filelistcount' => 1,
    'status' => '0',
    'try_1_stunit' => 'netbackup2_stu',
    'try_1_statusdescription' => 'the requested operation was successfully completed',
    'schedule_type' => 4,
    'percent' => '',
    'trycount' => 1,
    'try_1_kbyteswritten' => 1,
    'operation' => '',
    'retentionunits' => 3,
    'files' => 'D:\\',
    'try_1_started' => '1031010638',
    'try_1_elapsed' => 174,
    'masterserver' => 'netbackup1',
    'jobid' => 273118,
    'try_1_pid' => 13032,
    'kbyteslastwritten' => '0',
    'path' => '',
    'fileslastwritten' => '0',
    'try_1_status' => '0',
    'client' => 'esmem',
    'server' => 'netbackup2',
    'jobpid' => 13032,
    'started' => '1031008015',
    'class' => 'NT-Main',
    'elapsed' => 2797,
    'stunit' => 'netbackup2_stu',
    'state' => 'done',
    'retentionperiod' => 1,
    'classtype' => 13,
    'try_1_statuslines' => '09/02/2002 19:50:40 - connecting
09/02/2002 19:50:40 - connected; connect time: 000:00:00
09/02/2002 19:50:40 - mounting CC0530
09/02/2002 19:50:40 - mounted; mount time: 000:00:00
09/02/2002 19:50:40 - positioning to file 138
09/02/2002 19:50:40 - positioned; position time: 000:00:00
09/02/2002 19:50:40 - begin writing
09/02/2002 19:50:40 - positioning to file 139
09/02/2002 19:50:40 - positioned; position time: 000:00:00
09/02/2002 19:51:30 - positioning to file 140
09/02/2002 19:51:30 - positioned; position time: 000:00:00
09/02/2002 19:53:20 - end writing; write time: 000:02:48',
    'ended' => '1031010812',
    'schedule' => 'NT-Differentials',
    'try_1_fileswritten' => '0',
    'kbytes' => 1,
    'owner' => 'root',
    'compression' => '0',
    'priority' => 10
};
In this case, the outermost hash key would be the primary key, named 'jobid', the innermost keys would be column names, and the inner values would be the data to be pushed into each table row.

The other problem I just realized as I'm typing this is that there are several entries which refer to the number of tries. There could be as many as 5 tries per jobid. I suppose I could simply create 5 'try_n' tables, all using 'jobid' as their primary key, no?

Again, this is a first go-around for me, so if I've left any pertinent info out or I'm not being clear, please ask for clarification.


Replies are listed 'Best First'.
Re: MySQL Table Creation using DBI
by adrianh (Chancellor) on Sep 04, 2002 at 22:24 UTC

    If you're of an object-oriented attitude then I'd take a look at Class::DBI, which allows you to quickly create an object based interface to tables. I find it very useful and a huge timesaver.

    As for the "no. of tries" problem - having 5 tables, one for each try, probably isn't the best solution (what happens when somebody decides you need 4 or 6 tries? what happens when you need to add a little bit more info on each try?)

    Instead have a table with a primary key of "jobid" & "try number" so you would have something like this:

    jobid  try  server   stunit   ... etc ...
    1      1    foo      bar
    1      2    fribble  baz
    2      1    feep     babble
    ... and so on ...
    This way you're only ever messing with a single table. Makes your code (and future maintenance) much easier.
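
    A minimal, untested sketch of that in DBI terms (the table name, column names, and types here are just my guesses from the hash in the original post, not a definitive schema):

        use strict;
        use warnings;
        use DBI;

        # hypothetical connection details -- substitute your own
        my $dbh = DBI->connect('dbi:mysql:backupstats', 'user', 'password',
                               { RaiseError => 1 });

        # one row per (jobid, try); the column list is only illustrative
        $dbh->do(q{
            CREATE TABLE IF NOT EXISTS job_tries (
                jobid   INT UNSIGNED NOT NULL,
                try     TINYINT UNSIGNED NOT NULL,
                server  VARCHAR(64),
                stunit  VARCHAR(64),
                started INT UNSIGNED,
                ended   INT UNSIGNED,
                status  VARCHAR(16),
                PRIMARY KEY (jobid, try)
            )
        });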

      The structure of the tables is a lot more dependent on how they are used. A single table may or may not be the way to go, depending on the characteristics of your queries and of the tries. For instance: if you often need to query across all the data for something other than the primary key, a one-table structure is probably best; however, if all your queries after the insert assume that the lookup row exists and only look for values from that row, a table for each try would probably be best. When your database gets big, and you look for the 'ended' value for try three of jobpid 200, and you know jobpid 200 exists in the try 3 table, the query may be a great deal faster.

      On the other hand, if you only want to retrieve jobs with more than three tries for your error logs, then a single table with a 'WHERE try = 4' clause would be simpler.

      But what if tries 3 and 4 never have some trait? Do you want to leave it blank in your single table?

      I would recommend either making the database-structure decision on your own or posting as much code and as many facts as possible. Database structure is always a touchy subject when getting developmental suggestions because, as adrianh points out, a single table is much more flexible; however, you must decide how much flexibility you are willing to sacrifice for speed. You may want to read up on Database Normalization 1 2 before you decide what the structure of your database will be. While you develop your 'project' (of which we know very little, and therefore our suggestions may be completely wrong), adrianh's suggestion of a single table is the right one.
      I could be completely wrong
      Gyan Kapur
      gyan.kapur@rhhllp.com

        Actually, for the table format I suggested, doing

        look for the 'ended' value for try three of jobpid 200

        would probably be faster than doing

        WHERE try = 4

        since the former is a primary key lookup (jobid & try) on the new table, whereas the latter might well have to do a full table scan. If you are doing queries like the latter often, a separate table for each try might be faster (although many databases would allow you to build a separate index on the try #).
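
        (Untested, and continuing with the hypothetical job_tries table sketched earlier: if 'WHERE try = ...' queries do turn out to be common, one way to avoid the full scan would be a secondary index.)

            # add a secondary index so queries filtering on try alone
            # need not scan the whole table
            $dbh->do('CREATE INDEX idx_try ON job_tries (try)');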

        :-)

        That said I totally agree with Gyan's comments on the fact that DB design is not trivial. You know more about your data, and how it will be queried, so having a single table may not be the best solution.

      Do like adrianh says and just insert rows into a table instead of creating a new table every time. What would you do when you have hundreds of tables? Would you join a hundred tables together to build your reports? With a single table it would just be a single query. What would you name your tables when you have that many? Are you going to write a maintenance routine to clean up the database after creating and dropping hundreds of tables on a daily basis? With a single table it's the database's responsibility to clean the table's storage space. (Actually, this is also a concern, but not as drastic as the impact of dropping and creating tables, depending on the database you're using.)
Re: MySQL Table Creation using DBI
by ignatz (Vicar) on Sep 05, 2002 at 00:21 UTC
    I don't really see a need to create some complex whirlygig to replace simply writing out the SQL to create the tables. One vital piece of information that is missing from a hash's key value pair is the data type for each column. If this is your first stab at working with a database, I would keep it as simple as possible.
    ()-()
     \"/
      `                                                     
    

      I've heard Class::DBI described as many things before... but never "complex whirlygig" :-)

      (SPOPS or Alzabo on the other hand ;-)

      update: above removed because I was foolish and misread the message hierarchy. Bad Adrian.

      For example, to create the rows (assuming the DB table exists) you would just need something like (untested code):

      #! /usr/bin/perl
      use strict;
      use warnings;

      package BackupJob;
      use base 'Class::DBI';
      use CLASS;

      # the DB where your job table is
      CLASS->set_db('Main', 'dbi:mysql', 'username', 'password');

      # the name of your job table
      CLASS->table('jobs');

      # the columns in your table, primary key first
      CLASS->columns(All => qw(
          jobid class classtype client compression elapsed ended
          filelistcount files fileslastwritten group jobpid jobtype
          kbytes kbyteslastwritten masterserver operation owner path
          percent priority retentionperiod retentionunits schedule
          schedule_type server started state status stunit subtype
          try try_1_elapsed try_1_ended try_1_fileswritten
          try_1_kbyteswritten try_1_pid try_1_server try_1_started
          try_1_status try_1_statuscount try_1_statusdescription
          try_1_statuslines try_1_stunit trycount
      ));

      package main;

      # If we assume %JOBS is a hash mapping
      #   jobid => { column => value, column => value, ... }
      # then the following will create a DB row for each entry
      while ( my ($jobid, $values) = each %JOBS ) {
          BackupJob->create({ jobid => $jobid, %$values }) or die;
      }

      The BackupJob object also gives you lots of other nice things like searching, accessors, etc for free. Nice Class::DBI :-)
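
      For instance (untested, and assuming the BackupJob class above compiles), fetching by primary key, using the generated accessors, and searching look roughly like this:

          # fetch one row by its primary key and read columns via accessors
          my $job = BackupJob->retrieve(273118);
          print $job->client, ' finished with status ', $job->status, "\n";

          # simple equality search on any declared column
          my @root_jobs = BackupJob->search(owner => 'root');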

      (of course, one of the things that Class::DBI doesn't support is tables with multi-column primary keys - so its not a lot of use for the table example I gave earlier :-)

        I was not replying to your post.

        If the poster is new to working with databases, then the best thing to do is to get comfortable with the most straightforward way of doing things, which is SQL. It's not a good idea to abstract away fundamental tools if you don't know how they work.

        ()-()
         \"/
          `                                                     
        
Re: MySQL Table Creation using DBI
by richardX (Pilgrim) on Sep 05, 2002 at 01:52 UTC
    I agree with the suggestion to keep it simple. When you are ready to tackle a more complex but very flexible method of handling many tables and databases, think about using Metadata. I have used Metadata tables many times to save my bacon when the table structures changed a lot or there were hundreds of tables and a thousand columns total. Metadata means data describing data. I will give you an example of how Metadata can help you create, drop, and modify columns, write SQL queries, basically the whole shooting match. This is a dynamic way of generating the SQL to drive the entire process.

    Create a few Metadata tables that are populated by Class::DBI or by other methods. Then write programs that read the Metadata and create SQL scripts on the fly. Pass parameters to this Generator so that you can generate CREATE, DROP, etc. scripts or SQL queries for specific databases or tables. Once you have this Generator program written, you only have to populate the Metadata tables to create new tables or make modifications. This Metadata system is then reusable for other database projects, and you can spend your time enhancing the Generator instead of writing scripts and debugging syntax. (A rough sketch of such a generator appears after the table definitions below.)

    META_MAST is the master table that contains the definition of each column. META_KYS is the table that contains the list of keys. META_REL is the table that contains relationships between tables.

    First create a table called META_MAST, with the following columns:

  • Id INTEGER PRIMARY KEY,
  • db_name VARCHAR(50) Database name
  • tbl_name VARCHAR(50) Table name
  • col_name VARCHAR(50) Column name
  • dtype CHAR(1) Data type: V:Varchar I:Integer C:Char
  • len INTEGER Length of the column
  • ky_col CHAR(1) Is this the primary key? Y/N
  • notes VARCHAR(255) Notes about the column
    Create a table called META_KYS:

  • Id INTEGER PRIMARY KEY,
  • ky_exp VARCHAR(255) Key expression
    Create a table called META_REL:

  • Id INTEGER PRIMARY KEY,
  • tbl_rel VARCHAR(255) Table relationships
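
    Just to make the idea concrete, here is a rough, untested sketch of the kind of Generator I mean. It assumes a DBI handle to the database holding META_MAST, uses the V/I/C type codes listed above, and only handles the CREATE case; the database and table names ('backupstats', 'jobs') are placeholders:

        use strict;
        use warnings;
        use DBI;

        # hypothetical connection to wherever the metadata lives
        my $dbh = DBI->connect('dbi:mysql:metadata', 'user', 'password',
                               { RaiseError => 1 });

        my %sql_type = ( V => 'VARCHAR', I => 'INTEGER', C => 'CHAR' );

        # pull the column definitions for one table out of META_MAST
        my $rows = $dbh->selectall_arrayref(
            'SELECT col_name, dtype, len, ky_col FROM META_MAST
              WHERE db_name = ? AND tbl_name = ? ORDER BY Id',
            { Slice => {} }, 'backupstats', 'jobs');

        my (@cols, @keys);
        for my $r (@$rows) {
            my $type = $sql_type{ $r->{dtype} };
            $type .= "($r->{len})" unless $type eq 'INTEGER';
            push @cols, "$r->{col_name} $type";
            push @keys, $r->{col_name} if $r->{ky_col} eq 'Y';
        }
        push @cols, 'PRIMARY KEY (' . join(', ', @keys) . ')' if @keys;

        my $create = "CREATE TABLE jobs (\n    " . join(",\n    ", @cols) . "\n)";
        print "$create\n";   # or write it to a .sql script, or $dbh->do($create)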
    Richard

    There are three types of people in this world, those that can count and those that cannot. Anon

      I think I see the point and flexibility in the general theory of the scheme you describe, but I am having trouble understanding how the META_KYS and META_REL are used. Perhaps it is just too early and I am too sick (allergies, not mental) to grasp this, but is there a page or some site that describes a bit further how this is implemented?

      the_Don
      ...making offers others can't refuse.

        Sorry Don, my fingers got tired before I got to finish all my thoughts on the idea. The META_REL and META_KYS tables are for storing the list of columns that make up the multiple keys or relationships that tables have. For example, the ky_exp column could be populated with “cust.cust_lastname+cust.cust_firstname“. When it came time to create the keys for the CUST table of customers, the Generator could read this row and write the proper CREATE statement to a file like createcust.sql.

        You would also need to add the foreign key from META_MAST to each of these tables along with a column for the name of the database and table name.

        My ideas and techniques do not come from one place or one project but from a myriad of projects that used a wide variety of Data Warehousing, CASE, and Data Dictionary tools. I would suggest looking in Google for the keywords metadata, SQL, and database.

        Most of the commercial tools or homegrown tools that I have used, had a database backend. The project that I remember the best that this idea saved me on was a data conversion project, converting Mainframe (MF) data to SQL tables. The MF data to start with was “dirty” and had to be verified using various pieces of code snippets, looking for invalid data, corrupt data, orphan rows, and the like. The next problem was the MF data used data types that SQL did not have, like packed decimal. So I had to write another snippet to decode the packed decimal. Then the data had to be inserted into SQL tables. Sometimes this was a one for one basis, which was a piece of cake. But most of the MF tables had to be split into several tables and populated in a relational way into the SQL. If I chose to do this project by hand coding, I would still be working on it.

        To make matters worse, the MF programmers were constantly changing the schema, and I was last to know. The icing on the cake was that we were dealing with 200 tables from the MF which created 350 SQL tables.

        So I created a metadata Generator like the one I described before and stored all the attributes of the MF input data and the SQL output data in my metadata database. When the MF data came to me in large tapes with different schemas, no sweat, I merely remapped it in my schema description tables. Sometimes the SQL programmers came to me with ever changing SQL schemas. Again I just changed the SQL schema description tables. I extensively used TEMP tables and ad-hoc tables for key lookups and temporary data storage until the final COMMIT of each transaction.

        With this system I could rapidly adapt to any curveballs thrown to me, and I could iteratively output SQL tables so that the SQL programmers could have some test data, and then feedback to me the schema changes on the SQL side. If any programmer added a table, it took me about five minutes to update the metadata and I was ready to crank out the data.

        Some of the side benefits from using this metadata Generator were documentation, statistics, and accuracy. I wrote several SQL reports that queried the metadata, giving a complete map of the MF tables coming in and where they mapped to in the SQL. Sometimes the programmers used this documentation to find errors in their data structures. I wrote several SQL reports that generated statistics of rows inputted, outputted, errors found and the type of errors, corrections made by the program, etc. Finally, the overall accuracy of my data conversion project was better by far than a set of manually written scripts, and this improvement in productivity was provable to management.

        Richard

        There are three types of people in this world, those that can count and those that cannot. Anon

Re: MySQL Table Creation using DBI
by darkphorm (Beadle) on Sep 05, 2002 at 07:53 UTC
    I've done a lot of code like this...

    It requires a lot of loops dereferencing the hashrefs:

    foreach my $key (keys %hash) {        # plain hash
        my $value = $hash{$key};
    }

    foreach my $key (keys %$hashref) {    # hash reference
        my $value = $hashref->{$key};
    }

    You can drop down the chain by using hashes to reference your hashrefs, and starting new loops:

    foreach my $outer_key (keys %$hashref) {
        my $inner = $hashref->{$outer_key};    # another hashref underneath
        foreach my $inner_key (keys %$inner) {
            # etc. etc.
        }
    }

    I may be a bit off (it's 1am), but it was something like this.
Re: MySQL Table Creation using DBI
by thunders (Priest) on Sep 05, 2002 at 15:22 UTC
    As for your question about the number of tries: it's hard to say for sure as I don't know what kind of data you get when there are multiple tries, but I would recommend using an array of hashrefs instead of searching for "try_".$x."_foo".

    Here is part of such a structure (assuming that some info will be the same for all tries):

    $VAR435 = 273118;
    $VAR436 = {
        jobtype       => 'immediate',
        try           => 1,
        group         => '',
        subtype       => 1,
        filelistcount => 1,
        status        => '0',
        tries => [
            {
                server      => 'netbackup2',
                ended       => '1031010812',
                statuscount => 12,
                stunit      => 'netbackup2_stu',
            },
            {
                server      => 'netbackup3',
                ended       => '1031010813',
                statuscount => 13,
                stunit      => 'netbackup3_stu',
            },
        ],
    };
    In this case you'd be able to loop through the hashes in @{$data{273118}{tries}} to make your tables. In many cases a good place to start optimizing and simplifying is the code that generates your data structure.
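
    Untested, but feeding a structure like that into the single (jobid, try) table suggested earlier could look roughly like this (the %data hash, the DBI handle $dbh, and the job_tries table and column names are all assumptions):

        # %data maps jobid => a hashref shaped like $VAR436 above
        my $sth = $dbh->prepare(
            'INSERT INTO job_tries (jobid, try, server, stunit, ended)
             VALUES (?, ?, ?, ?, ?)');

        for my $jobid (keys %data) {
            my $tries  = $data{$jobid}{tries} || [];
            my $try_no = 0;
            for my $try (@$tries) {
                $try_no++;
                $sth->execute($jobid, $try_no,
                              @{$try}{qw(server stunit ended)});
            }
        }
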
Re: MySQL Table Creation using DBI
by PhiRatE (Monk) on Sep 06, 2002 at 00:44 UTC
    You need to read about database normalisation.

    Essentially, what you have here is not a large complicated data structure containing many fields, but a very simple data structure containing very few, as so:

    CREATE TABLE stats (
        jobid INT,
        try   INT,
        item  VARCHAR(255),
        value VARCHAR(255),              # or maybe TEXT or equivalent
        PRIMARY KEY (jobid, try, item)   # item included so one job/try can hold many rows
    );

    Or similar. You would then be able to retrieve all the contents of the hash by:

    select item, value from stats where jobid=273118 and try=1

    It also allows you to get nice aggregate stats as so:

    select value as owner, count(*) as n
      from stats
     where try = 1 and item = 'owner'
     group by value
     order by n
    Which would give you a nicely sorted list of all the different owners in all the jobs, and a count of how many jobs they own.

    Even better, it's trivial to write the code to load and unload the hash:

    foreach (keys %hash) {
        $dbh->do('insert into stats (jobid, try, item, value) values (?, ?, ?, ?)',
                 undef, $jobid, $try, $_, $hash{$_});
    }
    etc.

    Read a good book on SQL, or if you already know SQL decently, on database normali|sz|ation and what it means for your data structures.

    Create them well, and the database will end up doing all the work for you.

Re: MySQL Table Creation using DBI
by rir (Vicar) on Sep 06, 2002 at 01:07 UTC
    Code generation is a good tool; it forces generalization thoughts early on. It can also be an efficient way to create code.

    Others have mentioned the need for meta-data to do so.

    Lacking experience, you will likely need to hand-write a table definition or two just to prepare yourself to create the meta-data.

    If you only have this one data structure, it is probably easiest to just hand-write your table definitions.

    If you do decide to use meta-data to create your code, don't make the error of skimping on it. The more your meta-data tells you about your data, the more capable your generation scheme will be.

    _Advanced_Perl_Programming_, by Srinivasan, from O'Reilly, has a very accessible introduction to code generation. Note that the tool described there is not completely implemented as described.

    If you do decide to generate your code, I would suggest that you use a simple config file or such for your meta-data. This avoids the difficulties of fetching it from a database.
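
    As a rough, untested illustration of what that might look like (the columns.conf file, its format, and the column names are entirely made up):

        #!/usr/bin/perl
        use strict;
        use warnings;

        # columns.conf is assumed to hold one "table column SQL-type" entry
        # per line, e.g.
        #   jobs  jobid    INT UNSIGNED NOT NULL
        #   jobs  client   VARCHAR(64)
        #   jobs  started  INT UNSIGNED
        my %tables;
        open my $conf, '<', 'columns.conf' or die "columns.conf: $!";
        while (my $line = <$conf>) {
            next if $line =~ /^\s*(#|$)/;        # skip comments and blank lines
            chomp $line;
            my ($table, $column, $type) = split ' ', $line, 3;
            push @{ $tables{$table} }, "$column $type";
        }
        close $conf;

        # emit a CREATE TABLE statement for each table in the meta-data
        for my $table (sort keys %tables) {
            print "CREATE TABLE $table (\n    ",
                  join(",\n    ", @{ $tables{$table} }),
                  "\n);\n";
        }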

Re: MySQL Table Creation using DBI
by blink (Scribe) on Sep 06, 2002 at 02:03 UTC
    WOW! Didn't expect so many great replies! You gals/guys rock!
Re: MySQL Table Creation using DBI
by mjmilan (Initiate) on Sep 07, 2002 at 11:45 UTC
    Can you create tables from hash keys? Well, I'm new to Perl, but I don't see a reason why SQL's CREATE / ALTER statements wouldn't work. The only problem is that you'll need to know which datatype to use for each field.

    mjmilan
      I was trying to do something similar: creating a table with the column names and corresponding types that the user input (i.e. I assume that they know enough SQL to know what the column types should be). I found that it was easier (though arguably less elegant) to just store the column names and types in a scalar variable with the pairs separated by commas. You can then plug this variable straight into the SQL CREATE TABLE command.
      # Get the input from the user; I have this in a subroutine which is
      # called repeatedly until the user enters 'done'
      print "Please enter the name of the column, when you are finished type 'done':\n";
      chomp(my $column_name = <>);
      print "Please enter the column type:\n";
      chomp(my $column_type = <>);

      if (defined $fields) {
          $fields = "$fields, $column_name $column_type";
      }
      else {
          # otherwise $fields would begin with ', ', which is not what we want
          $fields = "$column_name $column_type";
      }

      # Put this in a variable so that it is all interpolated before being
      # passed to the SQL
      my $create_table = "CREATE TABLE $table_name ($fields)";

      $dbh->do($create_table)
          or die "Could not create table $table_name: $DBI::errstr\n";
      print "Table $table_name created\n";
      This method is somewhat adapted from this article. Hope it makes sense =)
