Re^5: Avoiding compound data in software and system design

EF Codd eh? Circa 1981, I had to do a CS project, and having read an article (in Byte I think) on Codd's paper, I wrote up the proposal for my project as: "A simple exploration of the Relational Model". To be written in BASIC Plus 2. And yes, BASIC.

I had one term to write it.

It took 6 weeks for the college library to obtain a photocopy of the paper--it had to come from the British Library in London, the only people in the UK who had a copy. It was a photocopy of a photocopy of a bound paper, with all the distortions and fuzzy greyness that entails. It took me two whole weeks to read it--I understood very little of it. So there I was with half my time gone and nothing to show for it.

Back to the point.

And that is, all DBI needs to know is the first two fields of the DSN. The first must match 'dbi' (case-insensitively); the second must match a module "DBD::<2ndfield>" that is installed locally. What comes after that is none of its concern. It just gets passed through to the loaded DBD driver.
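A minimal sketch of that split, in plain Perl with no DBI loaded. The helper name split_dsn is hypothetical, and the regex is an assumption that ignores the optional "dbi:Driver(attributes):" form that real DSNs may use; it only shows that the part after the second colon can stay an opaque string.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of DBI): split a DSN into the two fields
# DBI itself cares about and the opaque remainder for the driver.
sub split_dsn {
    my ($dsn) = @_;
    # scheme ("dbi", any case) : driver name : everything else, untouched
    my ($scheme, $driver, $rest) = $dsn =~ /\A(dbi):(\w+):(.*)\z/is
        or return;
    return ($scheme, "DBD::$driver", $rest);
}

for my $dsn (
    'dbi:SQLite:dbname=/tmp/test.db',
    'DBI:Solid:TCP/IP somewhere.com 1313',
    'dbi:Google:',
) {
    my ($scheme, $module, $rest) = split_dsn($dsn);
    print "$module gets '$rest'\n";
}
```

Nothing in the sketch looks inside the third field, which is the point being made: its format is the driver's business alone.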

And the forms of that opaque token are myriad. A quick survey turns up:

  $dbh = DBI->connect("dbi:Informix:$database", $user, $pass, %attr);
  $dbh = DBI->connect("DBI:Unify:dbname[;options]" [, user [, auth [, attr]]]);
  $dbh = DBI->connect("dbi:Oracle:host=$host;sid=$sid", $user, $passwd);
  $dbh = DBI->connect("dbi:SQLite:dbname=$dbfile","","");
  $dbh = DBI->connect("DBI:drizzle:database=test;host=localhost", "joe", "joe's password", {'RaiseError' => 1});
  $dbh = DBI->connect('dbi:ODBC:DSN', 'user', 'password');
  $dbh = DBI->connect("dbi:Pg:dbname=$dbname", '', '', {AutoCommit => 0});
  $dbh = DBI->connect('DBI:RAM:','usr','pwd',{RaiseError=>1});
  $dbh = DBI->connect("DBI:Wire10:host=$host", $user, $password, {'RaiseError' => 1, 'AutoCommit' => 1});
  $dbh = DBI->connect("DBI:CSV:f_dir=/home/joe/csvdb")
  $dbh = DBI->connect("dbi:JDBC:hostname=$hostname;port=$port;url=$url", $user, $password);
  $dbh = DBI->connect("dbi:Sqlflex:$database", $user, $pass, %attr);
  $dbh = DBI->connect("dbi:DB2:db_name", $username, $password);
  $dbh = DBI->connect("DBI:mysql:database=test");
  $dbh = DBI->connect('DBI:DBMaker:' . $database, $user, $pass);
  $dbh = DBI->connect('dbi:PgPP:dbname=$dbname', '', '');
  $dbh = DBI->connect('dbi:PgLite:dbname=file');
  $dbh = DBI->connect("dbi:ADO:Provider=Microsoft.Jet.OLEDB.4.0;Data Source=C:\data\test.mdb", $usr, $pwd, $att )
  $dbh = DBI->connect("DBI:Ingres:dbname[;options]", user [, password], \%attr);
  $dbh = DBI->connect('DBI:Solid:TCP/IP somewhere.com 1313', $user, $pass, 'Solid');
  $dbh = DBI->connect("dbi:Google:", $KEY);

Look at the variations once you get beyond the first two fields. Yes you could keep these all separate in a hash, but to what end? You (as a DBI user) cannot do anything useful with them because there is insufficient consistency to make even validation judgements, much less anything else.

Even where several DBDs require, for example, a "dbname", for some this will have SQL identifier limitations--though even they aren't consistent across all SQL-like DBs.

For some it will be a filename, with local filesystem semantics: case dependence (or not); reserved characters (or not); length limitations (or not).

For some, it's a hostname and port.

For some--see the ADO example--it's a whole bunch of stuff entirely unique to that DBD.

For some the subfields have to be prefixed with their tagname; for others they are position dependent.

Why stick all these disparate bits into a hash and then have DBI concatenate the bits--risking getting it wrong because (for example) it adds tagnames where none are required, or the hash ordering screws up the position dependence; or ...?

To achieve all that, you'd need more than just a hash. You'd need one flag per field to decide whether the key name should be prepended to the field's value. You'd need another value to ensure ordering. You'd need yet another flag to ensure that (for example) backslashes in pathnames got escaped for interpolation.
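To make that bookkeeping concrete, here is a rough sketch of what a generic hash-to-DSN builder would have to carry around. Everything in it is hypothetical: the %DRIVER_SPEC table, the build_dsn function, and the field names guessed for each driver are assumptions for illustration, not anything DBI or the DBDs define. It covers only the tag flag and the ordering; the escaping flag is left out.

```perl
use strict;
use warnings;

# Hypothetical per-driver specs. Each field is [ name, tagged? ] and the
# array order is the required position. Field names are guesses.
my %DRIVER_SPEC = (
    Pg       => { sep => ';', fields => [ [ dbname   => 1 ], [ host => 1 ] ] },
    Informix => { sep => ';', fields => [ [ database => 0 ] ] },
    Solid    => { sep => ' ', fields => [ [ protocol => 0 ], [ host => 0 ], [ port => 0 ] ] },
);

sub build_dsn {
    my ($driver, %args) = @_;
    my $spec = $DRIVER_SPEC{$driver}
        or die "No spec for driver '$driver'\n";
    my @parts;
    for my $field (@{ $spec->{fields} }) {
        my ($name, $tagged) = @$field;
        next unless defined $args{$name};
        # Prepend "name=" only where this driver expects a tag
        push @parts, $tagged ? "$name=$args{$name}" : $args{$name};
    }
    return join ':', 'dbi', $driver, join($spec->{sep}, @parts);
}

print build_dsn(Pg => host => 'localhost', dbname => 'test'), "\n";
print build_dsn(Solid => port => 1313, host => 'somewhere.com', protocol => 'TCP/IP'), "\n";
```

Even this toy version needs a spec entry per driver that someone has to write and keep current, which is the maintenance cost being argued about.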

And all of that complexity buys you what? The user can far more easily know what the requirements are for the DBD (or two; or three) he is going to use, than any programmer can try and unify into one generic interface structure that will stand the test of time.

Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.

"Science is about questioning the status quo. Questioning authority".

In the absence of evidence, opinion is indistinguishable from prejudice.

RIP an inspiration; A true Folk's Guy


Replies are listed 'Best First'.
Re^6: Avoiding compound data in software and system design by siracusa (Friar) on Apr 29, 2010 at 12:33 UTC
> Why stick all these disparate bits into a hash and then have DBI concatenate the bits--risking getting it wrong because (for example) it adds tagnames where none are required, or the hash ordering screws up the position dependence; or ...?

Why in the world would DBI stick all the bits back together? This information is not useful to anyone in this serialized form. It's entirely an artificial construct, dictated by DBI's (poor) decision to require multiple distinct pieces of information to be passed in a single string argument. The DBD has to parse it and break it back into pieces to do its work (e.g., extracting the hostname and port to make the system call to open a socket, extracting the database name to pass in the connection command, etc.).

> To achieve all that, you'd need more than just a hash. You'd need one flag per field to decide whether the key name should be prepended to the field's value. You'd need another value to ensure ordering. You'd need yet another flag to ensure that (for example) backslashes in pathnames got escaped for interpolation.

No, you wouldn't, because the specially-formatted DSN string never needs to be constructed at all, for any reason.
Re^7: Avoiding compound data in software and system design by BrowserUk (Patriarch) on Apr 29, 2010 at 17:04 UTC
> No, you wouldn't, because the specially-formatted DSN string never needs to be constructed at all, for any reason.

I see. So you're volunteering to go through and modify all the 600+ DBD::* modules; and all the modules that use them; and all the code written in the last 15 years that uses them; just so that you can provide introspection of things that nobody will ever want to introspect?

Let's just pretend for a moment that we could re-write history, and DBI had specified that the first parameter to DBI was a hashref. And (say), the only required pair was `dbi => Pg|MySQL|Whatever`. And that each DBD was free to require whatever pairs it needed.

What does that achieve?

You would have a hash rather than a string. That would make it easier to wrapover in your DSN object--though this parsing you speak of is hardly onerous.

But then, as now, that wrapover is pointless. A hash is far easier to use than an object. `$hash{ $key }++` is infinitely preferable to `$dsn->keySet( $key, $dsn->keyGet( $key ) + 1 );`

And apart from using clumsy OO syntax to get, set or iterate the contents of that hash, what else does that DSN class do? What else could it do? You can't hope to validate all the possibilities. And there are no useful methods beyond get/set/iterate you could apply to it.

Even ignoring that, you'd:

either have to standardize the fields in the DSN. Which is impractical as each DBD has its own unique set of requirements;

or eval the setters/getters into existence based upon what the user put in the hash. Which, besides any problems with eval, means that DBDs would need to name all their fields. Which many don't currently have, and neither need nor want to have; and you would risk evaling the user's typos into existence with absolutely no way to validate them.

Ultimately, you'd have to provide a method that returned the attributes as a simple hash(ref) in order to pass it to DBI anyway.

Unless you envisage passing your DSN object to DBI and having it pass it through to the DBDs (another breaking re-write). So then all the DBDs acquire yet another dependency for no good purpose beyond the creation of OO tagliatelle.

So, even if we could ignore history, and turn back the clock to do things your way, there'd be no value in it. And a considerable downside of increased complexity and dependencies.

OO is fine and dandy when used properly, but using it to enforce a one-size-fits-all syntax fetish is no good use. Like all programming tools, the trick is to know when to use it--and when not to.

You may think that I don't agree with you because I haven't tried the KoolAid yet, but you'd be wrong. I tried your flavour of KoolAid (along with most others) a long time ago; I just didn't like it. Or rather, I can tolerate all flavours, each is fine for certain occasions, but I see no good reason to limit myself (or others) to just one flavour. Of anything.
Re^8: Avoiding compound data in software and system design by siracusa (Friar) on Apr 30, 2010 at 00:39 UTC
> I see. So you're volunteering to go through and modify all the 600+ DBD::* modules; and all the modules that use them; and all the code written in the last 15 years that uses them; just so that you can provide introspection of things that nobody will ever want to introspect?

Is this really what you think I was suggesting? Or do you actually understand the context within which my previous post was made?

> Let's just pretend for a moment that we could re-write history, and DBI had specified that the first parameter to DBI was a hashref.

Ah, it appears you do. I guess that first paragraph was just for "flavor," eh?

> You would have a hash rather than a string. That would make it easier to wrapover in your DSN object--though this parsing you speak of is hardly onerous. But then, as now, that wrapover is pointless.

First, in my hypothetical version of DBI, I'd standardize the common names in the hash. For example, if you have something like a database, use the "database" key (not "db", not "dbname", etc.), use "host" for the host (not "hostname", not "server", etc.), and so on. This would be a documented part of DBI. Standardized names would cover at least half of the things commonly specified in DSNs.

Second, I think you still don't understand what Rose::DB actually does. Providing a nicer interface to DSNs is not its primary, or even secondary purpose. Even if it simply passed the registered data source information straight through to DBI, its most important roles would remain unchanged: first and foremost, to provide a uniform interface for database-specific behaviors to Rose::DB::Object; and second, as a registry of logically named data sources, organized in a generic two-level hierarchy (domain and type) with an optional file-based configuration overlay to allow sensitive information such as production passwords to be kept out of the source code, while still using exactly the same code in development and production.

To clarify that first role, the goal is to keep code out of Rose::DB::Object that says "if the database is mysql, do X, else if the database is oracle, do Y, else ..." and so on. (That goal is about 99% achieved; there are a few stragglers.) Instead, such behavior is delegated to the current Rose::DB-derived object, which is an object attribute in Rose::DB::Object (i.e., each Rose::DB::Object-derived object "has a" Rose::DB-derived object in its "db" attribute).

This arrangement allows a Rose::DB::Object-derived object to, say, be loaded from one database and then stored into another, even if the two databases use a different server product, as shown in the tutorial. Assume "production" is PostgreSQL and "archive" is MySQL:

  # My::DB "isa" Rose::DB
  $production_db = My::DB->new('production'); # PostgreSQL
  $archive_db    = My::DB->new('archive');    # MySQL

  # Load bike from production database
  $p = Product->new(name => 'Bike', db => $production_db);
  $p->load;

  # Save the bike into the archive database
  $p->db($archive_db);
  $p->insert;

  # Delete the bike from the production database
  $p->db($production_db);
  $p->delete;

Again, understand that the Rose::DB::Object-derived Product object calls back into its Rose::DB-derived My::DB object whenever it does anything that might vary from one database to another. I hope you'll agree that Rose::DB is just a bit more than a "DSN object."

> So, even if we could ignore history, and turn back the clock to do things your way, there'd be no value in it.

There'd be considerable value in that people could use DBI without constantly having to run "perldoc DBD::Whatever" to look up the driver-specific DSN syntax that they don't remember off the top of their head.

There's also the added efficiency of not having to parse a string inside the DBD (and not having to write string parsing code in C, as it often is today), and being able to add new attributes without worrying if there's "room" in the current DSN syntax or having to make a second or third syntax, as many DBD::* modules have done, to accommodate new/different parameters.

Finally, since a serialized format does actually have its uses (e.g., when stored in a file or sent over a network connection), DBI could support one or more of the standard formats that can be used to serialize a Perl hash into a string: JSON, YAML, Data::Dumper, etc.
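A minimal sketch of the standardized-hash idea described in this reply, under stated assumptions: the %CANONICAL alias table and the normalize_dsn function are hypothetical, built only from the spellings named above ("db", "dbname", "hostname", "server"), and no such interface exists in DBI. JSON::PP (core since Perl 5.14) stands in for "one of the standard formats".

```perl
use strict;
use warnings;
use JSON::PP;

# Hypothetical alias table: the variant spellings mentioned in the post,
# mapped to the single standardized key each would be replaced by.
my %CANONICAL = (
    db       => 'database',
    dbname   => 'database',
    hostname => 'host',
    server   => 'host',
);

sub normalize_dsn {
    my (%in) = @_;
    my %out;
    for my $key (keys %in) {
        my $std = $CANONICAL{$key} // $key;
        # Two spellings of the same setting would be ambiguous
        die "Duplicate setting for '$std'\n" if exists $out{$std};
        $out{$std} = $in{$key};
    }
    return \%out;
}

my $dsn  = normalize_dsn(driver => 'Pg', dbname => 'test', server => 'localhost');
my $json = JSON::PP->new->canonical->encode($dsn);   # sorted keys, one line
print "$json\n";

# Round trip: the serialized form decodes back to the same hash
my $back = JSON::PP->new->decode($json);
print "$back->{database} on $back->{host}\n";
```

The serialized string exists only for storage or transport here; the hash stays the primary form.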
Re^9: Avoiding compound data in software and system design by BrowserUk (Patriarch) on Apr 30, 2010 at 06:52 UTC
Re^10: Avoiding compound data in software and system design by siracusa (Friar) on Apr 30, 2010 at 14:17 UTC
Some notes below your chosen depth have not been shown here
Re^6: Avoiding compound data in software and system design by Hue-Bond (Priest) on Apr 29, 2010 at 13:18 UTC
> And that is, all DBI needs to know is the first two fields of the DSN. The first must match 'dbi' (case-insensitively); the second must match a module "DBD::<2ndfield>" that is installed locally. What comes after that is none of its concern. It just gets passed through to the loaded DBD driver.

A half-way alternative would be to specify the driver separately from the DBD-specific stuff. Then this:

  $dbh = DBI->connect("dbi:Informix:$database", $user, $pass, %attr);
  $dbh = DBI->connect("DBI:Unify:dbname[;options]" [, user [, auth [, attr]]]);
  $dbh = DBI->connect("dbi:Oracle:host=$host;sid=$sid", $user, $passwd);

would become this:

  $dbh = DBI->connect(Informix => $database, $user, $pass, %attr);
  $dbh = DBI->connect(Unify => "dbname[;options]", $user, $pass, %attr);
  $dbh = DBI->connect(Oracle => "host=$host;sid=$sid", $user, $pass, %attr);
  ...

Please note that this is not an API change request/suggestion ;).

-- David Serrano (Please treat my english text just like Perl code, i.e. feel free to notify me of any syntax, grammar, style and/or spelling errors. Thank you!).
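The half-way alternative can be tried without touching DBI at all, using a thin helper that takes the driver name separately and assembles the conventional string. The function name dsn_for is hypothetical and the driver-name check is an assumption; this is a sketch, not a proposed API.

```perl
use strict;
use warnings;

# Hypothetical wrapper: accept the driver name on its own and build the
# "dbi:Driver:rest" string that DBI->connect expects today.
sub dsn_for {
    my ($driver, $driver_dsn) = @_;
    die "Driver name required\n"
        unless defined $driver && $driver =~ /\A\w+\z/;
    # The driver-specific part stays an opaque string, passed through as-is
    return sprintf 'dbi:%s:%s', $driver, $driver_dsn // '';
}

print dsn_for(Oracle => 'host=db1;sid=ORCL'), "\n";
print dsn_for('Google'), "\n";
```

The result would be used as the first argument to DBI->connect, with the user, password, and attributes following as usual.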