Re^3: Avoiding compound data in software and system design

You are confusing a complex of objects with compound data.

No I'm not. You are making an artificial separation where none exists.

Take urls. These are both complex and compound. And simple.

Whilst there are (many) modules like URI* that allow you to treat these as objects and access all their internal bits separately, the vast majority of modules that use urls as inputs (e.g. LWP*) take them in their simple string form. Why?

Because they do not care what is inside, and do not want to have to deal with it. For most applications of those latter modules, the user will be supplying a 'simple string', picked out of a text file (log file; html; whatever), and all they need or want to know is, can I reach it?

If they had to tease apart the myriad forms of url/uri/urn formats in order to populate a ur* object in order to pass it to LWP*--that would promptly just stick all the bits back together again--it would be an entirely unnecessary waste of time & resources. Complexity without merit or benefit.

The same goes for file system entities. We pass open a string, not some kind of FileSystem::Object. Because for the most part, a path is simply an opaque scalar entity that we use, not something we pick apart and fret over.

And the same goes for your example of DBI data source names. At the DBI level, and below, they are simply opaque entities to be gathered and passed through uninspected. Requiring some kind of object be used for them would create unnecessary and useless complexity.

They do not even have a consistent constitution. Your example breaks them down as

    dbi mysql database host port

And then as

    __PACKAGE__->register_db(
        driver   => 'pg',
        database => 'my_db',
        host     => 'localhost',
        username => 'joeuser',
        password => 'mysecret',
    );

but you've lost two parts (dbi/port) and gained two parts (user/pass).

And then you get something like DBD::WMI, which doesn't need and cannot use most of those--either set of 5. And DBD::SQLite that also has no use for most of those fields. And these came into being long after the DBI/DBD interfaces were designed and implemented.

Rather than something to be "avoided", DBI's use of a string for the data source name is the sign of a well-thought-through, flexible interface. One that recognises that you cannot fit the world into labelled boxes, and that in many situations, there is no purpose in trying.

You should be celebrating the vision and skill of those authors for designing an interface so flexible it can accommodate future developments without requiring constant re-writes as time passes and uses evolve. Not decrying them.

Consider: Will your interfaces survive so long, so well?

Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.

"Science is about questioning the status quo. Questioning authority".

In the absence of evidence, opinion is indistinguishable from prejudice.

RIP an inspiration; A true Folk's Guy

Re^4: Avoiding compound data in software and system design by metaperl (Curate) on Apr 22, 2010 at 15:51 UTC
You are making an artificial separation where none exists.

We will see about that (grin). But the distinction is simple: conceptual elements belong in separate data elements or in a single element with straightforward access. The DBI dsn string has several conceptual elements which are not in separate data elements. And access is not straightforward - had a hash reference been used, access would be more straightforward, with no loss in API quality.

But like I said in the opening post of this thread:

Typically people either know this and don't need to be told or they don't know it and don't care :)

So it's almost like screaming at a wall.

But your comments about URLs are well-taken. I thought about that this morning when I woke up. And in a sense, you could consider DSNs as a form of URL. In fact, SQLAlchemy uses URLs instead of DSNs.

Rather than something to be "avoided", DBI's use of a string for the data source name is the sign of a well-thought-through, flexible interface. One that recognises that you cannot fit the world into labelled boxes, and that in many situations, there is no purpose in trying.

I don't agree: it requires more parsing to decide which DBD to dispatch to this way.

You should be celebrating the vision and skill of those authors for designing an interface so flexible it can accommodate future developments without requiring constant re-writes as time passes and uses evolve. Not decrying them.

`$dsn` as a hash reference would have been just as flexible and much finer grained. And it would not suffer from a case of compound data. And the code to decide which DBD to dispatch to would've been more succinct. And I would not have had to write DBIx::DBH in order to work with Rose::DB and DBI interchangeably.

The Rose::DB API has finer granularity and does not suffer from the compound data issues that the DBI one does: connection info from Rose::DB can be converted into DBI connection info in a simple fashion, vice versa not so.

The mantra of every experienced web application developer is the same: thou shalt separate business logic from display. Ironically, almost all template engines allow violation of this separation principle, which is the very impetus for HTML template engine development. -- Terence Parr, "Enforcing Strict Model View Separation in Template Engines"
Re^5: Avoiding compound data in software and system design by BrowserUk (Patriarch) on Apr 22, 2010 at 23:45 UTC
Typically people either know this and don't need to be told or they don't know it and don't care :) So it's almost like screaming at a wall.

You are deluding yourself. The DBI interface has been around for 15 years. And you are the first person to see this 'need'?

And I would not have had to write DBIx::DBH in order to work with Rose::DB and DBI interchangeably.

Have you looked inside Rose::DB? Have you looked at all the code and utterly pointless machinations it goes through in dealing with that hash in order to do what? To tack all the bits together into a string and pass it on to DBI!

And what does it achieve? Nothing! Just a couple of hundred extra lines of code that complicate the interface and slow things down for no net gain whatsoever.

Rose::DB is essentially a wrapper over DBI. And you're writing a wrapper over that wrapper so that you can "use them interchangeably".

Sir! Your logic is flawed. Even though you cannot see it. Your logic is flawed.
Re^6: Avoiding compound data in software and system design by siracusa (Friar) on Apr 28, 2010 at 04:29 UTC
Have you looked at all the code and utterly pointless machinations it goes through in dealing with that hash in order to do what? To tack all the bits together into a string and pass it on to DBI!

I think you're forgetting the rest of the story. DBI passes that DSN to the DBD, which then (wait for it…) parses it and breaks it back up into pieces again! When you see the whole picture, the "utterly pointless machinations" part is clearly in the middle, where DBI makes you smush up a bunch of separate pieces of information into one of a dozen different database-specific formats, only to have each DBD driver then separate the pieces again in order to do its actual work.

(Well, to be fair, they're not utterly pointless machinations. They basically mark a point where the DBI API punted. Because it couldn't know ahead of time all possible information a DBD driver might want to have passed to it, DBI just shrugged and said "use an opaque string and each DBD can deal with the parsing." (There are other "arbitrarily extensible" data structures, of course. But no, string it was.) Anyway, there's another half of that bargain: the DBI user who has to form that specially formatted string. But hey, not DBI's problem, right? Not the DBI API's best moment, IMO.)

And what does it achieve? Nothing! Just a couple of hundred extra lines of code that complicate the interface and slow things down for no net gain whatsoever.

"No net gain" that you can see, anyway. But first things first. For the original poster, if you have a DBI DSN, you can feed it directly to Rose::DB and it'll attempt to do the parsing-into-components for you. Depending on how esoteric your DSN is, this may be sufficient; no extra code needed.

As for the "gain" of entering and storing this information separately, isn't it obvious? No one wants to remember all the various ways to format DBI DSNs for each DBD::* module. Even within a DBD module there are often many different formats supported. Rose::DB provides an abstraction layer with uniform names and semantics. (Also, storing this information in pieces makes adding, removing, or editing individual components a lot easier and more obvious. It also provides a standardized syntax for the YAML configuration overlay file, yada yada.)

This is perhaps the most common and most useful purpose of any piece of code: abstracting away details that don't matter. This example of DSNs is a microcosm of what Rose::DB does on behalf of Rose::DB::Object, namely isolating it from (or providing a uniform set of introspection methods to determine) the differences between the databases.
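To make the "uniform names" idea concrete, here is a minimal sketch of a layer that accepts the same field names for every database and emits each driver's own DSN dialect. It is an illustration only and not Rose::DB's actual code: `build_dsn` and the `%dsn_format` table are hypothetical names, and only three drivers from this thread are covered.

```perl
use strict;
use warnings;

# Hypothetical per-driver formatters. Each one turns the same uniform
# fields (database, host) into the DSN dialect that driver expects.
my %dsn_format = (
    mysql  => sub { my ($c) = @_; "dbi:mysql:database=$c->{database};host=$c->{host}" },
    Pg     => sub { my ($c) = @_; "dbi:Pg:dbname=$c->{database};host=$c->{host}" },
    SQLite => sub { my ($c) = @_; "dbi:SQLite:dbname=$c->{database}" },
);

# Hypothetical helper: look up the formatter by driver name and apply it.
sub build_dsn {
    my (%conf) = @_;
    my $format = $dsn_format{ $conf{driver} }
        or die "No DSN format known for driver '$conf{driver}'\n";
    return $format->(\%conf);
}

print build_dsn(driver => 'Pg', database => 'my_db', host => 'localhost'), "\n";
print build_dsn(driver => 'SQLite', database => 'my.db'), "\n";
```

The caller only ever remembers `database` and `host`; the knowledge that Pg and SQLite spell it `dbname` while mysql spells it `database` lives in one table.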
Re^7: Avoiding compound data in software and system design by BrowserUk (Patriarch) on Apr 28, 2010 at 06:33 UTC
Re^8: Avoiding compound data in software and system design by siracusa (Friar) on Apr 28, 2010 at 12:41 UTC
Re^4: Avoiding compound data in software and system design by metaperl (Curate) on Apr 28, 2010 at 14:23 UTC
No I'm not. You are making an artificial separation where none exists.

Everything a human does is 'artificial' - I think what you mean is superficial or arbitrary. And as this thread shows, even EF Codd was somewhat vague and arbitrary in specifying what constituted atomic data.

So yes, you're right, the definitions are vague and somewhat subjective. But throwing some light and angst on the issue should make us more aware and intelligent in future API decisions.
Re^5: Avoiding compound data in software and system design by BrowserUk (Patriarch) on Apr 28, 2010 at 17:06 UTC
EF Codd eh? Circa 1981, I had to do a CS project, and having read an article (in Byte I think) on Codd's paper, I wrote up the proposal for my project as: "A simple exploration of the Relational Model". To be written in BASIC Plus 2. And yes, BASIC. I had one term to write it.

It took 6 weeks for the college library to obtain a photocopy of the paper--it had to come from the British Library in London, the only people in the UK who had a copy. It was a photocopy, of a photocopy, of a bound paper with all the distortions and fuzzy greyness that entails. It took me two whole weeks to read it--I understood very little of it. So there I was with half my time gone and nothing to show for it.

Back to the point. And that is, all DBI needs to know is the first two fields of the DSN. The first must match 'dbi' (+-case); the second must match a module "DBD::<2ndfield>" that is installed locally. What comes after that is none of its concern. It just gets passed through to the loaded DBD driver.

And the forms of that opaque token are myriad. A quick survey turns up:

    $dbh = DBI->connect("dbi:Informix:$database", $user, $pass, %attr);
    $dbh = DBI->connect("DBI:Unify:dbname[;options]" [, user [, auth [, attr]]]);
    $dbh = DBI->connect("dbi:Oracle:host=$host;sid=$sid", $user, $passwd);
    $dbh = DBI->connect("dbi:SQLite:dbname=$dbfile","","");
    $dbh = DBI->connect("DBI:drizzle:database=test;host=localhost", "joe", "joe's password", {'RaiseError' => 1});
    $dbh = DBI->connect('dbi:ODBC:DSN', 'user', 'password');
    $dbh = DBI->connect("dbi:Pg:dbname=$dbname", '', '', {AutoCommit => 0});
    $dbh = DBI->connect('DBI:RAM:','usr','pwd',{RaiseError=>1});
    $dbh = DBI->connect("DBI:Wire10:host=$host", $user, $password, {'RaiseError' => 1, 'AutoCommit' => 1});
    $dbh = DBI->connect("DBI:CSV:f_dir=/home/joe/csvdb");
    $dbh = DBI->connect("dbi:JDBC:hostname=$hostname;port=$port;url=$url", $user, $password);
    $dbh = DBI->connect("dbi:Sqlflex:$database", $user, $pass, %attr);
    $dbh = DBI->connect("dbi:DB2:db_name", $username, $password);
    $dbh = DBI->connect("DBI:mysql:database=test");
    $dbh = DBI->connect('DBI:DBMaker:' . $database, $user, $pass);
    $dbh = DBI->connect('dbi:PgPP:dbname=$dbname', '', '');
    $dbh = DBI->connect('dbi:PgLite:dbname=file');
    $dbh = DBI->connect("dbi:ADO:Provider=Microsoft.Jet.OLEDB.4.0;Data Source=C:\data\test.mdb", $usr, $pwd, $att );
    $dbh = DBI->connect("DBI:Ingres:dbname[;options]", user [, password], \%attr);
    $dbh = DBI->connect('DBI:Solid:TCP/IP somewhere.com 1313', $user, $pass, 'Solid');
    $dbh = DBI->connect("dbi:Google:", $KEY);

Look at the variations once you get beyond the first two fields.

Yes you could keep these all separate in a hash, but to what end? You (as a DBI user) cannot do anything useful with them because there is insufficient consistency to make even validation judgements, much less anything else.

Even where several DBDs require, for example, a "dbname", for some this will have SQL identifier limitations--though even they aren't consistent across all SQL-like DBs. For some it will be a filename (with local filesystem semantics--case dependence (or not); reserved characters (or not); length limitations (or not)). For some, it's a hostname and port. For some--see the ADO example--it's a whole bunch of stuff entirely unique to that DBD. For some the subfields have to be prefixed with their tagname, others are position dependent.

Why stick all these disparate bits into a hash and then have DBI concatenate the bits--risking getting it wrong because (for example) it adds tagnames where none are required, or the hash ordering screws up the position dependence; or ...?

To achieve all that, you'd need more than just a hash. You'd need one flag per field to decide whether the key name should be prepended to the field's value. You'd need another value to ensure ordering. You'd need yet another flag to ensure that (for example) backslashes in pathnames got escaped for interpolation.

And all of that complexity buys you what? The user can far more easily know what the requirements are for the DBD (or two; or three) he is going to use, than any programmer can try and unify into one generic interface structure that will stand the test of time.
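The "first two fields" point can be sketched in a few lines. This is a simplified illustration, not DBI's own parsing code: `split_dsn` is a hypothetical helper, and it ignores refinements such as an empty driver field. It peels off the scheme and driver name and leaves the remainder untouched, whatever shape it has.

```perl
use strict;
use warnings;

# Hypothetical helper: return the driver name and the opaque remainder,
# or an empty list if the string does not start with 'dbi:<driver>:'.
sub split_dsn {
    my ($dsn) = @_;
    # Limit of 3 so that colons inside the remainder are left alone.
    my ($scheme, $driver, $rest) = split /:/, $dsn, 3;
    return unless defined $driver && lc($scheme) eq 'dbi' && length $driver;
    return ($driver, defined $rest ? $rest : '');
}

for my $dsn (
    'dbi:SQLite:dbname=test.db',
    'DBI:CSV:f_dir=/home/joe/csvdb',
    'DBI:Solid:TCP/IP somewhere.com 1313',
    'dbi:Google:',
) {
    my ($driver, $rest) = split_dsn($dsn);
    print "DBD::$driver receives '$rest'\n";
}
```

Nothing in the helper needs to know whether the remainder is key=value pairs, a positional list, or empty; that knowledge stays with the driver.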
Re^6: Avoiding compound data in software and system design by siracusa (Friar) on Apr 29, 2010 at 12:33 UTC
Why stick all these disparate bits into a hash and then have DBI concatenate the bits--risking getting it wrong because (for example) it adds tagnames where none are required, or the hash ordering screws up the position dependence; or ...?

Why in the world would DBI stick all the bits back together? This information is not useful to anyone in this serialized form. It's entirely an artificial construct, dictated by DBI's (poor) decision to require multiple distinct pieces of information to be passed in a single string argument. The DBD has to parse it and break it back into pieces to do its work (e.g., extracting the hostname and port to make the system call to open a socket, extracting the database name to pass in the connection command, etc.)

To achieve all that, you'd need more than just a hash. You'd need one flag per field to decide whether the key name should be prepended to the field's value. You'd need another value to ensure ordering. You'd need yet another flag to ensure that (for example) backslashes in pathnames got escaped for interpolation.

No, you wouldn't, because the specially-formatted DSN string never needs to be constructed at all, for any reason.
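A rough sketch of the alternative being argued for here, under the assumption of a hypothetical hash-based interface that DBI does not actually offer: the dispatcher looks at one key, `driver`, and hands the hash on as-is. The `%driver` table and `connect_with_hash` are invented for illustration, and the "drivers" merely describe what they would do.

```perl
use strict;
use warnings;

# Hypothetical stand-ins for DBD drivers. Each receives the connection
# fields as a hash reference and reads only the keys it understands.
my %driver = (
    Pg     => sub { my ($c) = @_; "socket to $c->{host}:$c->{port}, database $c->{database}" },
    SQLite => sub { my ($c) = @_; "open file $c->{database}" },
);

# Hypothetical connect: inspects only the 'driver' key and passes the
# rest through untouched. No DSN string is built or parsed anywhere.
sub connect_with_hash {
    my ($conf) = @_;
    my $name = $conf->{driver};
    die "No 'driver' given\n" unless defined $name;
    my $impl = $driver{$name} or die "Driver '$name' is not installed\n";
    return $impl->($conf);
}

print connect_with_hash({ driver => 'Pg', host => 'localhost', port => 5432, database => 'my_db' }), "\n";
print connect_with_hash({ driver => 'SQLite', database => '/tmp/test.db' }), "\n";
```

The dispatcher stays as ignorant of the driver-specific fields as DBI is of the tail of a DSN; the difference is only that the fields arrive already separated.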
Re^7: Avoiding compound data in software and system design by BrowserUk (Patriarch) on Apr 29, 2010 at 17:04 UTC
Re^8: Avoiding compound data in software and system design by siracusa (Friar) on Apr 30, 2010 at 00:39 UTC
Re^6: Avoiding compound data in software and system design by Hue-Bond (Priest) on Apr 29, 2010 at 13:18 UTC
And that is, all DBI needs to know is the first two fields of the DSN. The first must match 'dbi' (+-case); the second must match a module "DBD::<2ndfield>" that is installed locally. What comes after that is none of its concern. It just gets passed through to the loaded DBD driver.

A half-way alternative would be to specify the driver separated from the DBD-specific stuff. Then this:

    $dbh = DBI->connect("dbi:Informix:$database", $user, $pass, %attr);
    $dbh = DBI->connect("DBI:Unify:dbname[;options]" [, user [, auth [, attr]]]);
    $dbh = DBI->connect("dbi:Oracle:host=$host;sid=$sid", $user, $passwd);

would become this:

    $dbh = DBI->connect(Informix => $database,             $user, $pass, %attr);
    $dbh = DBI->connect(Unify    => "dbname[;options]",    $user, $pass, %attr);
    $dbh = DBI->connect(Oracle   => "host=$host;sid=$sid", $user, $pass, %attr);
    ...

Please note that this is not an API change request/suggestion ;).

-- David Serrano (Please treat my English text just like Perl code, i.e. feel free to notify me of any syntax, grammar, style and/or spelling errors. Thank you!).
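For anyone who wants to try the half-way form without touching DBI, a thin shim is enough. This is only a sketch: `connect_args` is a hypothetical helper, and the Oracle host, SID and credentials below are made-up sample values.

```perl
use strict;
use warnings;

# Hypothetical helper: accept the driver name and the driver-specific
# string as two separate arguments, and return the argument list in
# the form that DBI->connect takes today.
sub connect_args {
    my ($driver, $driver_dsn, @rest) = @_;
    return ("dbi:$driver:$driver_dsn", @rest);
}

my @args = connect_args(Oracle => 'host=db1;sid=ORCL', 'scott', 'tiger');
print join(', ', @args), "\n";

# With DBI installed, the list could be passed straight on:
#   my $dbh = DBI->connect(@args);
```

The driver-specific part is still an opaque string here; only the driver name has been lifted out into its own argument.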