Re^2: Joint Database Technology

MAIN/ROOT THREAD: http://www.perlmonks.org/index.pl?node_id=1121222

The advantages I see to this Joint DB Technology are many fold.

The problem with Flat File Databases (historically) has been that indexing to records was cumbersome/slow/a pain. Typically, a flat file would be read in its entirety (first to last record) each time an application program was launched where random access to the records was needed. As the file is being read in, a hash table is built in memory to hold the record offsets for random access while the current program is in use. This does nothing for your nerves, not does it lend itself to a concurrent multi-user environment.

But with a persistent tie to an external Perl SDBM database containing the record offsets, random access to flat file records is immediately available, and I mean FAST!

And you can use many Perl SDBM files as indexes to the same Flat File. Also, your key in the key/value pairs does not have to be of the same format. In other words, one(1) SDBM file can be used to hold many different indexes.

I use the "key" to store the flat file fields I want used as indexes. I use the "value" to store the record offset (in bytes). The "key" in the key/value pair, can be one(1) or more fields and/or partial fields (perhaps delimited by a "|"). The "value" can be one(1) or more fields of data separated by a delimiting character such as a comma or pipe "|". An ALT KEY with DUPS can be managed too. Let's say we have a PRIMARY KEY on Social Security Nbr. Another useful KEY (ALT KEY w/DUPS) is shown in the example below.

     #--                 YYYYMMDD  
     #-- Key example:    BirthDate|LastNameFirst4Chars|FirstNameInitia
+l|StateCode
     #--                            "19591219|Will|K|TX"
     #-- $KEY without a Seq Nbr is used to increment the number of rec
+ords saved to the database
     #-- having a particular ALT KEY w/DUPS - in this example:   "1959
+1219|Will|K|TX"

     $KEY=$BirthDate . "|" . $LastNameFirst4Chars . "|" . $FirstNameIn
+itial . "|" . $StateCode;
     $Hash{$KEY}=0;

     #-- Now index the first record encountered in the Flat File datab
+ase with this particular ALT KEY w/DUPS 
     $num_recs = $Hash{$KEY};
     $num_recs++;   #-- i.e. one(1)
     $Hash{$KEY}=$num_recs;
     $newKEY=$KEY . "|" . $num_recs;
     #-- produces:   "19591219|Will|K|TX|1"
     $Hash{$newKEY}=   #-- The VALUE would be set to the byte offset o
+f the Flat File record just indexed 

     #-- Now index the second record encountered in the Flat File data
+base with this particular ALT KEY w/DUPS 
     $num_recs = $Hash{$KEY};
     $num_recs++;   #-- i.e. two(2)
     $Hash{$KEY}=$num_recs;
     $newKEY=$KEY . "|" . $num_recs;
     #-- produces:   "19591219|Will|K|TX|2"
     $Hash{$newKEY}=   #-- The VALUE would be set to the byte offset o
+f the Flat File record just indexed 

     #-- and so on...
[download]

ALSO, SDBM is very very easy to use and has a 2 GIG storage abililty. Millions of record offsets can be stored for random access indexing to fixed-length file records. And you can use as many SDBM files as you need. But know this: that you can store multiple indexes (i.e. Key formats) within the same SDBM file. Also know this: That you don't have to store the byte offset to every record in your database if the database records are logically/physically sorted/grouped. What you can do is store the byte offset of the first record in a related/sorted group of records, random access the first record in the group, then sequentially access the remaining records in that group. Or, you can read into memory all the records within a group with just one(1) read statement execution (dozens of records to perhaps hundreds or a few thousand records in one READ). Then, using the UNPACK statement, you can break out each record (from the massive READ) into individual record array elements for further processing.

Both Flat Files and SDBM are FREE/NO COST TECHNOLOGY that can be freely distributed. SDBM comes automatically with every distribution of Perl.

SDBM is difficult to use by itself for database storage because of limitations on record size (something like 1000 bytes key/value pair maximum length combined). Flat Files can store huge records if needed, like COBOL files. Great for storing Bitmap images converted to inline text (See Aldo Calpini's Win32::GUI module for the INLINE BITMAP conversion utility).

Flat files are text based. Not much can go wrong with them, whereas Perl SDBM files I believe are Binary. Well, they might get corrupted, so not as good an idea for storing your important data as it might be impossible to recover them. But when used only as index files, they can be rebuilt from the Flat File "text" data.

I like to use Perl's formula-based MIMEbase64 encoding/decoding of stored data to make it humanly unreadable. My batch applications and user-interfaces do the conversion back to readable text on the fly. You can employ home-grown word-to-code encrypting/compression of your data using SDBM to store the key/value pairs for translation back from code to word. A link is provided later on in this thread by a contributor who found my posting at another site where I provide Perl code and the steps to employ encryption/compression with Perl SDBM, and where the word/code mappings are generated completely at random every time.

Flat file databases and SDBM databases are easy to deploy and are portable. Right? And they are standalone. They do not rely on any other files such as Data Access Components (Drivers/Engines), DSNs, or any other gobble-d-goop that could become stepped on and must be current on any workstation on the Network. And, this database tandem system could be hosted on a Network Drive with the proper permissions for access to prevent copying and deleting.

Finally, I think this could be considered a RDBMS? You could have a normalized flat file system of files and perform lookups to mapping tables (e.g. multiple instances of part_numbers within Flat File records mapped to a single instance of the corresponding part_descriptions within lookup tables) and even have one-to-many parent/child record relationships between Flat Files, using Perl SDBM indexing. Note: Always store the foreign key within the Child records to maintain the link between Parent and Children so that the SDBM indexing can be rebuilt from the Flat File records data.

Someone brought up BITMAPS. Store them in the database, or store the filename in the database. I think I could store the BITMAP in a flat text file okay using an INLINE BITMAP utility that converts the Bitmap to a stream of text. I have used this before to hard code Bitmaps into my Windows GUI Perl Applications. The utility comes with Aldo Calpini's Win32::GUI Module. It can be ran as a command line utility or embedded within your Perl application program to display stored inline "text" bitmaps as true images (on the fly).

Regarding multi-user concurrent access and data capacity, I have not tried the multi-user concurrency yet with NoSQL/ISAM Flat File/SDBM databases (lacking the opportunity). But I have loaded 5 million records of King James Version Bible verses of text to a single Flat File with Perl SDBM/tied Hash Table indexing to those 5 Million records. The random access lookup times to any verse were instantaneous. As fast as a Microsoft Jet (Red) Engine ODBC-enabled SQL Database of similar use. With Jet (Red), the engine used by MS-Access, I have loaded (via Dave Roth's: Perl Win32::ODBC module) 10 Million records to one *.MDB file using Jet 4.x format (*.mdb files) with compression and encryption turned on. Unfortunately, Win32::ODBC is no longer supported in the latest Perl distributions. Anyway, I am getting away from SQL and DAC/MDAC altogether, opting for NoSQL/ISAM/Embedded FlatFile/SDBM database systems.

Replies are listed 'Best First'.
Re^3: Joint Database Technology by jeffa (Bishop) on Mar 24, 2015 at 22:37 UTC
I do not see any advantages to what your propose over what an RDBMS offers. There is more to the story than just storing and retrieving data. $Clients want the data pivoted into all arrangements of reports. Standards such as PCI compliance shun using Personally Identifiable Information as non-encrypted keys. And then you have technologies such as Hadoop can perform map/reduce algorithms on flat files that span across multiple machines. What you have sounds cool, but not at the expense of ignoring current solutions that take a lot more into consideration than what you have here. What would sound cooler is a deploy-able implementation of what you propose -- and i certainly hope you don't limit your audience to just 1 platform. jeffa L-LL-L--L-LL-L--L-LL-L-- -R--R-RR-R--R-RR-R--R-RR B--B--B--B--B--B--B--B-- H---H---H---H---H---H--- (the triplet paradiddle with high-hat)	[reply] [d/l] [select]
Re^4: Joint Database Technology by Anonymous Monk on Jun 12, 2017 at 17:55 UTC
For reporting from JOINT Flat File/SDBM databases, I personally would use Win32 Perl OLE module to talk to MS-Excel, sending my report outpit directly there to formatted reports and pivot tables.	[reply]
Re^4: Joint Database Technology by locked_user erichansen1836 (Sexton) on Oct 05, 2017 at 12:26 UTC
Jeffa, for pivot tables use MS-Excel. The Perl Win32::OLE module allows one to establish a Perl COM connection to MS-Excel in the background. I have used this for over a decade to send my SQL Result Sets from MS-Access database files to Excel files. I don't use the MS-Access front-end software, but only connect to MS-Access files via Dave Roth's Win32::ODBC Perl module. Remember, this did not stop Microsoft Corporation from producing MS-Access software for data storage. Likewise, I don't expect most people would object to FlatFile/SDBM databases not providing an all inclusive solution for reporting. If you are looking for a single product that does everything (such as MS-Access/MS-Excel combined into one software), good luck.