I think ideally you would model your (composite) key as separate columns. Then you can query either all key columns or just a subset of them.
If you want to keep things simple, keep the optional key-value pairs at the end as a single string. If you also want to query them, a slightly better approach is to format and store them as JSON; then you can query them in the database almost as if they were additional columns. The ideal way is to convert these optional things either into a fixed set of additional columns, or to add another table consisting of three columns, (row-key, keyname, value). Doing that makes the queries somewhat uglier, though.
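A minimal sketch of that last idea, assuming SQLite; the table and column names (and the 'colour'/'red' values) are invented here for illustration:

    -- main table: composite key modelled as separate columns
    CREATE TABLE records (
        key1 TEXT NOT NULL,
        key2 TEXT NOT NULL,
        data TEXT,
        PRIMARY KEY (key1, key2)
    );

    -- side table for the optional pairs: (row-key, keyname, value)
    CREATE TABLE optional_values (
        row_key TEXT NOT NULL,   -- whatever identifies the row in records
        keyname TEXT NOT NULL,
        value   TEXT,
        PRIMARY KEY (row_key, keyname)
    );

    -- the "somewhat uglier" lookup: which rows carry a given optional pair?
    SELECT row_key FROM optional_values WHERE keyname = 'colour' AND value = 'red';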
As long as there are no security concerns, another table could store the sparse records from a number of the databases.
It would use the filename as its primary key, along with the three columns Corion already suggested. The main records db would then hold a boolean column denoting whether a record has sparse values held in that other table; you check the boolean and retrieve from the other table only when you need the sparse values.
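A rough sketch of that layout, assuming SQLite (everything except the filename key is an invented name):

    CREATE TABLE records (
        filename   TEXT PRIMARY KEY,
        field1     TEXT,
        field2     TEXT,
        field3     TEXT,
        has_sparse INTEGER NOT NULL DEFAULT 0  -- boolean: extra values exist elsewhere
    );

    CREATE TABLE sparse_values (
        filename TEXT NOT NULL REFERENCES records(filename),
        keyname  TEXT NOT NULL,
        value    TEXT,
        PRIMARY KEY (filename, keyname)
    );

    -- check the flag, then fetch the extras only when needed
    SELECT has_sparse FROM records WHERE filename = ?;
    SELECT keyname, value FROM sparse_values WHERE filename = ?;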
Another approach that has just occurred to me, of the ugly variety and with additional overhead, would be to hash the filenames before entering them into the database. The primary key of the first row would be the hashed filename itself, and the key of the secondary row would be the filename concatenated with, for example, the term 'sparse' before being hashed. Already I can feel the glares.
The issue with this is that you lose the key information: you would need to know beforehand which keys the sparse data can contain. Using an additional table to record those keys could be a solution. At that stage it becomes a matter of performance requirements, record size, and whether there is any advantage in having relatively few empty rows plus extra tables recording the keys for each db (along with the need to hash every lookup), versus additional, mostly empty column(s).
Likely the better solution is to serialise into a format such as JSON to keep the db self-contained. But at that point, would you not just store the whole optional hash as a JSON blob anyway, meaning there would only need to be the one additional column?
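A sketch of that single-column variant, assuming a reasonably recent SQLite with the JSON1 functions available (the column names are illustrative):

    CREATE TABLE records (
        filename TEXT PRIMARY KEY,
        field1   TEXT,
        field2   TEXT,
        extras   TEXT   -- JSON blob, e.g. '{"hair":"balding","glasses":"bifocal"}'
    );

    -- pull a single optional value back out
    SELECT filename, json_extract(extras, '$.hair') FROM records;

    -- filter on an optional value
    SELECT * FROM records WHERE json_extract(extras, '$.glasses') = 'bifocal';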
On reflection, this is a partial conversion of the binary data, but for this kind of mixed data there is more than one step needed.
I second SQLite. I use it a lot, and if your dataset is big(ish), you don't want to be parsing flat files for every lookup.
How you store this depends on just how many [ key1=value1, key2=value2 ] pairs there are.
If there is a fixed list of 'key1','key2' and it's not too many: bite the bullet and add the columns.
Otherwise, you get to use a linked table (this is Database 101; search for 'database normalization').
Each record should be
    [id], field1, field2
then, in a separate table,
    [link_id], key1, value1
    [link_id], key2, value2
You can also have a third table with
    [link_id], source_file_name
or whatever other metadata you need to keep.
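A possible SQLite sketch of those tables (the column types are guesses, and the third table's name is invented here; main_data and extra_data match the queries below):

    CREATE TABLE main_data (
        field1 TEXT,
        field2 TEXT
    );

    CREATE TABLE extra_data (
        link_id INTEGER NOT NULL,   -- refers to main_data.ROWID
        key1    TEXT NOT NULL,
        value1  TEXT
    );

    CREATE TABLE file_info (
        link_id          INTEGER NOT NULL,  -- refers to main_data.ROWID
        source_file_name TEXT
    );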
SQLite very kindly has a magic ROWID column. It won't return it on SELECT *, but will on SELECT ROWID,*.
Of course you get to do two queries for each lookup: SELECT ROWID,* FROM main_data and then SELECT * FROM extra_data WHERE link_id = ?
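For example (the ? in the second query is bound to a ROWID returned by the first; the filter on field1 is just an illustration):

    SELECT ROWID, * FROM main_data WHERE field1 = ?;
    SELECT * FROM extra_data WHERE link_id = ?;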
HTH.
To clarify: if your 'real' data is
name=bob, age=75, [ hair=balding, glasses=bifocal]
name=john, age=20, [ sport=chess ]
Then you get:
Primary table:
1, bob, 75
2, john, 20
lookup table:
1,hair,balding
1,glasses,bifocal
2,sport,chess
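The same example as INSERT statements, relying on SQLite assigning ROWIDs 1 and 2 to a fresh main_data table (in practice you would capture the id with last_insert_rowid()):

    INSERT INTO main_data (field1, field2) VALUES ('bob', 75);
    INSERT INTO extra_data (link_id, key1, value1) VALUES (1, 'hair', 'balding');
    INSERT INTO extra_data (link_id, key1, value1) VALUES (1, 'glasses', 'bifocal');

    INSERT INTO main_data (field1, field2) VALUES ('john', 20);
    INSERT INTO extra_data (link_id, key1, value1) VALUES (2, 'sport', 'chess');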
Searchability is the main advantage: using a linked table, you can do
    SELECT *
    FROM main_data
    WHERE ROWID IN (
        SELECT link_id
        FROM extra_data
        WHERE key1 = ?
          AND value1 LIKE ?
    )
Consider also:
do you want to put a storage method (JSON) inside a different storage method (SQL)?