in reply to OT - SQL choosing a schema for index tables

Just some loose remarks:

- What dbms will you be using? SQL implementations aren't so uniform that it does not matter (and in such a small db interoperability is probably unimportant?).

- It seems unwise to name a table with 'index' or an '_index' suffix. An index belongs to a table but it is something separate (it isn't even part of the SQL specification; indexes are just an implementation detail that may or may not speed up a retrieval from a table).

- The data you describe are so small (hundreds of rows) that performance and indexes are unlikely to be important considerations (small tables are always seqscanned). Concentrate on getting easy-to-understand queries.

- Your option 2 (everything in one table) is called EAV and is generally frowned upon (see for instance this). I think eav is only defensible when a 'table structure' needs to be extremely flexible (and the structure is unknown beforehand), and this is not the case for you).

Maybe you should try to show us the SQL-'hairiness' that you fear. It probably isn't as bad as you think :)

  • Comment on Re: OT - SQL choosing a schema for index tables

Replies are listed 'Best First'.
Re^2: OT - SQL choosing a schema for index tables
by bangor (Monk) on Aug 28, 2015 at 13:15 UTC
    Thank you erix for your helpful reply. I am using MySQL.

    - It seems unwise to name a table with 'index' or an '_index' suffix.
    Noted - I will change

    - ...Concentrate on getting easy-to-understand queries.
    This is exactly what I am trying to do as I know some one else will be maintaining.

    - Your option 2 (everything in one table) is called EAV and is generally frowned upon.
    Thanks for the links, and yes Option 2 seems to be a bad idea.

    Maybe you should try to show us the SQL-'hairiness' that you fear. It probably isn't as bad as you think :)

    I don't have any hairiness at the moment, what I have is a Perl loop which queries each table in turn:

    my ($self,$params) = @_; my %tables = ( origin => 'tree_origin', category => 'tree_flowering', ripe => 'tree_ripe', usage => 'tree_usage', ); my $ids; for my $name (keys %tables) { my $value = $params->{$name} or next; $ids = $self->search_index_table( $tables{$name}, $value, $ids ); last unless $ids; }
    And the sub to query the database is something like this...
    sub search_index_table { my ( $self, $table, $value, $ids ) = @_; my $query = "SELECT id FROM $table WHERE name=?"; if ($ids) { my $id_string = join(',', @$ids ); $query .= " AND id IN ($id_string)"; } my @results; my $dbh = $self->{'dbh'}; my $sth = $dbh->prepare($query); $sth->execute($value); while ( my ($id) = $sth->fetchrow_array() ) { push(@results,$id); } return (@results ? \@results : undef); }
    This all works correctly, but I can't help thinking that all these calls to the database server (which is on another machine) is very inefficient. I suppose my real itch is that there are only a couple of hundred rows in the main table, and I have another 10 or so tables to index the multiple-valued fields - from my reading this seems the correct way to do things, but at the same time feels over-engineered for such a small dataset.

      It's a bit unusual but if it works well and performs well, I wouldn't worry about it (because the data is so small).

      But it would indeed (as already mentioned upthread) make more sense to have an SQL statement that JOINs the tables and that retrieves the rows that you want in one trip to the databaseserver. (But it's quite possible that you'd hardly notice a speed difference.)

Re^2: OT - SQL choosing a schema for index tables
by chacham (Prior) on Aug 28, 2015 at 16:50 UTC

    (everything in one table) is called EAV and is generally frowned upon

    It is only frowned upon when there is another way. Such as, when all attributes are static. If there's only one value per attribute, it would go in the main table, if there are many, they would each get a child table. It is not the structure that needs to be flexible (that would likely be a bad use of an EAV) but that the attributes are unknown. Such as a translation table, where the translated items are unknown, as are the languages it will be translated into.

    I got he impression that the attributes themselves are not static, which is why i suggested an EAV. If they are static, however, it would be frowned upon by the purists in this case.