eric256 has asked for the wisdom of the Perl Monks concerning the following question:

I am working on a node based web management system where each node is comprised entirely of meta data (no set parameters are required per node. I have two tables, one to hold each nodes (id) to link meta data to it, and for inheritance reasons. Then I have the meta data table which holds (nodeid, meta data name, meta data). So for each node there are several nodes in the meta data table containing the name of the data "color" and the data itself. So the table might be something like

Nodes
id
1
2
3

Meta Data
NodeID   Name    Value
1        color   red
1        name    george
2        color   green
2        size    5
3        name    fred
3        title   test

Now my problem/question is how to query this data for mutiple meta attributes. Like how to find all nodes who's 'color' attribute = 'green'. Obviously standard queries wont work because the data is spread threwout multiple records.

I have only realy come up with one option, Loading the nodes data into a hash and then looping threw the hash finding all nodes that match the criteria...This is kind of a bad solution if you have very many nodes at all.

So i am hoping that somewhere here might have a brilliant idea about how to query the data, or even some way maybe using temperary tables.

Any and all help/thoughts/opinions are welcome.

___________
Eric Hodges

Replies are listed 'Best First'.
Re: Querying Meta Data
by dragonchild (Archbishop) on Jul 24, 2003 at 18:24 UTC
    The SQL that will do what you want is:
    SELECT n.id FROM nodes n, meta_data md WHERE md.name = "color" AND md.value = "green" AND n.id = md.node_id ORDER BY 1
    To make this useful, do something like
    my $sql = <<'__END__'; SELECT n.id FROM nodes n, meta_data md WHERE md.name = ? AND md.value = ? AND n.id = md.node_id ORDER BY 1 __END__ my $sth = $dbh->prepare_cached($sql) || die $!; $sth->execute($attr, $value) || die $!; # Use your favorite fetch*() method here $sth->finish;
    Placeholders are your friend.

    ------
    We are the carpenters and bricklayers of the Information Age.

    Don't go borrowing trouble. For programmers, this means Worry only about what you need to implement.

    Please remember that I'm crufty and crochety. All opinions are purely mine and all code is untested, unless otherwise specified.

      As a bonus, most SQL databases will let you join against the same table multiple times, allowing you to query for nodes with some combination of characteristics:
      SELECT n.* FROM nodes n, meta_data md1, meta_data md2 WHERE md1.name = ? AND md1.value = ? AND n.id = md1.node_id AND md2.name = ? AND md2.value = ? AND n.id = md2.node_id ORDER BY 1

      (Check the documentation for your database software to confirm that this is supported and identify any idiosyncratic syntax that you may have to comply with.)

      For what it's worth, I use a similar structure to store clinical records data for a hospital's psychiatry department, and after a bit of time spent examining MySQL's query plans and adjusting the indexes, have found the performance of this type of query to be acceptable, even with tens of thousands of entries.

        Thanks dragonchild, i did mean to mention that i had the sql for the first link and not the others.

        and Thanks simonm thats exactly what I had in mind but I had no clue that you could join a table more than once. Off i go looking thre MySQL documentation with a new point of view. How many joins do you do in your case? How many is too many? Just ideas, I know that it will vary from case to case, and depend on the size of the table etc.

        Here is the code i came up with so far. Its a bit rough, but i thought others might find it useful!

        use Data::Dumper; print GetNodes({name => "hello",type => "god", color => "greeen"}); sub GetNodes { my $meta = shift; #expects a hash of {name => value} print Dumper($meta); my $sql = "SELECT md1.node_id FROM "; my @From; # list of tables my @Where; # list of conditions my @LastWhere; my $i = 0; foreach my $name (keys %$meta) { $i ++; push @From, "`meta_data` as md$i"; push @Where,"md$i.name = \"" . $name . "\""; push @Where,"md$i.value = \"" . $meta->{$name} . "\""; push @LastWhere,"md$i.node_id"; } $sql .= join(",",@From) . " WHERE "; $sql .= join(" AND ", @Where); $sql .= " AND " .join(" = ",@LastWhere) . ";"; return $sql; }
        outputs
        SELECT md1.node_id FROM `meta_data` as md1,`meta_data` as md2,`meta_data` as md3 WHERE md1.name = "type" AND md1.value = "god" AND md2.name = "color" AND md2.value = "greeen" AND md3.name = "name" AND md3.value = "hello" AND md1.node_id = md2.node_id = md3.node_id;
        Thanks!
        Update: Added above code :-)
        Eric Hodges
Re: Querying Meta Data
by valdez (Monsignor) on Jul 24, 2003 at 20:55 UTC

    I use this SQL query:

    select n.id from nodes n, metadata m where n.id=m.id and ((m.name='color' and m.value='red') or (m.name='name' and m.value='george')) group by n.id having count(n.id)=2;
    Of course, the query can be built dinamically and should be adapted to your needs. I'm not an SQL expert, so I welcome any opinion on this approach.

    Ciao, Valerio

      Thats a nice twist on it. Any ideas on weather that is better or worse speed wise?
      ___________
      Eric Hodges
        The relative speed of the self-join vs HAVING option is really very difficult to assess - it really depends on the data, the way the indexes are set up, etc.

        In addition I think that HAVING is a Sybase/MS-SQL extension, so it may not be available with MySQL.

        Michael