baxy77bax has asked for the wisdom of the Perl Monks concerning the following question:

hi,

i'm not sure if i'm in the right place to ask this question, but from some previous posts i see people explaining their reasons for doing something in terms of complexity.

so my question is the following:

let's say i have one big table in ascii format (csv) and 5 smaller ones. what i wish to do is search the big table with each and every small one. so when i import all those tables into an sql db and make a query that takes each and every small data set and selects that data from the big table, in my understanding what i'm actually doing is:

    N = 5      # number of small tables
    H = 100    # rows - size of the small tables
    K = 10000  # rows - size of the big table

    for (1..5) {              # for each of the smaller tables
        for (1..100) {        # number of steps when iterating through the small table
            for (1..10000) {  # number of steps when iterating through the big table
                # select a row
            }
        }
    }

so Θ(NHK) is the complexity of that algorithm(?)
but if i hash all the small tables like this:

    # value => table
    %hash = (
        str1   => 1,
        str2   => 1,
        ...
        str100 => 1,
        str101 => 2,
        str102 => 2,
        ...
    );
my key is the value from the small tables and its value points to the table i took it from. so what i can do now is while-loop through the big table and, for every row, determine directly which group my data belongs to. example: if i hit the row that holds str101, i know that data belongs to the second small table.
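
a rough sketch of that approach (file names here are made up, and i'm assuming the small tables have one value per line while the lookup value is the first csv field of the big table):

    #!/usr/bin/perl
    use strict;
    use warnings;

    # build the lookup hash from the small tables: value => table number
    my %lookup;
    my @small_files = ('small1.txt', 'small2.txt', 'small3.txt', 'small4.txt', 'small5.txt');
    for my $i (0 .. $#small_files) {
        open my $fh, '<', $small_files[$i] or die "open $small_files[$i]: $!";
        while (my $value = <$fh>) {
            chomp $value;
            $lookup{$value} = $i + 1;    # remember which small table it came from
        }
        close $fh;
    }

    # single sweep through the big table: one hash lookup per row
    open my $big, '<', 'big.csv' or die "open big.csv: $!";
    while (my $row = <$big>) {
        chomp $row;
        my ($value) = split /,/, $row;   # assumes the lookup value is the first field
        print "row '$row' belongs to small table $lookup{$value}\n"
            if exists $lookup{$value};
    }
    close $big;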

so my question now is: is this just a faster implementation (because it runs much faster than with the db engine), or did i change the complexity to

    for (1..5) {          # hash the data from the small tables
    }
    for (1..10000) {      # crosscheck each big-table row with the hash
    }

so what i got is Θ(NH + K)
so the problem is how to treat hashed data.

thank you

Replies are listed 'Best First'.
Re: how do hashes work - complexity question
by zwon (Abbot) on Jul 11, 2009 at 08:27 UTC
    for (1..5) {              # for each of the smaller tables
        for (1..100) {        # number of steps when iterating through the small table
            for (1..10000) {  # number of steps when iterating through the big table
                # select a row
            }
        }
    }

    If so, you're not using your db very efficiently. But it's hard to say, as you didn't show us your query. Generally, if you are able to improve performance using a hash, then you can improve the performance of the SQL query using indexes.
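
    For example (just a sketch; table and column names are made up and $dbh is assumed to be an already connected DBI handle), an index on the joined column lets the database do each lookup as an indexed probe instead of a full scan:

        # hypothetical schema: big_table(value, ...)
        $dbh->do('CREATE INDEX idx_big_value ON big_table (value)');

        # each lookup now uses the index rather than scanning all 10,000 rows
        my $sth = $dbh->prepare('SELECT * FROM big_table WHERE value = ?');
        my $some_value = 'str101';    # hypothetical value taken from a small table
        $sth->execute($some_value);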

      Seconded. But look at the OP's code. If the "# select a row" part actually fires a database query, it's more than reasonable that 10,000 iterations are somewhat faster than 5,000,000 =).


      holli

      You can lead your users to water, but alas, you cannot drown them.

        Yeah, I did actually mean that he can rewrite his DB code so it would look like:

        for (1..5) {
            # select content of table $_ into temp table
        }
        for (1..10000) {
            # crosscheck with temp table
        }
        and it's possible (it depends on the exact requirements) that the latter loop may be replaced with a single DB query.
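
        A sketch of what that single query could look like (table and column names are hypothetical; the temp table is assumed to hold the union of the small tables plus a column recording which one each value came from):

            # assumes $dbh is a connected DBI handle
            my $sth = $dbh->prepare(q{
                SELECT b.*, t.source_table
                FROM   big_table   b
                JOIN   temp_values t ON b.value = t.value
            });
            $sth->execute;
            while (my $row = $sth->fetchrow_hashref) {
                # $row->{source_table} says which small table this row matched
            }
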
      well, i know that with indexing the db data i can speed up my work. but then:

      1. i have to index the big table that is already indexed (the value itself is an index), and that is an extra number of steps

      2. i have to import my data (big table) - that is an extra number of steps,

      3. i have to index more than one column to speed up the search - that is a redundant number of steps.

      so the point is: why do all that if i can do it just by hashing (importing) the smaller data and then sweeping once through the big data (file)?

      plus what i notice is that when the db engine is importing stuff, it is formatting it in some manner, and that has to be expensive (an extra number of steps) compared to sweeping through an already formatted file (the format itself is not the issue, switching formats is).

      so the question was: when calculating the theoretical complexity of hashed data, can i neglect the fact that, for every lookup, it has to go through all the hash keys to get me the value (is it "legit", so to speak, to do that)? because that is what i'm actually doing when evaluating the complexity of my db query, isn't it?

      thank you for your fast reply!

        so the question was: when calculating the theoretical complexity of hashed data, can i neglect the fact that, for every lookup, it has to go through all the hash keys to get me the value

        There's no need to go through all the hash keys; you're directly accessing the element you need (it's practically an O(1) operation), so yes, your calculations are correct. Note that the same is true for accessing a DB record using an index.
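
        A tiny illustration with made-up keys - the last line is the direct access and never iterates over the keys:

            my %lookup = (str1 => 1, str100 => 1, str101 => 2);

            # walking every key, as the question assumes - O(number of keys)
            my ($slow) = map { $lookup{$_} } grep { $_ eq 'str101' } keys %lookup;

            # what a hash lookup actually does - a direct probe, effectively O(1)
            my $fast = $lookup{'str101'};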

        Isn't it easier/shorter to write an SQL query than to write complicated Perl hash code?