
Thanks for the offer of extra eyes - I am in fact somewhat databatially challenged.

The goal is to take a list of real events and compare this with a list of potential events in order to do a statistical analysis of the probability of events occurring.

The two tables are created like this:

CREATE TABLE Real_Events (
    Id      INT,
    Name1   VARCHAR,
    Name2   VARCHAR,
    Date    INT,
    "Group" VARCHAR   -- GROUP is a reserved word in SQL, hence the quotes
);

CREATE TABLE Potential_Events (
    Name1   VARCHAR,
    Name2   VARCHAR,
    "Group" VARCHAR
);

The SELECTs I need are, for a given event in Real_Events:

I currently read the entire table of actual events into an array of hashrefs. This allows me to define a look-up table on, say, 'Group':

push (@{$lookup_group{$_->{"Group"}}}, $_) for @rows;

I can use this to get the rows for a given group and then loop over these to check the date. A similar approach can be used to create a look-up table with "$Name1$Name2$Group" as the key, which can be used to check whether a potential event has already occurred.
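
To make the two look-up tables concrete, here is a minimal sketch. The sample rows, the helper names has_occurred and rows_in_group_up_to, and the "on or before" date comparison are all made up for illustration, since the exact check is not spelled out above. It also joins the key parts with "\0" rather than concatenating them directly, on the assumption that plain concatenation could confuse, say, ('ab', 'c') with ('a', 'bc').

```perl
use strict;
use warnings;

# Hypothetical sample rows standing in for the contents of Real_Events.
my @rows = (
    { Id => 1, Name1 => 'a', Name2 => 'b', Date => 20130101, Group => 'g1' },
    { Id => 2, Name1 => 'a', Name2 => 'c', Date => 20130215, Group => 'g1' },
    { Id => 3, Name1 => 'b', Name2 => 'c', Date => 20130301, Group => 'g2' },
);

my (%lookup_group, %seen_event);
for my $row (@rows) {
    # Group name -> list of rows belonging to that group.
    push @{ $lookup_group{ $row->{Group} } }, $row;

    # Composite key; the "\0" separator keeps ('ab','c') and ('a','bc') apart.
    my $key = join "\0", @{$row}{qw(Name1 Name2 Group)};
    $seen_event{$key} = 1;
}

# Has this potential event already occurred as a real event?
sub has_occurred {
    my ($name1, $name2, $group) = @_;
    return exists $seen_event{ join "\0", $name1, $name2, $group };
}

# Rows of one group dated on or before $date (assumed check, linear scan).
sub rows_in_group_up_to {
    my ($group, $date) = @_;
    return grep { $_->{Date} <= $date } @{ $lookup_group{$group} || [] };
}
```

Both hashes are built in a single pass over @rows, so each later check costs one hash look-up plus, for the date test, a scan of just one group's rows.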

I hope that gives a rough idea of what I'm trying to do. The program is originally from a user whom I am trying to help get his data crunched before a deadline, so optimisation time plus run time has to fit within that limit. Apart from that, unfortunately I also have other work to do, so the time I have to work on this is also limited, but I would be grateful for any pointers to low-hanging fruit.

Thanks,

loris

Re^7: Creating SELECT-like look-up tables as hashes
by BrowserUk (Patriarch) on Dec 05, 2013 at 09:46 UTC

    Are you using any indexes on those tables?


    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.

      Yes, on Id for Real_Events and on Name1, Name2, Group for Potential_Events.

      loris

        My best DB days are long behind me (DB2), and I've done very little with sqlite, so take this with a big pinch of salt, but it seems to me that given your queries, an index on (Real_?)Events.Id is doing you no good at all.

        I'd be tempted to try adding an index on (at least) the date field. (And possibly dropping the one on Id.)

        As far as your HoAs are concerned, it might be worth considering sorting the arrays by date and using a binary search.
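
        A rough sketch of what that could look like, assuming each hash value is an array of row hashrefs with a numeric Date field. The sample data and the helper name first_index_on_or_after are made up for illustration:

```perl
use strict;
use warnings;

# Hypothetical HoA: group name -> array of row hashrefs (sample data).
my %lookup_group = (
    g1 => [
        { Id => 2, Date => 20130215 },
        { Id => 1, Date => 20130101 },
        { Id => 4, Date => 20130410 },
    ],
);

# Sort each group's rows by date once, up front.
@$_ = sort { $a->{Date} <=> $b->{Date} } @$_ for values %lookup_group;

# Binary search for the index of the first row with Date >= $date.
# Returns the array length if every row is earlier than $date.
sub first_index_on_or_after {
    my ($rows, $date) = @_;
    my ($lo, $hi) = (0, scalar @$rows);
    while ($lo < $hi) {
        my $mid = int( ($lo + $hi) / 2 );
        if ($rows->[$mid]{Date} < $date) { $lo = $mid + 1 }
        else                             { $hi = $mid }
    }
    return $lo;
}

# Rows in group g1 dated before 20130301, found without a full scan.
my $rows   = $lookup_group{g1};
my $n      = first_index_on_or_after($rows, 20130301);
my @before = @{$rows}[0 .. $n - 1];
```

        The sort is paid once per group; after that each date look-up is O(log n) instead of a loop over every row in the group.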

        I suspect that this thread has gotten too deep to be getting the eyeballs you really need to get good suggestions for this problem. If you can find the time, you might try consolidating the information you've given in this sub-thread and making a post to the sqlite mailing list asking for the best way to optimise those queries.

