in reply to Using hashes instead of arrays with database results

Three observations.

The first is that if you introduce a little latency between your database server and your client machine (e.g., a firewall between the two, some load-balancing software, etc.), the effect of that can rapidly dwarf the overhead of hashes. (True story. A database loading script for Sybase I was working on slowed massively. The problem? The rows got too big for the packets, and passing data slowed massively.)

Do you know where your bottlenecks are? Immediately leaping to use arrays rather than hashes is an excellent example of premature optimization. Depending on your queries, the simple act of organizing your tables well and adding the right indexes can make a far larger difference than how you access each row. If it is a user-visible request, how many hash accesses are you really going to do? How much string manipulation are you going to do right afterwards that is just as inefficient?
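If you want to see how much the hash overhead actually costs in your case, measure it rather than guess. Here is a minimal sketch using the core Benchmark module, comparing element access on a hypothetical row as DBI's fetchrow_hashref and fetchrow_arrayref would hand it back (the column names and values are made up for illustration):

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# A hypothetical three-column row, in both shapes DBI can return.
my %row_hash  = ( id => 1, name => 'fred', email => 'fred@example.com' );
my @row_array = ( 1, 'fred', 'fred@example.com' );

# Compare per-access cost; run each sub for at least 1 CPU second.
cmpthese( -1, {
    hash  => sub { my $n = $row_hash{name} },
    array => sub { my $n = $row_array[1] },
} );
```

The array access will usually win, but run this next to a timing of the query itself before deciding the difference matters.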

Which all comes back to the fundamental point that any kind of optimization doesn't matter unless you need it. It sounds silly when said that way, but it is true. Either you need to hit a performance mark, in which case you are really, really sad if you don't, or you don't have a problem, in which case it didn't matter. There isn't much middle ground between those.

So yes. Arrays are faster than hashes. Wonderful. If I were going to work on a data warehouse, I might care. But I don't. So as long as I can get my measly 80,000-row tables calculated and loaded from scratch in a night, I don't care about efficiency. Hmm, I am nowhere close to having a problem now; how about later? Well, in 5 years they will be 160,000-row tables, and hardware will be 10x as fast. I am failing to see a problem here.

Now if you are pushing the envelope, then disregard this. There are people who work with data warehouses and need to really understand performance. (Though competent ones seem to make hard decisions about what is penny wise and pound foolish pretty ruthlessly. A friend of mine told me how he used closures heavily as an abstraction technique in a loading script. The overhead of calling all of those functions took an extra $30K of hardware, but the project got delivered 6 weeks sooner, and it was worth it.)
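For the curious, the closure trick he described looks something like the sketch below: build one transformation closure per column up front, then apply them to every row. (The helper names and the per-row cost here are my own illustration, not his actual code.) It is a clean abstraction, but every row now pays for several extra subroutine calls, which is exactly the overhead that ate the hardware budget.

```perl
use strict;
use warnings;

# Build per-column transformer closures once, outside the row loop.
sub make_trimmer {
    return sub { my $v = shift; $v =~ s/^\s+|\s+$//g; return $v };
}
sub make_defaulter {
    my $default = shift;
    return sub { defined $_[0] ? $_[0] : $default };
}

my @transforms = ( make_trimmer(), make_defaulter(0) );

# One raw row per arrayref; in a real loader these would come from DBI.
my @raw_rows = ( [ '  fred  ', undef ], [ 'barney', 42 ] );

for my $row (@raw_rows) {
    # Each column passes through its closure: a function call per field.
    my @clean = map { $transforms[$_]->( $row->[$_] ) } 0 .. $#transforms;
    print join( '|', @clean ), "\n";
}
```

Whether that per-row call overhead matters is, again, something to measure against your load window, not assume.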

Likewise, high-performance websites (which I thankfully have little experience with, nor do I want any) have their own needs and may care. But the average corporate intranet site gets how many hits per minute? Again, if it is an issue for you, then worry about it. It isn't for most of us, though.

PS Please don't mistake my attitude for an implicit admission that I am unable to get performance when I want it. Ask jeffa about that. :-)


Re: Re (tilly) 1: Using hashes instead of arrays with database results
by mpeppler (Vicar) on Jan 31, 2002 at 17:05 UTC
    True story. A database loading script for Sybase I was working on slowed massively. The problem? The rows got too big for the packets, and passing data slowed massively.

    I'm sure you know this - but you can configure the Sybase database server to handle larger packets (up to 8k, I think), and then the client apps can use this as well (I think DBD::Sybase can do this - if not I'll have to add it in!).
    It's a potentially huge performance increase for certain types of applications...

    Michael

      Thanks, I did know that. DBD::Sybase already has the setting; it is called packetSize.
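      For anyone following along, it is a connection-time attribute, so you set it in the DSN. A minimal sketch (the server name and credentials are placeholders, and the server itself must also be configured to allow the larger packet size):

```perl
use strict;
use warnings;
use DBI;

my ( $user, $password ) = ( 'me', 'secret' );   # placeholders

# packetSize is requested per connection via the DSN; the Sybase
# server's "max network packet size" must permit it, or the request
# is silently negotiated back down.
my $dbh = DBI->connect(
    'dbi:Sybase:server=MYSERVER;packetSize=8192',
    $user, $password,
) or die $DBI::errstr;
```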

      Doesn't help if your OS settings don't support it, or if you don't control the database...