in reply to Re: Musings about a database ecology
in thread Musings about a database ecology

I'm using PostgreSQL, and it was only after I took all the locks out six months ago that we started getting reasonable throughput .. now it seems to have bogged down again in a similar way.

I've installed Devel::Profile, but it's having some strange interaction with Log::Log4perl that I don't quite understand, and the program is dumping out of the processing loop instead of hanging around and sending off its "Hey!" messages at regular intervals.

Alex / talexb / Toronto

"Groklaw is the open-source mentality applied to legal research" ~ Linus Torvalds

Re^3: Musings about a database ecology
by lestrrat (Deacon) on Dec 20, 2004 at 22:38 UTC

    You *are* using the VACUUM and ANALYZE commands, right? Just wondering, because I felt I knew PostgreSQL after using it for years, and then realized I wasn't using these things correctly. In any case, it sounds like you are not 100% sure yet where the bottleneck is, so you should probably check the SQL queries that you are using. Did EXPLAIN give you any info? Are your most-often-used queries using the correct indices? Would you benefit from creating a PL/pgSQL function?
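
    A rough sketch of what I mean, via DBI -- the table and column names ('jobs', 'status') are invented, so substitute your own:

        use DBI;

        # AutoCommit matters here: VACUUM can't run inside a transaction block.
        my $dbh = DBI->connect( 'dbi:Pg:dbname=mydb', 'user', 'pass',
                                { RaiseError => 1, AutoCommit => 1 } );

        # Reclaim dead rows and refresh the planner's statistics.
        $dbh->do('VACUUM ANALYZE jobs');

        # Ask the planner how it intends to run the hot query.
        my $plan = $dbh->selectcol_arrayref(
            "EXPLAIN SELECT * FROM jobs WHERE status = 'pending'");
        print "$_\n" for @$plan;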

Re^3: Musings about a database ecology
by KeighleHawk (Scribe) on Dec 20, 2004 at 22:46 UTC
    As others have hinted at, I'd suggest spending some time in a PostgreSQL forum and optimizing the database.

    Note, this may include changes to your code so it presents the queries in a manner PostgreSQL likes. I don't know anything about PostgreSQL, so I can't tell you whether bind_param and the other things like that (I assume you are using DBI?) have any effect. In Oracle land, you want to make sure you don't cause the database to re-parse the query with every hit, so you use placeholders in the query text and use bind_param (I think that is the method call) to set the proper variables. Each database has different parts of the query that can be set this way.
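
    For illustration, here is roughly what the placeholder idiom looks like in DBI, assuming a connected handle $dbh -- the table and column names are made up:

        # Prepare once with a placeholder, execute many times. With a driver
        # and server that support real prepared statements, the query text is
        # parsed and planned once rather than once per distinct literal value.
        my $sth = $dbh->prepare('SELECT id, payload FROM jobs WHERE status = ?');

        for my $status ('pending', 'retry') {
            $sth->execute($status);   # or: $sth->bind_param(1, $status); $sth->execute;
            while ( my $row = $sth->fetchrow_hashref ) {
                print "job $row->{id} has status $status\n";
            }
        }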

    Other issues will also be database specific, such as indexing, types of indexes, and computing statistics for cost-based optimization vs. rule-based optimization (which would require you to rearrange your WHERE clauses). Also look to PostgreSQL-specific features that may help. Oracle has a built-in queue that may be more what you want for dealing with events; others mentioned triggers that may or may not be able to call out to the operating system. You may also be able to (with the proper tools) profile your SQL directly in PostgreSQL to make sure your queries are not particularly heinous.
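
    For instance, PostgreSQL's LISTEN/NOTIFY is, I believe, the closest thing it has to a built-in event queue. A rough sketch using DBD::Pg (the channel name 'new_job' is made up), again assuming a connected $dbh:

        # Producer side: signal that new work has arrived.
        $dbh->do('NOTIFY new_job');

        # Worker side: wait for the signal instead of re-scanning the table.
        $dbh->do('LISTEN new_job');
        while (1) {
            while ( my $note = $dbh->func('pg_notifies') ) {
                my ( $name, $pid ) = @$note;
                print "got notification '$name' from backend $pid\n";
                # ... now go fetch whatever work is pending ...
            }
            sleep 1;
        }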

    Some Perlish things to consider (but I'd do all the above first) would be to make sure you are not wasting time disconnecting and reconnecting to the database if these things are hitting the database every second or so (Oracle takes up to three seconds to connect and disconnect). If you have a lot of these, some sort of connection pooling may be in order, but again, that depends on how these are being initiated and what PostgreSQL likes and doesn't like.
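
    If these are lots of short-lived hits from the same process, even DBI's connect_cached (or Apache::DBI under mod_perl) can help -- a sketch, with invented connection parameters:

        use DBI;

        # Hands back the same underlying handle on repeated calls with the
        # same arguments, instead of paying for a fresh connection each time.
        my $dbh = DBI->connect_cached(
            'dbi:Pg:dbname=mydb;host=localhost', 'user', 'pass',
            { RaiseError => 1, AutoCommit => 1 },
        );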

    Generally speaking, home grown locking mechanisms will just hurt you in ways you won't enjoy...

Re^3: Musings about a database ecology
by talexb (Chancellor) on Dec 21, 2004 at 02:06 UTC

    I had a closer look at the queries that I've been doing, and it turns out that the query is searching through a 180,000-row table every twenty seconds or so. It's taking 12 seconds to do that .. so now I have a much better idea of where to optimize, and a temporary table holding 'jobs to be done' is looking a lot better.
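
    Something like this is what I have in mind for the 'jobs to be done' table -- the names here are placeholders, not my real schema:

        # A small table of outstanding work, so the every-twenty-second poll
        # never has to wade through the full 180,000-row table.
        $dbh->do(
            'CREATE TABLE pending_jobs (
                 job_id  integer   NOT NULL,
                 queued  timestamp NOT NULL DEFAULT now()
             )'
        );

        # Rows get inserted as work is queued and deleted once it's handled,
        # so the table stays tiny and the poller's SELECT stays fast.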

    Thanks for your feedback.

    Alex / talexb / Toronto

    "Groklaw is the open-source mentality applied to legal research" ~ Linus Torvalds

      Moral: when you have a performance problem and a database is involved, start by looking very hard at the database and what it is doing to provide you with the results. An RDBMS's query optimizer is a complex piece of software, and it can make wrong decisions that generate horrendously slow query plans...

      Michael

      Depending on the search, you'll be fine just putting an index on whatever column the query is searching on. Then the database just does a lookup on the index and the search is lightning fast.

        This is one of the big myths about database optimization. Indexes are not always good, and full table scans are not always bad. I have just experienced this at work: we were able to get a 300+% gain on our queries by forcing Oracle to do a full table scan over a 12-million-row table.

        Ultimately, you have to know what your query is doing before you start optimizing it. With Postgres, start using EXPLAIN to figure out whether you're using indexes or full table scans (e.g. EXPLAIN SELECT * FROM foo WHERE baz = 'baz'). Once you see what the query is actually doing, you can perhaps create a different index that is more selective.
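
        For example (names invented, assuming a connected $dbh): if only a small slice of the table ever matches the hot predicate, a partial index can be far more selective than a plain one:

            # A plain index on the searched column ...
            $dbh->do('CREATE INDEX idx_foo_baz ON foo (baz)');

            # ... or, if only a small slice of rows ever matches the hot
            # predicate, a partial index that covers just that slice.
            $dbh->do("CREATE INDEX idx_foo_baz_hot ON foo (baz) WHERE baz = 'baz'");

            # Re-run EXPLAIN afterwards to confirm the planner actually uses it.
            print "$_\n" for @{ $dbh->selectcol_arrayref(
                "EXPLAIN SELECT * FROM foo WHERE baz = 'baz'") };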

        12 secs to scan a 180k row table sounds awfully slow, but it depends on hardware and your I/O subsystem.


        --
        Ben
        "Naked I came from my mother's womb, and naked I will depart."