
Re: Musings about a database ecology

by iburrell (Chaplain)
on Dec 20, 2004 at 21:30 UTC ( [id://416329] )

in reply to Musings about a database ecology

First, as other people have said, if your database is slowing down at 9000 documents, something is broken in your schema, your database server, or your code. Check your database access code for queries that are too complicated or too numerous. Check your schema for missing indexes.

Second, you shouldn't need to worry about locking when accessing the database. And if you do need locking, then use the database's locks. The whole point of using a transactional database is that it hides most of the locking while still giving you transactions and atomic updates. Which database server are you using?
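For instance, in PostgreSQL a row-level lock can be taken inside a transaction with SELECT ... FOR UPDATE, so two workers never claim the same row. This is only a sketch; the `jobs` table and its columns are made up for illustration:

```sql
-- Hypothetical jobs table; names are illustrative, not from the thread.
BEGIN;

-- Lock the next unclaimed job row; any other transaction that tries to
-- lock the same row blocks until this transaction commits or rolls back.
SELECT id, payload
FROM jobs
WHERE done = false
ORDER BY id
LIMIT 1
FOR UPDATE;

-- ... process the job ...

UPDATE jobs SET done = true WHERE id = 42;  -- the id fetched above
COMMIT;
```

The lock lives and dies with the transaction, so there is no home-grown lock bookkeeping to get wrong.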

Replies are listed 'Best First'.
Re^2: Musings about a database ecology
by talexb (Chancellor) on Dec 20, 2004 at 21:57 UTC

    I'm using PostgreSQL, and it was only after I took all the locks out six months ago that we started getting reasonable throughput .. now it seems to have bogged down again in a similar way.

    I've installed Devel::Profile, but it's having some strange interaction with Log::Log4perl that I don't quite understand: it drops out of the processing loop instead of hanging around and sending off its "Hey!" messages at regular intervals.

    Alex / talexb / Toronto

    "Groklaw is the open-source mentality applied to legal research" ~ Linus Torvalds

      You *are* using the VACUUM and ANALYZE commands, right? Just wondering, because I felt I knew PostgreSQL after using it for years, and then realized I wasn't using these things correctly. In any case, it sounds like you're not 100% sure yet where the bottleneck is, so you should probably examine the SQL queries you're using. Did EXPLAIN give you any useful info? Are your most-often-used queries using the correct indices? Would you benefit from creating a PL/pgSQL function?
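      A minimal sketch of both steps (the table name is illustrative):

```sql
-- Reclaim dead rows and refresh the planner's statistics:
VACUUM ANALYZE jobs;

-- Ask the planner how it actually executes a slow query:
EXPLAIN ANALYZE
SELECT * FROM jobs WHERE done = false;
-- A "Seq Scan" on a large table in the output is a strong hint
-- that an index is missing or not being used.
```

Without regular VACUUM, PostgreSQL tables accumulate dead row versions and sequential scans get slower over time, which matches the "bogged down again" symptom.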

      As others have hinted, I'd suggest spending some time in a PostgreSQL forum and optimizing the database.

      Note, this may include changes to your code so that it presents its queries in a manner PostgreSQL likes. I don't know much about PostgreSQL, so I can't tell you whether bind_param and similar calls (I assume you are using DBI?) have any effect. In Oracle land, you want to make sure you don't force the database to re-parse the query on every hit, so you use placeholders in the query text and bind_param (I think that is the method call) to set the proper variables. Each database has different parts of the query that can be set this way.
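      In DBI terms the placeholder approach looks something like this. This is a sketch only; the connection details, table, and column names are made up, and it needs a live database to run:

```perl
use strict;
use warnings;
use DBI;

# Connection details are illustrative, not from the thread.
my ($user, $pass) = ('someuser', 'somepass');
my $dbh = DBI->connect('dbi:Pg:dbname=mydb', $user, $pass,
                       { RaiseError => 1, AutoCommit => 1 });

# '?' is a placeholder: the query text stays constant, so the server
# can reuse the parsed statement instead of re-parsing on every hit.
my $sth = $dbh->prepare('SELECT id, payload FROM jobs WHERE status = ?');

for my $status ('pending', 'running') {
    $sth->execute($status);   # binds the value; same effect as bind_param + execute
    while (my $row = $sth->fetchrow_hashref) {
        print "$row->{id}: $row->{payload}\n";
    }
}
$dbh->disconnect;
```

The key point is that prepare happens once, outside the loop, and only the bound values change from call to call.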

      Other issues are also database specific, such as indexing, the type of indexes, and computing statistics for cost-based optimization vs. rule-based optimization (which might require you to rearrange your WHERE clause). Also look at PostgreSQL-specific features that may help. Oracle has a built-in queue that may be closer to what you want for dealing with events. Others mentioned triggers, which may or may not be able to call out to the operating system. You may also be able (with the proper tools) to profile your SQL directly in PostgreSQL to make sure your queries are not particularly heinous.
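      On the PostgreSQL-specific side, one feature worth knowing about for event handling is LISTEN/NOTIFY, which can replace tight polling loops. A sketch (the channel name is made up):

```sql
-- In the worker's session: subscribe to a notification channel.
LISTEN new_job;

-- In the producer's session, or from a trigger on the jobs table:
-- wake up every session listening on that channel.
NOTIFY new_job;
```

The worker then only queries the table when it has actually been told something changed, instead of re-scanning on a timer.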

      Some Perlish things to consider (but I'd do all of the above first): make sure you are not wasting time disconnecting and reconnecting to the database if these jobs are hitting it every second or so (Oracle takes up to three seconds to connect and disconnect). If you have a lot of them, some sort of connection pooling may be in order, but again, that depends on how they're initiated and on what PostgreSQL likes and doesn't like.
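      The simplest form of this in DBI is connect_cached, which hands back an existing handle when one with the same parameters is already open. A sketch, with made-up connection details:

```perl
use strict;
use warnings;
use DBI;

sub get_dbh {
    # Returns the same underlying handle on repeated calls with the same
    # arguments, instead of paying the connect/disconnect cost each time.
    return DBI->connect_cached('dbi:Pg:dbname=mydb', 'someuser', 'somepass',
                               { RaiseError => 1 });
}

my $dbh1 = get_dbh();
my $dbh2 = get_dbh();   # same cached handle, no new connection made
```

For a persistent environment (mod_perl and the like), Apache::DBI does much the same thing transparently.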

      Generally speaking, home grown locking mechanisms will just hurt you in ways you won't enjoy...

      I had a closer look at the queries I've been doing, and it turns out that one of them is searching through a 180,000-row table every twenty seconds or so. It's taking 12 seconds to do that .. so now I have a much better idea of where to optimize, and a temporary table holding 'jobs to be done' is looking a lot better.

      Thanks for your feedback.

      Alex / talexb / Toronto

      "Groklaw is the open-source mentality applied to legal research" ~ Linus Torvalds

        Moral: when you have a performance problem and a database is involved, start by looking very hard at the database and what it is doing to provide you with the results. An RDBMS's query optimizer is a complex piece of software, and it can make wrong decisions that generate horrendously slow query plans...


        Depending on the search, you may be fine just putting an index on whatever column it is searching on. Then the query does an index lookup instead of a full scan, and the search is lightning fast.
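        For the 180,000-row scan described above, that would be something like the following (table and column names are illustrative, since the thread doesn't give the real schema):

```sql
-- A plain index on the searched column:
CREATE INDEX jobs_done_idx ON jobs (done);

-- Or, if only the unfinished rows are ever queried, a partial index
-- keeps the index itself small and cheap to maintain:
CREATE INDEX jobs_pending_idx ON jobs (id) WHERE done = false;
```

Running EXPLAIN on the query afterwards should show an index scan in place of the sequential scan.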