in reply to Faster of two options for finding keywords in a database

Assuming what you really want to do is find the most relevant record for a search, neither option is going to give you good performance on a large database. The "right tool for the job" is either the Postgres fulltext search feature, or using a second database like ElasticSearch.

The built-in postgres keyword search isn't as good as ElasticSearch, but it's right at your fingertips when your database is already postgres. It works like this:

  1. Declare a column for your keywords, and write some clever code to fill it with all the keyword combinations that matter, including duplicates for the important ones.
  2. Put a fulltext index on that column
  3. Sort your postgres query by the ts_rank

Here's an example of the DDL:

CREATE TABLE sometable (
  ...
  keyword_search text not null
);

CREATE INDEX sometable_fulltext ON sometable
  USING GIN (to_tsvector('english', keyword_search));

Obviously, change 'english' if your users have a different primary language.
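If you're not sure which configurations your postgres install actually ships with, you can list them from the system catalog (pg_ts_config is the standard catalog for text search configurations):

```sql
-- Show the text search configurations available in this cluster;
-- 'simple' is the language-agnostic fallback if none of them fit.
SELECT cfgname FROM pg_ts_config ORDER BY cfgname;
```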

If you have multiple columns to search, you can make a generated column that combines them into the keywords:

keyword_search text not null GENERATED ALWAYS AS (
    name || ' ' || COALESCE(description, '') || ' ' || COALESCE(tags, '')
) STORED,
You select from this table like:

SELECT *
  FROM sometable
 WHERE to_tsvector('english', keyword_search) @@ plainto_tsquery('english', ?)
 ORDER BY ts_rank(to_tsvector('english', keyword_search), plainto_tsquery('english', ?)) DESC

Then substitute your search string for both '?' placeholders in that query.
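While you're tuning the keyword column, it can help to see the rank values themselves. A sketch, assuming a hypothetical id column on sometable and a hypothetical search string 'red bicycle' filled in for both placeholders:

```sql
-- Select the rank explicitly so you can eyeball how well matches score
SELECT id,
       ts_rank(to_tsvector('english', keyword_search),
               plainto_tsquery('english', 'red bicycle')) AS rank
  FROM sometable
 WHERE to_tsvector('english', keyword_search)
       @@ plainto_tsquery('english', 'red bicycle')
 ORDER BY rank DESC
 LIMIT 20;
```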

If you're interested in integrating this with DBIx::Class, I can show some neat code for that.

The ElasticSearch solution is more involved. You set up a separate ElasticSearch database, and every time your application modifies content it also pushes a summary of that content into ElasticSearch. At search time, you query ElasticSearch instead of postgres to get the list of entity IDs to show the user, then query postgres for those entities by ID, and finally render them for the user.
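The ID-list round trip at the end can be sketched in SQL. Assuming a hypothetical integer id primary key on sometable, and that your application already holds the IDs ElasticSearch returned in relevance order, unnest ... WITH ORDINALITY lets you fetch the rows while preserving that order:

```sql
-- The literal ID array stands in for whatever ElasticSearch returned;
-- WITH ORDINALITY carries its relevance order through the join.
SELECT s.*
  FROM unnest(ARRAY[42, 7, 19]) WITH ORDINALITY AS hits(id, rank)
  JOIN sometable s ON s.id = hits.id
 ORDER BY hits.rank;
```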

Replies are listed 'Best First'.
Re^2: Faster of two options for finding keywords in a database
by Danny (Chaplain) on Feb 18, 2024 at 02:46 UTC
    Or, possibly more simply, just keep a database lookup table with columns:

    keyword || id_reference_to_text_column

    then query that table with your keywords like "select id_reference_to_text_column from lookup_table where keyword in ('key1', 'key2', ...)" or however you do that in postgres. Then you could simply sum up the occurrences of each id_reference_to_text_column.

    Or faster with group by (mysql syntax)

    create table tmp (kwd varchar(255), idx integer, primary key (kwd, idx))

    populate table ... then query it like

    select idx, count(*) from tmp where kwd in ("heugh", "stook", "tanti") group by idx
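    To actually rank the matches, you'd sort by the count (a sketch in the same mysql syntax):

    ```sql
    -- Rows matching the most of the searched keywords come first
    SELECT idx, COUNT(*) AS hits
      FROM tmp
     WHERE kwd IN ("heugh", "stook", "tanti")
     GROUP BY idx
     ORDER BY hits DESC;
    ```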
      This is at least cross-platform, but it's also sort of what the postgres fulltext feature does. The postgres fulltext should still run faster though, and I think it uses much more compact storage than creating one row per keyword per original object with a b-tree on the whole thing.

      Also debatable which is "simpler", because with a table of keyword-to-record-id you have to maintain that yourself. With the one I showed, once you get the table and query set up, postgres does all the work to maintain the index.