in reply to Re^5: Documentation of REGEXP support in DBD::SQLite?
in thread Documentation of REGEXP support in DBD::SQLite?

Question: how are substrings of size 1 or 2 handled? Are they just ignored?
    explain analyze select * from azjunk7n where txt ~ 'ba';  -- '~' means: consider regex index

                                                         QUERY PLAN
    ---------------------------------------------------------------------------------------------------------------------
     Seq Scan on azjunk7n  (cost=0.00..267879.16 rows=707066 width=85) (actual time=5.413..9163.252 rows=897633 loops=1)
       Filter: (txt ~ 'ba'::text)
       Rows Removed by Filter: 9102367
     Planning Time: 0.360 ms
     Execution Time: 9189.173 ms
    (5 rows)

    Time: 9190.029 ms (00:09.190)

Nine seconds. Because, of course, if there are too many hits (here: 897633), the system switches to SeqScan; after all, a sequential scan is the fastest way to access many rows. Faster would have been "where position('ba' in txt) > 0", which does a Seq Scan in 3 seconds; but position() doesn't allow regexen.
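To make the question about substrings of size 1 or 2 more concrete, here is a simplified Python model of trigram extraction. It assumes pg_trgm's documented behaviour for stored values: each word is lowercased, padded with two leading spaces and one trailing space, then cut into 3-character windows, so show_trgm('cat') gives {"  c"," ca","at ","cat"}. For an unanchored LIKE '%literal%' search it further assumes that only windows lying entirely inside a literal made of word characters can be demanded from the index. All function names and the sample rows are hypothetical, and real pg_trgm also handles locales, non-alphanumeric characters and hashing of multibyte trigrams, so treat this as a sketch rather than the actual implementation.

```python
import re


def indexed_trigrams(text):
    """Trigrams a (simplified) trigram index would store for one value.

    Assumption: words are runs of ASCII letters/digits, lowercased, then
    padded with two leading spaces and one trailing space.
    """
    grams = set()
    for word in re.findall(r"[0-9a-z]+", text.lower()):
        padded = "  " + word + " "
        # every 3-character window of the padded word
        for i in range(len(padded) - 2):
            grams.add(padded[i:i + 3])
    return grams


def required_trigrams_unanchored(literal):
    """Trigrams a LIKE '%literal%' search could demand from the index.

    Assumption: the literal is a single word with wildcards on both sides,
    so no boundary padding applies; only windows fully inside it count.
    """
    literal = literal.lower()
    return {literal[i:i + 3] for i in range(len(literal) - 2)}


def candidate_rows(rows, literal):
    """Rows a hypothetical index lookup would keep before the recheck."""
    needed = required_trigrams_unanchored(literal)
    # With no required trigrams the empty set is a subset of every row's
    # trigrams, so every row is a candidate: the index narrows nothing.
    return [r for r in rows if needed <= indexed_trigrams(r)]


rows = ["foo bar", "baz", "qux"]
print(sorted(indexed_trigrams("cat")))      # ['  c', ' ca', 'at ', 'cat']
print(required_trigrams_unanchored("ba"))   # set()
print(candidate_rows(rows, "ba"))           # ['foo bar', 'baz', 'qux']
print(candidate_rows(rows, "bar"))          # ['foo bar']
```

Under this model a two-character search string such as '%ba%' contributes no required trigrams at all, which fits the pg_trgm documentation's remark that a pattern with no extractable trigrams degenerates to a full-index scan; at that point the planner will usually prefer a Seq Scan anyway.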

Re^7: Documentation of REGEXP support in DBD::SQLite?
by LanX (Saint) on Nov 16, 2024 at 07:52 UTC
    Thanks :)

    > Because, of course, if there are too many hits (here: 897633), the system switches to SeqScan

    This doesn't decisively answer the question of whether the trigram index also covers bigrams... 🤔

    What if you search for a bigram that, by design, occurs in only a few rows?

    Or an AND combination of multiple bigrams?

    And to be sure, please use the LIKE '%ab%' form again.

    Cheers Rolf
    (addicted to the Perl Programming Language :)
    see Wikisyntax for the Monastery

      (For decisive answers, read the postgres code ;))

      > And to be sure, please use like '%ab%' again.

      I can't get bigrams to respond quickly, not even when only one row matches; with this data it always does a Seq Scan (sometimes with parallel workers, taking just under 1 second).

      In postgres, 'LIKE' doesn't allow regex (although its simple pattern search can sometimes use the trigram or btree index). Postgres uses the tilde for regex search (~ case sensitive, ~* case insensitive).

        > I can't get bigrams to respond quickly

        Hence they are not indexed.

        > although its simple pattern search can sometimes use the trigram or btree index

        That's why I wanted LIKE: to be sure the trigram optimization can be chosen.

        Cheers Rolf