in reply to Re^4: Documentation of REGEXP support in DBD::SQLite?
in thread Documentation of REGEXP support in DBD::SQLite?

Question: how are substrings of size 1 or 2 handled?

Are they just ignored?

Or are the trigrams themselves also indexed, such that "ab" is efficiently found in "cab" and "abs"?

Cheers Rolf
(addicted to the Perl Programming Language :)
see Wikisyntax for the Monastery

Re^6: Documentation of REGEXP support in DBD::SQLite?
by erix (Prior) on Nov 16, 2024 at 03:44 UTC

    Question: how are substrings of size 1 or 2 handled? Are they just ignored?
    explain analyze select * from azjunk7n where txt ~ 'ba'; -- '~' means: consider regex index

                                                      QUERY PLAN
    ---------------------------------------------------------------------------------------------------------------------
     Seq Scan on azjunk7n  (cost=0.00..267879.16 rows=707066 width=85) (actual time=5.413..9163.252 rows=897633 loops=1)
       Filter: (txt ~ 'ba'::text)
       Rows Removed by Filter: 9102367
     Planning Time: 0.360 ms
     Execution Time: 9189.173 ms
    (5 rows)

    Time: 9190.029 ms (00:09.190)

    Nine seconds. Because, of course, if there are too many hits (here: 897633), the system switches to SeqScan - after all, a sequential scan is the fastest way to access many rows. Faster would have been: where position('ba' in txt) > 0, which SeqScans in 3 seconds; but position() doesn't allow regexen.

      Thanks :)

      > Because, of course, if there are too many hits (here: 897633), the system switches to SeqScan

      This doesn't decisively answer the question of whether the trigrams are indexed by bigrams... 🤔

      What if you search for a bigram which is, by design, in only a few rows?

      Or an AND combination of multiple bigrams?

      And to be sure, please use the like '%ab%' form again.

      Cheers Rolf

        (For decisive answers, read the postgres code ;))

        And to be sure, please use like '%ab%' again.

        I can't get bigrams to respond quickly, not even when there is only one matching value; with this data it will always Seq Scan (sometimes with parallel workers: just under 1 second).

        In postgres, 'LIKE' doesn't allow regex (although its simple pattern search can sometimes use the trigram or btree index). Postgres uses the tilde for regex search (~ case sensitive, ~* case insensitive).
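
One possible reading of the bigram behaviour above, as a toy sketch in Python rather than a description of what postgres actually does: if an index is keyed only on complete 3-character windows, a 2-character search string yields no key to look up, so the only option left is to scan every row. The names trigrams, build_index and candidates are made up for this illustration; the real trigram code also pads word boundaries and extracts trigrams from regexes, none of which is modelled here.

```python
def trigrams(s):
    """Return the set of 3-character windows in s (simplified: no padding)."""
    return {s[i:i + 3] for i in range(len(s) - 2)}


def build_index(rows):
    """Map each trigram to the set of row ids whose text contains it."""
    index = {}
    for row_id, text in enumerate(rows):
        for tg in trigrams(text):
            index.setdefault(tg, set()).add(row_id)
    return index


def candidates(index, pattern):
    """Row ids that may contain pattern, or None if the index cannot help."""
    keys = trigrams(pattern)
    if not keys:
        # Fewer than 3 characters: there is no trigram to look up,
        # so the caller has to fall back to scanning every row.
        return None
    result = None
    for tg in keys:
        ids = index.get(tg, set())
        result = set(ids) if result is None else result & ids
    return result


rows = ["cab", "abs", "xyz", "cabs"]
index = build_index(rows)

print(candidates(index, "cab"))  # rows containing the trigram "cab"
print(candidates(index, "ab"))   # None: a bigram gives the index nothing to search
```

In this toy model, answering "ab" from the index would need a second structure that maps bigrams to the trigrams containing them ("cab", "abs"), which is what the question above is asking about. Whether postgres has anything like that is not settled by this sketch.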