Re^6: Documentation of REGEXP support in DBD::SQLite?

Question: how are substrings of size 1 or 2 handled? Are they just ignored?


explain analyze
select * from azjunk7n where txt ~ 'ba'; -- '~' means: consider regex index

                                                     QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------
 Seq Scan on azjunk7n  (cost=0.00..267879.16 rows=707066 width=85) (actual time=5.413..9163.252 rows=897633 loops=1)
   Filter: (txt ~ 'ba'::text)
   Rows Removed by Filter: 9102367
 Planning Time: 0.360 ms
 Execution Time: 9189.173 ms
(5 rows)

Time: 9190.029 ms (00:09.190)

Nine seconds. That is because, when there are too many hits (here: 897633), the planner switches to a Seq Scan; after all, a sequential scan is the fastest way to read a large share of a table's rows. Faster would have been where position('ba' in txt) > 0, which also does a Seq Scan but finishes in 3 seconds; position() doesn't accept regexen, though.
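A quick check with the numbers from the plan shows how large the share of hits was. This is plain arithmetic on the EXPLAIN output, nothing more: matched rows plus removed rows give the table size, and the planner's own estimate (rows=707066) was in the same ballpark, which is presumably why it went for the Seq Scan up front.

```latex
\begin{aligned}
\text{rows scanned} &= 897\,633 + 9\,102\,367 = 10\,000\,000 \\
\text{actual share of hits} &= \frac{897\,633}{10\,000\,000} \approx 8.98\,\% \\
\text{estimated share of hits} &= \frac{707\,066}{10\,000\,000} \approx 7.07\,\%
\end{aligned}
```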


Re^7: Documentation of REGEXP support in DBD::SQLite? by LanX (Saint) on Nov 16, 2024 at 07:52 UTC
Thanks :)

> Because, of course, if there are too many hits (here: 897633), the system switches to SeqScan

This doesn't decisively answer the question of whether bigrams are covered by the trigram index... 🤔

What if you search for a bigram which, by design, occurs in only a few rows? Or an `and` combination of multiple bigrams?

And to be sure, please use the `like '%ab%'` form again.

Cheers Rolf
(addicted to the Perl Programming Language :) see Wikisyntax for the Monastery
Re^8: Documentation of REGEXP support in DBD::SQLite? by erix (Prior) on Nov 16, 2024 at 08:31 UTC
(For decisive answers, read the postgres code ;))

> And to be sure, please use like '%ab%' again.

I can't get bigrams to respond quickly, not even when there is only one matching value; with this data it will always do a Seq Scan (sometimes with parallel workers: just under 1 second).

In postgres, 'LIKE' doesn't allow regexes (although its simple pattern search can sometimes use the trigram or btree index). Postgres uses the tilde for regex search (~ case-sensitive, ~* case-insensitive).
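For readers coming from the Perl/SQLite side: the three predicates in this thread (~, LIKE and position()) only select the same rows when the pattern is a plain literal such as 'ba'. The following is a small Python sketch, not Postgres code, that emulates them. The helper names (like_to_regex, sql_like, sql_tilde, sql_position) are made up for illustration, and Python's re module stands in for the Postgres regex engine, which differs in details; LIKE escaping is simplified to the default backslash.

```python
import re


def like_to_regex(pattern: str) -> str:
    """Translate a SQL LIKE pattern into a Python regex body (simplified).

    '%' -> '.*', '_' -> '.', a backslash escapes the next character
    (the default LIKE escape); every other character is literal.
    """
    out = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\" and i + 1 < len(pattern):
            out.append(re.escape(pattern[i + 1]))  # escaped char is literal
            i += 2
            continue
        if ch == "%":
            out.append(".*")
        elif ch == "_":
            out.append(".")
        else:
            out.append(re.escape(ch))
        i += 1
    return "".join(out)


def sql_like(txt: str, pattern: str) -> bool:
    # LIKE must match the whole value; DOTALL lets '%' span newlines too.
    return re.fullmatch(like_to_regex(pattern), txt, re.DOTALL) is not None


def sql_tilde(txt: str, regex: str, ignore_case: bool = False) -> bool:
    # '~' (and '~*' with ignore_case=True) is unanchored: any match counts.
    flags = re.IGNORECASE if ignore_case else 0
    return re.search(regex, txt, flags) is not None


def sql_position(needle: str, txt: str) -> int:
    # position() is 1-based and returns 0 when the needle is absent.
    return txt.find(needle) + 1


rows = ["foobar", "abacus", "BAnana", "cab"]
for r in rows:
    # For a plain literal, the three predicates select the same rows.
    assert sql_tilde(r, "ba") == sql_like(r, "%ba%") == (sql_position("ba", r) > 0)

print([r for r in rows if sql_tilde(r, "ba")])                    # ['foobar', 'abacus']
print([r for r in rows if sql_tilde(r, "ba", ignore_case=True)])  # ['foobar', 'abacus', 'BAnana']
print(sql_tilde("bxa", "b.a"), sql_like("bxa", "%b.a%"))          # True False
```

The last line shows the difference erix points out: '.' is a wildcard for ~ but a literal character for LIKE, so only the regex form can express such patterns.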
Re^9: Documentation of REGEXP support in DBD::SQLite? by LanX (Saint) on Nov 16, 2024 at 08:35 UTC
> I can't get bigrams to respond quickly

Hence they are not indexed.

> although its simple pattern search can sometimes use the trigram or btree index

That's why I wanted LIKE: to be sure the trigram optimization can be chosen.

Cheers Rolf
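To illustrate why bigrams can't be served by the trigram index, here is a simplified Python model of what a pg_trgm index can look up. It is a sketch under assumptions, not the pg_trgm source: the padding and lower-casing follow the pg_trgm documentation (the trigrams of 'cat' are '  c', ' ca', 'cat' and 'at '), the function names are made up, and the real extraction for LIKE patterns handles more cases than a single-word '%literal%'. The point is that a two-character infix such as '%ba%' yields no trigram at all, so there is nothing to look up however few rows match, and ANDing several bigrams doesn't add any keys either. As far as I understand the pg_trgm documentation, such a pattern degenerates to a full-index scan, which the planner will happily trade for a Seq Scan.

```python
import re


def value_trigrams(text: str) -> set[str]:
    """Approximate the trigrams pg_trgm stores for an indexed value.

    Simplified model: lower-case, split into alphanumeric words, pad each
    word with two leading spaces and one trailing space, take 3-char windows.
    """
    grams = set()
    for word in re.findall(r"[0-9a-z]+", text.lower()):
        padded = "  " + word + " "
        grams.update(padded[i:i + 3] for i in range(len(padded) - 2))
    return grams


def like_infix_trigrams(literal: str) -> set[str]:
    """Index keys extractable from LIKE '%<literal>%' (simplified model).

    Assumes the literal is a single alphanumeric word. Both ends touch a
    '%' wildcard, so no padding is added: only full 3-char windows count.
    """
    word = literal.lower()
    return {word[i:i + 3] for i in range(len(word) - 2)}


def trigram_search(rows: list[str], literal: str) -> tuple[int, list[str]]:
    """Return (number of index candidates, rows that survive the recheck)."""
    keys = like_infix_trigrams(literal)
    if keys:
        # Candidates must contain every key trigram (necessary, not sufficient).
        candidates = [r for r in rows if keys <= value_trigrams(r)]
    else:
        # No extractable trigram: nothing to look up, every row is a candidate.
        candidates = list(rows)
    # Recheck step: the real (case-sensitive) substring test on each candidate.
    return len(candidates), [r for r in candidates if literal in r]


rows = ["foobar", "abacus", "cab", "xyz"]
print(trigram_search(rows, "bar"))  # (1, ['foobar'])
print(trigram_search(rows, "ba"))   # (4, ['foobar', 'abacus'])
print(like_infix_trigrams("ba") | like_infix_trigrams("ab"))  # set()
```

The candidate count is the interesting number: for 'bar' the index narrows four rows down to one, for 'ba' it cannot narrow anything, no matter how selective the bigram is.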