Re^3: Documentation of REGEXP support in DBD::SQLite?

Replies are listed 'Best First'.
Re^4: Documentation of REGEXP support in DBD::SQLite? by erix (Prior) on Nov 15, 2024 at 06:11 UTC
SCNR - have to show off postgres' regex indexing on a quickly generated 10M row table. -- table size: 1117 MB -- regex index size: 1727 MB (big index = disadvantage) where txt like '%batman%'; --> 'like' means: search without regex; +uses the trgm index txt + ---------------------------------------------------------------------- +----------- zdfrcj batman igihbky bpwz hzzfmq plyazufcawovvjrlwow xkllfiovgktraw + bmnuyjelj Time: 9.911 ms where txt ~ '\ybatman\y'; --> '~' means: regex search; \y: word b +oundary txt + ---------------------------------------------------------------------- +----------- zdfrcj batman igihbky bpwz hzzfmq plyazufcawovvjrlwow xkllfiovgktraw + bmnuyjelj Time: 13.179 ms [download] (performance of these regex-indexes (via trigram matching from the pg_trgm extension) is good but the size can be reason not to use them.)	[reply] [d/l]
Re^5: Documentation of REGEXP support in DBD::SQLite? by LanX (Saint) on Nov 15, 2024 at 09:59 UTC
Thanks for this, I had something like trigrams in mind. But for full automatic regex support one would need to analyse the regex and identify the substrings which are in all execution paths ... Quite a task. Though... Perl does some optimisation when compiling its regexes and it's possible to introspect the result. It might be possible to build on this. On a side note: true, the index adds 150% to the datas size. But the speed gain is dramatic enough to justify it in many use cases. Cheers Rolf _{(addicted to the Perl Programming Language :) see Wikisyntax for the Monastery}	[reply]
Re^5: Documentation of REGEXP support in DBD::SQLite? by LanX (Saint) on Nov 15, 2024 at 10:44 UTC
Question: how are substrings of size 1 or 2 handled? Are they just ignored? Or are trigrams-indeces also indexed, such that `"ab"` is efficiently found in `"cab"` and `"abs"` ? Cheers Rolf _{(addicted to the Perl Programming Language :) see Wikisyntax for the Monastery}	[reply] [d/l] [select]
Re^6: Documentation of REGEXP support in DBD::SQLite? by erix (Prior) on Nov 16, 2024 at 03:44 UTC
Question: how are substrings of size 1 or 2 handled? Are they just ignored? `explain analyze select * from azjunk7n where txt ~ 'ba'; -- '~' means: consider regex +index QUERY PLAN + ---------------------------------------------------------------------- +----------------------------------------------- Seq Scan on azjunk7n (cost=0.00..267879.16 rows=707066 width=85) (ac +tual time=5.413..9163.252 rows=897633 loops=1) Filter: (txt ~ 'ba'::text) Rows Removed by Filter: 9102367 Planning Time: 0.360 ms Execution Time: 9189.173 ms (5 rows) Time: 9190.029 ms (00:09.190)` [download] Nine seconds. Because, of course, if there are too many hits (here: 897633), the system switches to SeqScan - after all, a sequential scan is the fastest way to access many rows. Faster would've been: `where position('ba' in txt) > 0` which would SeqScan in 3 seconds; but position() doesn't allow regexen.	[reply] [d/l] [select]
Re^7: Documentation of REGEXP support in DBD::SQLite? by LanX (Saint) on Nov 16, 2024 at 07:52 UTC
Re^8: Documentation of REGEXP support in DBD::SQLite? by erix (Prior) on Nov 16, 2024 at 08:31 UTC
Some notes below your chosen depth have not been shown here