Re: Documentation of REGEXP support in DBD::SQLite?

See the REGEXP function section of the DBD::SQLite documentation (emphasis added).

REGEXP function
SQLite includes syntactic support for an infix operator REGEXP, but without any implementation.
The DBD::SQLite driver automatically registers an implementation that performs standard perl regular expression matching, using current locale.
So for example you can search for words starting with an A with a query like
    SELECT * from table WHERE column REGEXP '\bA\w+'
If you want case-insensitive searching, use perl regex flags, like this:
    SELECT * from table WHERE column REGEXP '(?i:\bA\w+)'
The default REGEXP implementation can be overridden through the create_function API described above.
Note that regexp matching will not use SQLite indices, but will iterate over all rows, so it could be quite costly in terms of performance.
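DBD::SQLite performs the registration below for you. To show the mechanism, here is a minimal sketch of the same hook using Python's standard sqlite3 module instead of Perl. That substitution is an assumption of this example and is not how DBD::SQLite works internally. SQLite rewrites `X REGEXP Y` as a call to a user function regexp(Y, X), and whoever registers that function decides what "matching" means. Python's re module stands in for Perl regexes here. It agrees with Perl on the two patterns from the documentation, but not on every construct, and it matches Unicode-aware rather than by current locale. The table `words` and its rows are made up.

```python
import re
import sqlite3

def regexp(pattern, value):
    # SQLite rewrites "X REGEXP Y" as regexp(Y, X): the pattern comes first.
    if value is None:
        return False  # treat NULL column values as non-matching
    return re.search(pattern, value) is not None

conn = sqlite3.connect(":memory:")
# Without a registered regexp() function, REGEXP queries normally fail with an error.
conn.create_function("REGEXP", 2, regexp)

conn.execute("CREATE TABLE words (w TEXT)")
conn.executemany("INSERT INTO words VALUES (?)",
                 [("Apple",), ("avocado",), ("Banana",), ("an Ant",)])

# Raw strings keep the backslashes; SQLite string literals don't interpret them.
case_sensitive = [r[0] for r in conn.execute(
    r"SELECT w FROM words WHERE w REGEXP '\bA\w+' ORDER BY w")]
case_insensitive = [r[0] for r in conn.execute(
    r"SELECT w FROM words WHERE w REGEXP '(?i:\bA\w+)' ORDER BY w")]

print(case_sensitive)    # ['Apple', 'an Ant']
print(case_insensitive)  # ['Apple', 'an Ant', 'avocado']
```

Overriding the default, as the documentation mentions, is the same idea: register a different function under the name REGEXP.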

Update

Short version: Perl regexes are hooked into SQLite through a callback mechanism, and the filtering is done row by row. That's much slower than a native implementation.
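To make "row by row" visible, you can count how often the callback fires. This is again a sketch with Python's sqlite3 standing in for DBD::SQLite. The table `t`, the index `t_txt` and the row contents are hypothetical. The index exists only to show that REGEXP cannot use it: SQLite still calls the Python function once for every row.

```python
import re
import sqlite3

calls = 0  # how many times SQLite has called back into Python

def regexp(pattern, value):
    global calls
    calls += 1
    return value is not None and re.search(pattern, value) is not None

conn = sqlite3.connect(":memory:")
conn.create_function("REGEXP", 2, regexp)
conn.execute("CREATE TABLE t (txt TEXT)")
conn.execute("CREATE INDEX t_txt ON t (txt)")  # exists, but REGEXP cannot use it

rows = [(f"row {i}",) for i in range(1000)] + [("the batman returns",)]
conn.executemany("INSERT INTO t VALUES (?)", rows)

hits = conn.execute(r"SELECT txt FROM t WHERE txt REGEXP '\bbatman\b'").fetchall()
print(hits, calls)  # [('the batman returns',)] 1001: one callback per row
```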

Cheers Rolf
(addicted to the Perl Programming Language :)
see Wikisyntax for the Monastery


Re^2: Documentation of REGEXP support in DBD::SQLite? by ibm1620 (Chaplain) on Nov 14, 2024 at 20:16 UTC
Thank you for finding that for me!

You wrote: "Perl regexes are hooked into sqlite using a callback mechanism. But filtering row-wise. That's much slower than a native implementation."

I'm not quite clear on what you mean by "native implementation". Would linking in libpcre2 (or another regex library) as a custom function be expected to perform better? Or would I be better off using a different DBMS with built-in REGEXP support, such as MySQL? (It does appear that neither MySQL's nor SQLite's REGEXP can use the index; both have to search the data itself.)
Re^3: Documentation of REGEXP support in DBD::SQLite? by LanX (Saint) on Nov 14, 2024 at 20:59 UTC
I don't know if they perform better; you have to test it.

By "native" I mean without the call overhead, and with (a subset of) regexes compiled efficiently.

Regarding indexes, I can only imagine a small subset of regexes being able to profit from them, unless a lot of special-case optimisation was implemented¹. That should be quite complicated to achieve with a pluggable extension... But again, I don't know. You'd better ask this on a DBMS board.

Cheers Rolf

¹) I'm not even sure a substring search with `LIKE '%substr%'` can take advantage of the index. The differences in your benchmark are not orders of magnitude; they could easily be explained by "call overhead and efficiently compiled code". A real "index search" should be dramatically faster than just a factor of 4. And, as a side note, your regexes were much more complicated than a substring search. Apples and oranges...
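For SQLite at least, the footnote's question can be answered with EXPLAIN QUERY PLAN. The sketch below uses Python's built-in sqlite3 module rather than DBD::SQLite, though the same SQL works through either driver. The table `t`, column `txt` and index `t_txt` are hypothetical. On an indexed column, an equality test shows up as a SEARCH using the index. A leading-wildcard LIKE shows up as a SCAN, possibly of the index itself, which still visits every entry. The exact wording of the plan text varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, txt TEXT)")
conn.execute("CREATE INDEX t_txt ON t (txt)")

def plan(sql):
    # The last column of each EXPLAIN QUERY PLAN row is the human-readable detail.
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

eq_plan = plan("SELECT id FROM t WHERE txt = 'batman'")
like_plan = plan("SELECT id FROM t WHERE txt LIKE '%batman%'")
print(eq_plan)    # a SEARCH ... USING ... INDEX t_txt line
print(like_plan)  # a SCAN line: every row or index entry is visited
```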
Re^4: Documentation of REGEXP support in DBD::SQLite? by cavac (Prior) on Nov 20, 2024 at 14:15 UTC
Regular expressions can be implemented in databases with quite a bit of success. I use them for some things in PostgreSQL and they are blazingly fast. It shouldn't matter how complicated the regular expressions are: the simple fact that the database software only has to read a (relatively) small index file from disk, instead of everything in the table (a full table scan), should still speed up the search. And many databases also keep frequently used indexes in RAM.

Generally, I think of SQLite more as a desktop-type database that is integrated into a single application with a relatively small amount of data. For serious data crunching, I always choose a serious standalone database like PostgreSQL (hey, it's the same price as SQLite).
Re^5: Documentation of REGEXP support in DBD::SQLite? by LanX (Saint) on Nov 20, 2024 at 15:05 UTC
Re^6: Documentation of REGEXP support in DBD::SQLite? by LanX (Saint) on Nov 21, 2024 at 14:34 UTC
Re^3: Documentation of REGEXP support in DBD::SQLite? by Danny (Chaplain) on Nov 14, 2024 at 22:32 UTC
MySQL, or something else with built-in regexps, would surely be at least somewhat faster. By the way, patterns with a leading wildcard like `LIKE '%batman%'` don't use indices either. However, even with SQLite you can probably speed things up by first doing a rough pass with `LIKE '%batman%'` and then refining that with a regexp. You can do this either as part of the statement, like `column LIKE '%batman%' AND column REGEXP '(?i)\bbatman\b'`, or by filtering the output of `LIKE '%batman%'` yourself.
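Here is a sketch of both variants of the rough-pass idea, once more with Python's sqlite3 standing in for DBD::SQLite. The table `t`, its rows and the call counter are hypothetical. SQLite's built-in LIKE is case-insensitive for ASCII by default, so `'%batman%'` never rejects a row that `(?i)\bbatman\b` would accept. That property is what makes it safe as a prefilter. The two-stage versions must return the same rows as the regex alone.

```python
import re
import sqlite3

calls = 0  # counts Python regexp() callbacks

def regexp(pattern, value):
    global calls
    calls += 1
    return value is not None and re.search(pattern, value) is not None

conn = sqlite3.connect(":memory:")
conn.create_function("REGEXP", 2, regexp)
conn.execute("CREATE TABLE t (txt TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?)",
    [(f"filler text {i}",) for i in range(1000)]
    + [("Batman begins",), ("batmanesque",), ("I am BATMAN",)],
)
pattern = r"(?i)\bbatman\b"

# 1. Regex only: the callback runs once per row.
calls = 0
regex_only = conn.execute("SELECT txt FROM t WHERE txt REGEXP ?", (pattern,)).fetchall()
calls_regex_only = calls

# 2. Built-in LIKE plus REGEXP in one statement. If SQLite tests the LIKE term
#    first, regexp() only sees the LIKE candidates; the evaluation order of
#    AND terms is up to SQLite, so that is not guaranteed.
calls = 0
combined = conn.execute(
    "SELECT txt FROM t WHERE txt LIKE '%batman%' AND txt REGEXP ?", (pattern,)
).fetchall()
calls_combined = calls

# 3. LIKE in SQL, regex applied to the returned candidates in the application.
candidates = conn.execute("SELECT txt FROM t WHERE txt LIKE '%batman%'").fetchall()
word_rx = re.compile(pattern)
in_app = [row for row in candidates if word_rx.search(row[0])]

print(calls_regex_only, calls_combined, len(candidates))
print(sorted(in_app))  # [('Batman begins',), ('I am BATMAN',)]
```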
Re^4: Documentation of REGEXP support in DBD::SQLite? by erix (Prior) on Nov 15, 2024 at 06:11 UTC
SCNR - have to show off postgres' regex indexing on a quickly generated 10M row table.

    -- table size:        1117 MB
    -- regex index size:  1727 MB  (big index = disadvantage)

    where txt like '%batman%';     --> 'like' means: search without regex; uses the trgm index
                                        txt
    --------------------------------------------------------------------------------
     zdfrcj batman igihbky bpwz hzzfmq plyazufcawovvjrlwow xkllfiovgktraw bmnuyjelj
    Time: 9.911 ms

    where txt ~ '\ybatman\y';      --> '~' means: regex search; \y: word boundary
                                        txt
    --------------------------------------------------------------------------------
     zdfrcj batman igihbky bpwz hzzfmq plyazufcawovvjrlwow xkllfiovgktraw bmnuyjelj
    Time: 13.179 ms

(The performance of these regex indexes (via trigram matching from the pg_trgm extension) is good, but their size can be a reason not to use them.)
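To make the trigram idea behind these numbers concrete, here is a toy in-memory version in Python. It is a deliberate simplification: plain lower-cased 3-character substrings, no padding, and substring search only. It is not pg_trgm's actual algorithm or storage format, and the class and function names are hypothetical. A row can only contain `batman` if it contains every trigram of `batman`, so intersecting the posting lists narrows the candidates without looking at every row. A final recheck then removes rows that have all the trigrams but not the string itself. The sketch also hints at the size issue, because each row is listed once under every distinct trigram it contains.

```python
from collections import defaultdict

def trigrams(s):
    """All 3-character substrings of s, lower-cased (simplified; pg_trgm's rules differ)."""
    s = s.lower()
    return {s[i:i + 3] for i in range(len(s) - 2)}

class TrigramIndex:
    def __init__(self, rows):
        self.rows = rows
        self.postings = defaultdict(set)  # trigram -> ids of rows containing it
        for row_id, text in enumerate(rows):
            for tg in trigrams(text):
                self.postings[tg].add(row_id)

    def candidates(self, needle):
        grams = trigrams(needle)
        if not grams:  # needle shorter than 3 chars: the index cannot narrow anything
            return set(range(len(self.rows)))
        # A row can only contain the needle if it contains every trigram of it.
        return set.intersection(*(self.postings.get(g, set()) for g in grams))

    def search(self, needle):
        # Recheck: sharing all trigrams does not prove the needle is present.
        return sorted(i for i in self.candidates(needle)
                      if needle.lower() in self.rows[i].lower())

rows = ["zdfrcj batman igihbky", "bat man", "manbat atmosphere", "batmobile", "tman batm"]
idx = TrigramIndex(rows)
print(idx.candidates("batman"))  # {0, 4}: row 4 has every trigram but not "batman"
print(idx.search("batman"))      # [0]
```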
Re^5: Documentation of REGEXP support in DBD::SQLite? by LanX (Saint) on Nov 15, 2024 at 09:59 UTC
Re^5: Documentation of REGEXP support in DBD::SQLite? by LanX (Saint) on Nov 15, 2024 at 10:44 UTC
Re^6: Documentation of REGEXP support in DBD::SQLite? by erix (Prior) on Nov 16, 2024 at 03:44 UTC