in reply to Re^2: Documentation of REGEXP support in DBD::SQLite?
in thread Documentation of REGEXP support in DBD::SQLite?

I don't know if they perform better. You have to test it.

By "native" I mean no per-row call overhead and efficiently compiled (subsets of) regexes.

Regarding indexes, I can only imagine a small subset of regexes being able to profit from them, unless a lot of special-case optimisation was implemented¹.

It would be quite complicated to achieve this with a pluggable extension...
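
For illustration, a minimal sketch of such a pluggable (Perl-backed) REGEXP function; sqlite_create_function() is documented in DBD::SQLite, the table name here is made up:

    #!/usr/bin/perl
    use strict;
    use warnings;
    use DBI;

    my $dbh = DBI->connect( 'dbi:SQLite:dbname=test.db', '', '',
                            { RaiseError => 1 } );

    # Plug a Perl sub in as the REGEXP operator. SQLite rewrites
    # "X REGEXP Y" as regexp(Y, X), so the pattern comes first.
    # Every candidate row is pushed through this sub: per-row call
    # overhead, and no chance for the engine to use an index.
    $dbh->sqlite_create_function( 'regexp', 2, sub {
        my ( $pattern, $string ) = @_;
        return defined $string && $string =~ /$pattern/ ? 1 : 0;
    } );

    my $rows = $dbh->selectall_arrayref(
        q{SELECT name FROM t WHERE name REGEXP ?}, undef, '^Ro' );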

But again, I don't know ... you'd better ask this on a DBMS board.

Cheers Rolf
(addicted to the Perl Programming Language :)
see Wikisyntax for the Monastery

¹) I'm not even sure a substring search with LIKE '%substr%' can take advantage of the index; the differences in your benchmark are not orders of magnitude apart, which could easily be explained by "call overhead and efficiently compiled code" alone.

A real "index search" should be dramatically faster than just factor 4.

And, as a side note, your regexes were much more complicated than a substr search. Apples and oranges...
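
If you want a fair comparison, time both mechanisms against the same needle, e.g. with Benchmark (a sketch, assuming the $dbh and the Perl-backed REGEXP function from the sketch above, and a populated table t):

    use Benchmark qw(cmpthese);

    # Same substring for both, so we compare the mechanisms
    # (C-level LIKE vs. per-row Perl callback), not the patterns.
    cmpthese( -3, {
        like   => sub {
            $dbh->selectall_arrayref(
                q{SELECT count(*) FROM t WHERE name LIKE '%foo%'} );
        },
        regexp => sub {
            $dbh->selectall_arrayref(
                q{SELECT count(*) FROM t WHERE name REGEXP 'foo'} );
        },
    } );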

Re^4: Documentation of REGEXP support in DBD::SQLite?
by cavac (Prior) on Nov 20, 2024 at 14:15 UTC

    Regular expressions can be implemented in databases with quite a bit of success. I use them for some stuff in PostgreSQL and it is blazingly fast. It shouldn't matter how complicated the regular expressions are; the simple fact that the database software only has to read a (relatively) small index file from disk instead of everything in the table (a full table scan) should still speed up the search. And many databases also keep frequently used indexes in RAM.
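
    For example, PostgreSQL's pg_trgm extension lets the planner answer regex matches (~) from a trigram index instead of a full table scan (a sketch; the table and column names are made up):

        use DBI;

        my $dbh = DBI->connect( 'dbi:Pg:dbname=test', '', '',
                                { RaiseError => 1 } );

        # A GIN trigram index; since PostgreSQL 9.3 the planner can
        # use it for LIKE and for regular expression matches.
        $dbh->do(q{CREATE EXTENSION IF NOT EXISTS pg_trgm});
        $dbh->do(q{CREATE INDEX users_name_trgm
                   ON users USING gin (name gin_trgm_ops)});

        my $rows = $dbh->selectall_arrayref(
            q{SELECT name FROM users WHERE name ~ 'smith[0-9]+'} );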

    Generally, I think of SQLite more as a desktop-type database that is integrated into a single application with a relatively small amount of data. For serious data crunching, I always choose a serious standalone database like PostgreSQL (hey, it's the same price as SQLite).

    PerlMonks XP is useless? Not anymore: XPD - Do more with your PerlMonks XP
    Also check out my sister's artwork and my weekly webcomics
      > It shouldn't matter how complicated the regular expressions are,

      Sorry, that's bordering on hubristic nonsense.

      To take advantage of the index you need to identify literal character sequences in the regex.° (And erix even showed that it only works for sequences of 3+ characters in PG.)

      Now, while it's possible to identify those sequences at compile time (at least in Perl, via the re pragma; see the sketch after this list), one needs to know which ones are

      • mandatory
      • or optional
      • or belonging to more complicated AND/OR clauses.
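
      Perl's own optimizer already does part of this; compiling under re 'debug' dumps the "anchored" and "floating" literal substrings it extracted (a sketch, the dump is abridged and version-dependent):

          use re 'debug';

          # The compile-time dump contains lines roughly like
          #   anchored "foo" at 0 ... floating "bar" at 3..
          # i.e. the engine's own mandatory-substring analysis.
          my $re = qr/foo.*bar/;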

      And Perl regexes, with embedded code blocks, are effectively Turing complete.

      It might be possible with sufficient effort, but I doubt it has already been done.

      You are more than welcome to prove me wrong. Even with a test/benchmark in PG.

      Claiming that the complexity doesn't matter is really quite a bold statement.

      Cheers Rolf
      (addicted to the Perl Programming Language :)
      see Wikisyntax for the Monastery

      °) if they even appear in the regex!

        OK, my first concept was that substrings in the regex are identified and the rows filtered in a first step, before applying the regex to them. Kind of Danny's idea.

        But ... if the regex is decomposed into a query with simpler sub-regexes and the Boolean logic is delegated to the SQL engine building a plan...

        Well, I have to admit, this could be very efficient for complex regexes.
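
        Something like this hand-rolled decomposition, say for /(foo|bar).*baz/ (purely hypothetical; assumes the Perl-backed REGEXP function from upthread and a made-up table t):

            # Cheap, potentially indexable pre-filters first; the exact
            # regex runs only as a recheck on the surviving rows.
            my $rows = $dbh->selectall_arrayref( q{
                SELECT name FROM t
                WHERE ( name LIKE '%foo%' OR name LIKE '%bar%' )
                  AND   name LIKE '%baz%'
                  AND   name REGEXP '(foo|bar).*baz'
            } );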

        I tried to look into the PG documentation; unfortunately it doesn't give much insight, except that substrings of at least 3 characters are needed (which fits pg_trgm indexing trigrams, i.e. 3-character sequences).
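
        At least one can ask the planner directly whether a regex match uses the index (a sketch, assuming a pg_trgm trigram index as sketched upthread):

            # EXPLAIN shows whether the regex is answered from the
            # trigram index; look for "Bitmap Index Scan" in the output.
            my $plan = $dbh->selectall_arrayref(
                q{EXPLAIN SELECT name FROM users WHERE name ~ 'foo.*bar'} );
            print "$_->[0]\n" for @$plan;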

        Cheers Rolf
        (addicted to the Perl Programming Language :)
        see Wikisyntax for the Monastery