in reply to Re^3: Documentation of REGEXP support in DBD::SQLite?
in thread Documentation of REGEXP support in DBD::SQLite?

Regular expressions can be implemented in databases with quite a bit of success. I use them for some stuff in PostgreSQL and it is blazingly fast. It shouldn't matter how complicated the regular expressions are: the simple fact that the database software only has to read a (relatively) small index file from disk instead of everything in the table (a full table scan) should still speed up the search. And many databases also keep frequently used indexes in RAM.

Generally, I think of SQLite more as a desktop-type database that is integrated into a single application with a relatively small amount of data. For serious data crunching, I always choose a serious standalone database like PostgreSQL (hey, it's the same price as SQLite).

Re^5: Documentation of REGEXP support in DBD::SQLite?
by LanX (Saint) on Nov 20, 2024 at 15:05 UTC
    > It shouldn't matter how complicated the regular expressions are,

    Sorry, that's bordering on nonsensical hubris.

    To take advantage of the index you need to identify character sequences in the regex.° (And erix even showed that it only works for 3+ characters in PG.)

    Now, while it's possible to identify those sequences at the compile phase - at least in Perl using re - one needs to know which ones are

    • mandatory
    • or optional
    • or belonging to more complicated AND/OR clauses.

    And Perl regexes are Turing complete once embedded code blocks like (?{ ... }) are allowed.

    It might be possible with sufficient work involved, but I doubt it's already done.

    You are more than welcome to prove me wrong. Even with a test/benchmark in PG.

    Claiming that the complexity doesn't matter is really quite a bold statement.

    Cheers Rolf
    (addicted to the Perl Programming Language :)
    see Wikisyntax for the Monastery

    °) if they even appear in the regex!

      OK, my first concept was that substrings in the regex are identified and the rows are filtered in a first step before the regex is applied to them. Kind of Danny's idea.
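
The two-step idea above can be sketched in plain Perl, without any database. This is only an illustration under stated assumptions: the rows, the helper names (`trigrams`, `candidates`, `search`) and the in-memory hash index are all made up for the example, and the required literal is passed in by hand rather than extracted from the regex. It does not claim to show how PostgreSQL does it internally.

```perl
use strict;
use warnings;

# Hypothetical sample data; in a real database these would be table rows.
my @rows = (
    'the quick brown fox',
    'a lazy dog sleeps',
    'foxglove and brown bread',
    'nothing to see here',
);

# Split a string into its distinct overlapping 3-character substrings.
sub trigrams {
    my ($s) = @_;
    my %seen;
    $seen{ substr $s, $_, 3 } = 1 for 0 .. length($s) - 3;
    return keys %seen;
}

# Toy "index": trigram => list of row numbers containing it.
my %index;
for my $i (0 .. $#rows) {
    push @{ $index{$_} }, $i for trigrams($rows[$i]);
}

# Step 1: keep only rows that contain every trigram of the required literal.
sub candidates {
    my ($literal) = @_;
    my @tri = trigrams($literal);
    # A literal shorter than 3 characters yields no trigrams,
    # so the index cannot help and every row stays a candidate.
    return (0 .. $#rows) unless @tri;
    my %count;
    for my $t (@tri) {
        $count{$_}++ for @{ $index{$t} || [] };
    }
    return sort { $a <=> $b } grep { $count{$_} == @tri } keys %count;
}

# Step 2: run the real regex only on the surviving candidates.
sub search {
    my ($regex, $literal) = @_;
    return map { $rows[$_] } grep { $rows[$_] =~ $regex } candidates($literal);
}

# 'brown' is mandatory in this regex, so it is safe to use as the prefilter.
my @hits = search(qr/brown\s+(?:fox|bread)/, 'brown');
print "$_\n" for @hits;
```

The prefilter is only safe when the literal really is mandatory for every match. The fallback for short literals also mirrors why substrings of at least 3 characters are needed before a trigram index can help.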

      But ... if the regex is decomposed into a query with simpler sub-regexes, and the Boolean logic is delegated to the SQL engine, which then builds a plan ...
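
To make the decomposition idea concrete, here is a small sketch that turns a hand-written AND/OR tree of literals into a WHERE fragment with placeholders, leaving the Boolean logic to the SQL engine. Everything here is an assumption for illustration: the tree format, the `to_sql` helper, and the `docs` table with its `body` column are invented, and the tree for `/foo(?:bar|baz)/` is written by hand, since deriving it automatically from an arbitrary regex is exactly the hard part. Literals containing `%` or `_` would additionally need escaping for LIKE.

```perl
use strict;
use warnings;

# Recursively convert a tree such as [ AND => 'foo', [ OR => 'bar', 'baz' ] ]
# into a parenthesised SQL condition plus the matching bind values.
sub to_sql {
    my ($node, $column) = @_;

    # Leaf: a plain literal that must appear somewhere in the column.
    return ("$column LIKE ?", "%$node%") unless ref $node;

    # Inner node: first element is the operator, the rest are children.
    my ($op, @children) = @$node;
    my (@sql, @bind);
    for my $child (@children) {
        my ($s, @b) = to_sql($child, $column);
        push @sql,  $s;
        push @bind, @b;
    }
    return ('(' . join(" $op ", @sql) . ')', @bind);
}

# Hand-derived from /foo(?:bar|baz)/: 'foo' AND ('bar' OR 'baz').
my $tree = [ AND => 'foo', [ OR => 'bar', 'baz' ] ];
my ($where, @bind) = to_sql($tree, 'body');

# The full regex is still appended as the final, exact check
# (~ is the PostgreSQL regex match operator).
print "SELECT id FROM docs WHERE $where AND body ~ ?\n";
print "bind: @bind\n";
```

With a trigram index on the column, the planner might be able to resolve the LIKE conditions from the index and apply the regex only to the remaining rows. Whether that actually beats handing the full regex to the engine directly would need a benchmark.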

      Well, I have to admit, this could be very efficient for complex regexes.

      I tried to look into the PG documentation; unfortunately it doesn't give much insight, except that substrings of at least 3 characters are needed.

      Cheers Rolf