princepawn has asked for the wisdom of the Perl Monks concerning the following question:

I recently developed Regexp::Profanity::US, for some very practical (documented) reasons, instead of using the module Regexp::Common by regexp Gods Abigail-II and TheDamian.

One issue that I have come up against is that the word list I have developed has Perl regex characters in it. However, it would be neat if the list of regexes could also be expressed in a SQL-92 compatible way.

Thus, the word list could be marked up with general regexes which then transform to SQL or Perl regexes. Or, Perl could parse the regexes and dump them out in SQL format when necessary.... does anyone have any idea how to mark up a list of regexes so that they can be used by SQL-92 and Perl?

Carter's compass: I know I'm on the right track when by deleting something, I'm adding functionality

Replies are listed 'Best First'.
Re: regular expressions: from general to Perl to SQL
by Aristotle (Chancellor) on Aug 14, 2003 at 21:15 UTC

    SQL only knows placeholders equivalent to .* and .?. Anything more advanced is not doable in standards compliant SQL; though your specific database's dialect may support such expressions. Check the fine manual.

    Update: that's ., not .?. Oops.

    Makeshifts last the longest.

      Yeah, anything more advanced is not necessary for the regular expression task at hand... portability to Perl and SQL is more important than doing nose-bleed things with Perl...

      I may be forced to whip something up... or maybe japhy has something unreleased.

      Carter's compass: I know I'm on the right track when by deleting something, I'm adding functionality

        But you do have \b and \s* in your source.. those can't be emulated in SQL. In any case, since the SQL wildcards are % and _ (corresponding to .* and ., respectively), they're easy to replace to produce Perl regex syntax. So I'd store the profanity words with SQL wildcards in the definition and then
        my %rx_wild_for = ( '%' => '.*', '_' => '.' );
        $profane =~ s/([%_])/$rx_wild_for{$1}/g;
        or something.

        Makeshifts last the longest.

        The simplest solution that springs to my mind is to store the SQL-compatible patterns and do a simple search-and-replace to translate them into Perl regexes as needed. If there are truly only two different wildcard characters (% and _, equivalent to .* and . respectively), this is a fairly trivial task.
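
The search-and-replace idea above could be sketched roughly as follows. This is only an illustration, not code from anyone in the thread: `like_to_regex` is a made-up helper name, and it assumes the stored words use only the two SQL wildcards (% and _) with no ESCAPE clause. Everything else in the word is escaped with quotemeta so that stray regex metacharacters are matched literally, and the result is anchored at both ends because LIKE matches the whole value. Case sensitivity in SQL depends on the database and collation, so none is assumed here.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): translate a SQL LIKE
# pattern into an anchored Perl regex.
# Assumes only the wildcards % and _ are used, with no ESCAPE clause.
sub like_to_regex {
    my ($like) = @_;
    my %rx_wild_for = ( '%' => '.*', '_' => '.' );

    # Split into wildcard and literal pieces (the capture group keeps
    # the wildcards), map wildcards to their regex equivalents and
    # escape the literal text.
    my $body = join '', map {
        exists $rx_wild_for{$_} ? $rx_wild_for{$_} : quotemeta($_)
    } split /([%_])/, $like;

    # LIKE matches the entire value, so anchor both ends; /s lets
    # the wildcards match newlines too.
    return qr/\A$body\z/s;
}

my $rx = like_to_regex('d_rn%');
print "match\n" if 'darnit' =~ $rx;
```

The same stored pattern can then be handed unchanged to a `LIKE` clause on the SQL side, so only one word list has to be maintained.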