in reply to Re^6: Interpolating subroutine call in SQL INSERT INTO SELECT statement
in thread Interpolating subroutine call in SQL INSERT INTO SELECT statement

When you get YAPE::Regex::Explain installed, the result for your 'old' regex should be:

The regular expression:

(?-imsx:&[^(amp;)|(lt;)|(gt;)])

matches as follows:

NODE                     EXPLANATION
----------------------------------------------------------------------
(?-imsx:                 group, but do not capture (case-sensitive)
                         (with ^ and $ matching normally) (with . not
                         matching \n) (matching whitespace and #
                         normally):
----------------------------------------------------------------------
  &                      '&'
----------------------------------------------------------------------
  [^(amp;)|(lt;)|(gt;)]  any character except: '(', 'a', 'm', 'p',
                         ';', ')', '|', '(', 'l', 't', ';', ')',
                         '|', '(', 'g', 't', ';', ')'
----------------------------------------------------------------------
)                        end of grouping
----------------------------------------------------------------------
and for mine:
The regular expression:

(?-imsx:(&(?!(?:amp|lt|gt);)|[><]))

matches as follows:

NODE                     EXPLANATION
----------------------------------------------------------------------
(?-imsx:                 group, but do not capture (case-sensitive)
                         (with ^ and $ matching normally) (with . not
                         matching \n) (matching whitespace and #
                         normally):
----------------------------------------------------------------------
  (                      group and capture to \1:
----------------------------------------------------------------------
    &                    '&'
----------------------------------------------------------------------
    (?!                  look ahead to see if there is not:
----------------------------------------------------------------------
      (?:                group, but do not capture:
----------------------------------------------------------------------
        amp              'amp'
----------------------------------------------------------------------
       |                 OR
----------------------------------------------------------------------
        lt               'lt'
----------------------------------------------------------------------
       |                 OR
----------------------------------------------------------------------
        gt               'gt'
----------------------------------------------------------------------
      )                  end of grouping
----------------------------------------------------------------------
      ;                  ';'
----------------------------------------------------------------------
    )                    end of look-ahead
----------------------------------------------------------------------
   |                     OR
----------------------------------------------------------------------
    [><]                 any character of: '>', '<'
----------------------------------------------------------------------
  )                      end of \1
----------------------------------------------------------------------
)                        end of grouping
----------------------------------------------------------------------
and for your new regex:
The regular expression:

(?-imsx:&(?!(amp;)|(lt;)|(gt;)))

matches as follows:

NODE                     EXPLANATION
----------------------------------------------------------------------
(?-imsx:                 group, but do not capture (case-sensitive)
                         (with ^ and $ matching normally) (with . not
                         matching \n) (matching whitespace and #
                         normally):
----------------------------------------------------------------------
  &                      '&'
----------------------------------------------------------------------
  (?!                    look ahead to see if there is not:
----------------------------------------------------------------------
    (                    group and capture to \1:
----------------------------------------------------------------------
      amp;               'amp;'
----------------------------------------------------------------------
    )                    end of \1
----------------------------------------------------------------------
   |                     OR
----------------------------------------------------------------------
    (                    group and capture to \2:
----------------------------------------------------------------------
      lt;                'lt;'
----------------------------------------------------------------------
    )                    end of \2
----------------------------------------------------------------------
   |                     OR
----------------------------------------------------------------------
    (                    group and capture to \3:
----------------------------------------------------------------------
      gt;                'gt;'
----------------------------------------------------------------------
    )                    end of \3
----------------------------------------------------------------------
  )                      end of look-ahead
----------------------------------------------------------------------
)                        end of grouping
----------------------------------------------------------------------
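
To see what those explanations mean in practice, here is a minimal sketch that runs the three patterns against a few strings. The sample strings and the helper name hits() are made up for illustration and are not from your code.

```perl
use strict;
use warnings;

# The three patterns explained above.
my %regex = (
    old  => qr/&[^(amp;)|(lt;)|(gt;)]/,       # bracketed character class
    mine => qr/(&(?!(?:amp|lt|gt);)|[><])/,   # negative look-ahead, plus < and >
    new  => qr/&(?!(amp;)|(lt;)|(gt;))/,      # negative look-ahead only
);

# Hypothetical sample strings, chosen only for illustration.
my @samples = (
    'a & b', 'a &amp; b', 'a &lt; b', 'AT&T',
    'a &pound; b', 'x &', 'a < b',
);

# Returns 1 if the named pattern matches $text, otherwise 0.
sub hits {
    my ($name, $text) = @_;
    return $text =~ $regex{$name} ? 1 : 0;
}

printf "%-14s old=%d mine=%d new=%d\n",
    $_, hits('old', $_), hits('mine', $_), hits('new', $_)
    for @samples;
```

The 'old' pattern should report no match for 'a &pound; b' and for 'x &'. In the first case the 'p' after the '&' is one of the characters excluded by the class, and in the second the class has no character left to consume at the end of the string. Both look-ahead versions match those strings, and only 'mine' matches a bare '<' or '>'.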
poj

Re^8: Interpolating subroutine call in SQL INSERT INTO SELECT statement
by shadowsong (Pilgrim) on Sep 01, 2015 at 10:22 UTC

    poj,

    I just managed to get the YAPE::Regex::Explain module downloaded and installed via cpanm (turned out I had some firewall/proxy issues to resolve). This is an awesome module and I'll definitely be using it more in future.

    Question: way back when I did compiler design, we used JLex, a lexical analyzer generator (a Java implementation of Lex), and leveraged its built-in scanner to generate tokens from an input stream. Its documentation said we generally get better performance by writing token specifications with the longest possible regexps, so when matching a particular expression, the more of the pattern we can provide, the better. That suggests that instead of providing:

    (amp)|(lt)|(gt);

    We ought to provide:

    (amp;)|(lt;)|(gt;)

    What are your thoughts on this? I'm not all that well versed in Perl (or in how its regexp engine works), so any tips would be most appreciated.

    Thanks
    shadowsong