in reply to Please help with Regexp::Common

You might try trimming the boundary assertions off the stringified Regexp object (sorry for all the wrap-around):

c:\@Work\Perl\monks>perl -wMstrict -le
"use Regexp::Common;
 ;;
 print qq{$RE{profanity}};
 print qq{A: match '$1'} if 'xxxpissxxx' =~ m{ ($RE{profanity}) }xms;
 ;;
 print '--------';
 (my $erp = $RE{profanity}) =~ s{ \A \Q(?:\b\E (.*) \Q\b)\E \z }{$1}xms;
 print qq{'$erp'};
 ;;
 print qq{B: match '$1'} if 'xxxpissxxx' =~ m{ ($erp) }xms;
 "
(?:\b(?:(?:piss(?:\ take|\-take|take|e(?:rs|[srd])|ing|y)?|quims?|shit(?:t(?:e(?:rs|[dr])|ing|y)|e(?:rs|[sdry])|ing|[se])?|t(?:urds?|wats?)|wank(?:e(?:rs|[rd])|ing|s)?|a(?:rs(?:e(?:\ hole|\-hole|hole|[sd])|ing|e)|ss(?:\ holes?|\-holes?|ed|holes?|ing))|b(?:ull(?:\ shit(?:t(?:e(?:rs|[dr])|ing)|s)?|\-shit(?:t(?:e(?:rs|[dr])|ing)|s)?|shit(?:t(?:e(?:rs|[dr])|ing)|s)?)|low(?:\ jobs?|\-jobs?|jobs?))|c(?:ock(?:\ suck(?:ers?|ing)|\-suck(?:ers?|ing)|suck(?:ers?|ing))|rap(?:p(?:e(?:rs|[rd])|ing|y)|s)?|u(?:nts?|m(?:ing|ming|s)))|dick(?:\ head|\-head|ed|head|ing|less|s)|f(?:uck(?:ed|ing|s)?|art(?:e[rd]|ing|[sy])?|eltch(?:e(?:rs|[rsd])|ing)?)|ha(?:rd[\-\ ]?on|lf(?:\ a[sr]|\-a[sr]|a[sr])sed)|m(?:other(?:\ fuck(?:ers?|ing)|\-fuck(?:ers?|ing)|fuck(?:ers?|ing))|uth(?:a(?:\ fuck(?:ers?|ing|[aaa])|\-fuck(?:ers?|ing|[aaa])|fuck(?:ers?|ing|[aaa]))|er(?:\ fuck(?:ers?|ing)|\-fuck(?:ers?|ing)|fuck(?:ers?|ing)))|erde?)))\b)
--------
'(?:(?:piss(?:\ take|\-take|take|e(?:rs|[srd])|ing|y)?|quims?|shit(?:t(?:e(?:rs|[dr])|ing|y)|e(?:rs|[sdry])|ing|[se])?|t(?:urds?|wats?)|wank(?:e(?:rs|[rd])|ing|s)?|a(?:rs(?:e(?:\ hole|\-hole|hole|[sd])|ing|e)|ss(?:\ holes?|\-holes?|ed|holes?|ing))|b(?:ull(?:\ shit(?:t(?:e(?:rs|[dr])|ing)|s)?|\-shit(?:t(?:e(?:rs|[dr])|ing)|s)?|shit(?:t(?:e(?:rs|[dr])|ing)|s)?)|low(?:\ jobs?|\-jobs?|jobs?))|c(?:ock(?:\ suck(?:ers?|ing)|\-suck(?:ers?|ing)|suck(?:ers?|ing))|rap(?:p(?:e(?:rs|[rd])|ing|y)|s)?|u(?:nts?|m(?:ing|ming|s)))|dick(?:\ head|\-head|ed|head|ing|less|s)|f(?:uck(?:ed|ing|s)?|art(?:e[rd]|ing|[sy])?|eltch(?:e(?:rs|[rsd])|ing)?)|ha(?:rd[\-\ ]?on|lf(?:\ a[sr]|\-a[sr]|a[sr])sed)|m(?:other(?:\ fuck(?:ers?|ing)|\-fuck(?:ers?|ing)|fuck(?:ers?|ing))|uth(?:a(?:\ fuck(?:ers?|ing|[aaa])|\-fuck(?:ers?|ing|[aaa])|fuck(?:ers?|ing|[aaa]))|er(?:\ fuck(?:ers?|ing)|\-fuck(?:ers?|ing)|fuck(?:ers?|ing)))|erde?)))'
B: match 'piss'

Update: Of course, this gets you right back to the Scunthorpe Problem noted above by Paladin!


Give a man a fish:  <%-{-{-{-<

Re^2: Please help with Regexp::Common
by scorpio17 (Canon) on Jan 19, 2017 at 15:50 UTC

    I followed your suggestion and tried this:

    use strict;
    use Regexp::Common;

    (my $reg = $RE{profanity}) =~ s{\A \Q(?:\b\E (.*) \Q\b)\E \z}{$1}xms;

    while ( my $word = <DATA> ) {
        chomp $word;
        if ( $word =~ m/$reg/ ) {
            print "Profanity detected: \"$word\"\n";
        }
        else {
            print "$word\n";
        }
    }

    __DATA__
    aaaabbbbcccc
    aaaashitcccc
    aaaa1234cccc
    ddddeeeeffff

    This way it finds embedded "bad words" without needing spaces around them, which is what I wanted. I understand the logic of requiring word boundaries, but because $RE{num}{int} finds embedded numbers, I assumed that $RE{profanity} would work the same way, or that there would be a switch to toggle the behavior one way or the other.

    The reason I need this is to generate temporary (one-use) passwords (like when someone requests a password reset on a website). The generated password should, ideally, be a jumble of random letters and/or numbers, but I don't want to accidentally send someone a password with an "obvious" obscenity embedded, so a simple filter like this is helpful.
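    The generate-and-filter idea described here could look roughly like the sketch below. It is only an illustration under stated assumptions: $bounded is a harmless stand-in with the same "(?:\b(?:...)\b)" shape as the stringified $RE{profanity} shown earlier in the thread, and make_password(), the character set, and the retry limit are hypothetical names and values, not anything provided by Regexp::Common.

```perl
use strict;
use warnings;

# ASSUMPTION: a stand-in pattern with the same "(?:\b(?:...)\b)" shape as the
# stringified $RE{profanity}; for real use, replace it with "$RE{profanity}".
my $bounded = '(?:\b(?:foo|bar)\b)';

# Trim the outer word-boundary assertions; die if the shape is not as expected.
(my $unbounded = $bounded) =~ s{ \A \Q(?:\b\E (.*) \Q\b)\E \z }{$1}xms
    or die 'boundary trim failed';
my $bad_rx = qr{$unbounded}i;

# Illustrative character set for the random jumble.
my @chars = ('a' .. 'z', '0' .. '9');

# make_password() is a hypothetical helper: it draws random candidates and
# returns the first one that contains no match for the filter pattern.
sub make_password {
    my ($length, $max_tries) = @_;
    for (1 .. $max_tries) {
        # Note: rand() is not cryptographically secure.
        my $candidate = join '', map { $chars[ rand @chars ] } 1 .. $length;
        return $candidate if $candidate !~ $bad_rx;
    }
    die "no clean password after $max_tries tries";
}

print make_password(12, 100), "\n";
```

    Rejecting and redrawing keeps the filter separate from the generator, so the pattern can be swapped without touching the loop.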

    Thanks!

      You might consider adding a test to check that the expected alteration to the original regex succeeded. The \Q(?:\b\E and \Q\b)\E parts of the substitution are rather fragile IMO and may break if the maintainer(s) of Regexp::Common ever change their notion of what a proper profane regex should look like.

      c:\@Work\Perl\monks>perl -wMstrict -le
      "use Regexp::Common;
       ;;
       (my $reg = $RE{profanity}) =~ s{\A \Q(?:\b\E (.*) \Q\b)\E \z}{$1}xms
           or die 'profanity anchor trim failed';
       ;;
       print qq{bad: '$1'} if 'Matsushita' =~ m{ ($reg) }xms;
       "
      bad: 'shit'


      Give a man a fish:  <%-{-{-{-<

      Shouldn't you be generating passwords that do not contain any words?

        Shouldn't you be generating passwords that do not contain any words?

        https://xkcd.com/936/

        Alexander

        --
        Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)

        ... passwords that do not contain any words ...

        Isn't that a bit like the CRM 114 Discriminator strategic communications security system, which for absolute top security was designed not to receive any messages...?


        Give a man a fish:  <%-{-{-{-<