[CLOSED] RegExp: words exceptions list similar to characters' one.

nikolay has asked for the wisdom of the Perl Monks concerning the following question:

Hi. How do I use a word exceptions list (like [^qwe|asd]), similar to a character one ([^ghjk])? I am trying to write a script that swaps the parts of words containing the sign '-', except where those parts appear in given exception lists. For example, in the code below

$z='Web-developer, perl-program, explicit-element, function-call, 2-x speed.';

print "|$z|\n---\n";

$vrm='(?^ui:(\W)([^(\d|web)]+)-([^(proramm|call)]+))    "1>$1<3>$3< 2>$2<"';

@bz=split "\t", $vrm;

for( $i=0; $i<$#bz; $i+=2 ){
    while( $z=~s#$bz[$i]#$bz[$i+1]#g ){
        print "|$z|\n";
#        <STDIN>;
    }
}
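The bracket expressions in the pattern above are negated character classes. Each one matches a single character that is not in the set, so [^(\d|web)] excludes the characters '(', any digit, '|', 'w', 'e', 'b' and ')', not the word "web". A minimal demonstration (the test words are made up for illustration):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Inside [ ], '(', '|' and ')' are literal characters, so this class means
# "one character that is not (, a digit, |, w, e, b or )" (any case, via /i).
my $class = qr/^[^(\d|web)]+$/i;

my @words   = qw( Web developer explicit x 2 );
my @matched = grep { $_ =~ $class } @words;

print "whole-word matches: @matched\n";
```

Only "x" survives: "developer" and "explicit" are rejected merely because they contain an "e", not because they are listed anywhere.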

I want every two-word combination except 'explicit-element' to remain the same, and only 'explicit-element' to become 'element explicit', because the parts of the others are in the exception lists: 'Web-developer' stays because its first part, 'Web', is listed in the regular expression before the '-' sign (the same goes for '2-x'), whereas 'perl-program' and 'function-call' stay because their second parts ('program' and 'call') are listed in the regular expression after the '-' sign.

So, how do I handle an exception list of words in Perl? Thanks in advance for any help.
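Word-level exclusions are usually written with zero-width lookaround assertions rather than character classes. The following is a minimal sketch, not the only way to do it: the excluded words are hard-coded in the pattern, any run of digits on the left counts as excluded (an assumption about the intent), and the names $pair and swap_pairs are hypothetical. Each excluded right-hand word gets its own negative lookbehind, because Perl versions before 5.30 require every lookbehind to be fixed-width.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical pattern: swap "left-right" unless an exclusion applies.
# The excluded words are hard-coded here purely for illustration.
my $pair = qr{
    \b
    (?! (?: web | \d+ ) - )       # left part is not "web" or a run of digits
    (\w+) - (\w+)                 # capture the left and right parts
    \b
    (?<! -program ) (?<! -call )  # right part is not "program" or "call"
}ix;

sub swap_pairs {
    my ($text) = @_;
    $text =~ s/$pair/$2-$1/g;
    return $text;
}

my $z = 'Web-developer, perl-program, explicit-element, function-call, 2-x speed.';
print '|', swap_pairs($z), "|\n";
```

To get 'element explicit' with a space instead, the replacement would be "$2 $1".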


Re: RegExp: words exceptions list similar to characters' one.
by Athanasius (Cardinal) on Jun 29, 2016 at 07:37 UTC

Hello nikolay,

I would store the exceptions in two hashes, then use exists to test whether a given word should be excluded. I begin by splitting the input string on whitespace. Note the presence of a capture group in the regular expression given to split: the captured string (in this case, the whitespace) is added to the list returned by split, to enable the string to be reassembled correctly after the substitutions have been performed.
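The effect of the capture group can be seen in isolation. In this sketch (the two-space input is made up so that a lost separator is detectable), the plain split discards the exact whitespace, while the capturing split returns it as extra list elements, so joining the pieces restores the original string:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Two spaces on purpose, so that a lost separator is detectable.
my $text = 'Web-developer,  perl-program';

my @plain    = split /\s+/,   $text;   # separators are discarded
my @captured = split /(\s+)/, $text;   # separators are returned as elements

print scalar(@plain),    " fields without capture\n";
print scalar(@captured), " fields with capture\n";
print join('', @captured) eq $text ? "round-trips\n" : "separator lost\n";
```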

#! perl
use strict;
use warnings;

my %exclude_left  = map { lc $_ => undef } qw( web 2 );
my %exclude_right = map { lc $_ => undef } qw( program call );

my $z = 'Web-developer, perl-program, explicit-element, function-call, 2-x speed.';
my @phrases = split /(\s+)/, $z;

for (@phrases)
{
    #     1 2       3           <-- capture groups
    if (/ ( (\w+) - (\w+) ) /x)
    {
        my ($phrase, $left, $right) = ($1, $2, $3);

        s/$phrase/$right-$left/ unless exists $exclude_left {lc $left } ||
                                       exists $exclude_right{lc $right};
    }
}

print '|', join('', @phrases), "|\n";

Output:

17:36 >perl 1667_SoPW.pl
|Web-developer, perl-program, element-explicit, function-call, 2-x speed.|

17:36 >
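For comparison, the same hash lookups can also drive a single s///e pass over the whole string, which avoids re-using the captured phrase as a pattern in a second substitution. This is a sketch of an alternative, not the code that produced the output above; it keeps the same exclusion lists, so it treats digits exactly the same way.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my %exclude_left  = map { lc $_ => undef } qw( web 2 );
my %exclude_right = map { lc $_ => undef } qw( program call );

my $z = 'Web-developer, perl-program, explicit-element, function-call, 2-x speed.';

# /e makes the replacement side Perl code, evaluated once per match:
# excluded pairs are written back unchanged, all others are swapped.
(my $out = $z) =~ s{ \b (\w+) - (\w+) \b }{
    exists $exclude_left{lc $1} || exists $exclude_right{lc $2}
        ? "$1-$2"
        : "$2-$1"
}gex;

print "|$out|\n";
```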

Hope that helps,

Athanasius <°(((>< contra mundum Iustus alius egestas vitae, eros Piratica,


Re^2: RegExp: words exceptions list similar to characters' one.

by AnomalousMonk (Archbishop) on Jun 29, 2016 at 18:12 UTC

I like the hash-lookup approach, but nikolay seems to want to exclude any group of digits, not just the literal "2". (I admit this is a bit difficult to divine from the pseudo-regex provided in the OP.) With the hash lookup, a substring like "23-x" in the input string would become "x-23" in the output string, and my guess is that this is not what the OPer wants.
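One way to cover digit groups while keeping the hash lookups is to add an explicit all-digits test for the left part. A minimal sketch, assuming that rule is what the OP wants; rewrite_pair and swap_pairs are hypothetical helper names:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# "2" is no longer listed: digits are handled by a rule instead.
my %exclude_left  = map { lc $_ => undef } qw( web );
my %exclude_right = map { lc $_ => undef } qw( program call );

# Hypothetical helper: returns the text that should replace "left-right".
sub rewrite_pair {
    my ($left, $right) = @_;                 # copy before any further match
    my $keep = $left =~ /^\d+$/              # any run of digits: 2, 23, 2016
            || exists $exclude_left{lc $left}
            || exists $exclude_right{lc $right};
    return $keep ? "$left-$right" : "$right-$left";
}

sub swap_pairs {
    my ($text) = @_;
    $text =~ s{ \b (\w+) - (\w+) \b }{ rewrite_pair($1, $2) }gex;
    return $text;
}

print '|', swap_pairs('Web-developer, 23-x, explicit-element'), "|\n";
```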

Give a man a fish: <%-{-{-{-<


Re^3: RegExp: words exceptions list similar to characters' one.

by Anonymous Monk on Jun 29, 2016 at 19:38 UTC

Ah-ha. Insufficient test cases :(


Re: RegExp: words exceptions list similar to characters' one.
by AnomalousMonk (Archbishop) on Jun 29, 2016 at 20:35 UTC

Here's an approach that I think is more general, although I have my doubts about readability/maintainability. It seems much more verbose, but that's mainly due to the testing framework. (Tested under Perl version 5.8.9.)

File exclude_words_1.pl: