nikolay has asked for the wisdom of the Perl Monks concerning the following question:

Hi. How do I use a word exception list (like [^qwe|asd]), similar to a character exception list ([^ghjk])? I am trying to write a script that swaps the two parts of a word containing the sign '-', unless one of the parts appears in a given exception list. For example, in the code below
    $z='Web-developer, perl-program, explicit-element, function-call, 2-x speed.';
    print "|$z|\n---\n";
    # the pattern and its replacement are meant to be tab-separated
    $vrm='(?^ui:(\W)([^(\d|web)]+)-([^(proramm|call)]+)) "1>$1<3>$3< 2>$2<"';
    @bz=split "\t", $vrm;
    for( $i=0; $i<$#bz; $i+=2 ){
        while( $z=~s#$bz[$i]#$bz[$i+1]#g ){
            print "|$z|\n";
            # <STDIN>;
        }
    }

I want every two-word combination except 'explicit-element' to stay as it is, and only 'explicit-element' to be turned into 'element explicit', because the others have a part that is in one of the lists. 'Web-developer' stays the same because its first part, 'Web', is listed in the regular expression before the sign '-'; the same goes for '2-x'. For 'perl-program' and 'function-call', the second parts ('program' and 'call') are listed in the regular expression after the sign '-'.

So, how do I write an exception list for words in Perl? Thanks in advance.

Replies are listed 'Best First'.
Re: RegExp: words exceptions list similar to characters' one.
by Athanasius (Cardinal) on Jun 29, 2016 at 07:37 UTC

    Hello nikolay,

    I would store the exceptions in two hashes, then use exists to test whether a given word should be excluded. I begin by splitting the input string on whitespace. Note the presence of a capture group in the regular expression given to split: the captured string (in this case, the whitespace) is added to the list returned by split, to enable the string to be reassembled correctly after the substitutions have been performed.

        #! perl
        use strict;
        use warnings;

        my %exclude_left  = map { lc $_ => undef } qw( web 2 );
        my %exclude_right = map { lc $_ => undef } qw( program call );

        my $z = 'Web-developer, perl-program, explicit-element, function-call, 2-x speed.';
        my @phrases = split /(\s+)/, $z;

        for (@phrases)
        {
            #     1 2       3  <-- capture groups
            if (/ ( (\w+) - (\w+) ) /x)
            {
                my ($phrase, $left, $right) = ($1, $2, $3);

                s/$phrase/$right-$left/
                    unless exists $exclude_left {lc $left } ||
                           exists $exclude_right{lc $right};
            }
        }

        print '|', join('', @phrases), "|\n";

    Output:

        17:36 >perl 1667_SoPW.pl
        |Web-developer, perl-program, element-explicit, function-call, 2-x speed.|

        17:36 >

    Hope that helps,

    Athanasius <°(((><contra mundum Iustus alius egestas vitae, eros Piratica,

      I like the hash-lookup approach, but nikolay seems to want to exclude digit groups rather than just the explicit "2" substring. (I admit this is a bit difficult to divine given the pseudo-regex provided in the OP.) So, a substring like "23-x" in the input string would become "x-23" in the output string. Again, my guess is that this is not what the OP wants.
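      A minimal sketch of how the hash lookup could be combined with a digit test, assuming that every all-digit left part (2, 23, ...) should be left alone. That is only a guess at the OP's intent, and the sub name swap_parts is made up for illustration:

```perl
use strict;
use warnings;

# Exclusion lists taken from the thread's example.
my %exclude_left  = map { $_ => 1 } qw( web );
my %exclude_right = map { $_ => 1 } qw( program call );

# swap_parts is a hypothetical helper, not code from the thread.
# Assumption: an all-digit left part is treated as excluded.
sub swap_parts {
    my ($text) = @_;
    $text =~ s{\b(\w+)-(\w+)\b}{
        my ($left, $right) = ($1, $2);
        (   $left =~ /\A\d+\z/            # digit group of any length
         || $exclude_left{lc $left}       # listed left word
         || $exclude_right{lc $right} )   # listed right word
            ? "$left-$right"              # keep as is
            : "$right-$left";             # swap the two parts
    }ge;
    return $text;
}

print swap_parts('Web-developer, perl-program, explicit-element, function-call, 23-x speed.'), "\n";
# prints: Web-developer, perl-program, element-explicit, function-call, 23-x speed.
```

      With this, "23-x" stays untouched, while a pair such as "explicit-element" is still swapped.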


      Give a man a fish:  <%-{-{-{-<

        Ah-ha. Insufficient test cases :(

Re: RegExp: words exceptions list similar to characters' one.
by AnomalousMonk (Archbishop) on Jun 29, 2016 at 20:35 UTC

    Here's an approach that I think is more general, although I have my doubts about readability/maintainability. It seems much more verbose, but that's mainly due to the testing framework. (Tested under Perl version 5.8.9.)

    File exclude_words_1.pl:

    Output:


    Give a man a fish:  <%-{-{-{-<

Re: RegExp: words exceptions list similar to characters' one.
by Anonymous Monk on Jun 29, 2016 at 08:37 UTC
        #!/usr/bin/perl -l

        # http://perlmonks.org/?node_id=1166843

        use strict;
        use warnings;

        my $z = 'Web-developer, perl-program, explicit-element, function-call, 2-x speed.';

        print $z;

        $z =~ s/\b
          (?!(?:web|2)-) (\w+)-
          (?!(?:program|call)\b) (\w+)\b
          /$2 $1/gix;

        print $z;
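    The negative lookaheads above play the role of a "word exception list": (?!(?:web|2)-) refuses to start a match where an excluded left word follows, and (?!(?:program|call)\b) does the same for the right word. A rough sketch of building such a pattern from arrays, in case the lists live in variables. The array names are invented for illustration, and the \d+ alternative is an assumption that digit groups of any length should be skipped:

```perl
use strict;
use warnings;

# Hypothetical word lists; both are assumed to be non-empty.
my @skip_left  = qw( web );
my @skip_right = qw( program call );

# quotemeta guards against regex metacharacters in the words.
my $left_alt  = join '|', map { quotemeta($_) } @skip_left;
my $right_alt = join '|', map { quotemeta($_) } @skip_right;

# Assumption: any all-digit left part is excluded, not just "2".
my $re = qr/\b
    (?! (?: $left_alt | \d+ ) - )   (\w+)   # left part, not excluded
    -
    (?! (?: $right_alt ) \b )       (\w+)   # right part, not excluded
    \b/ix;

my $z = 'Web-developer, perl-program, explicit-element, function-call, 2-x speed.';
(my $out = $z) =~ s/$re/$2 $1/g;
print "$out\n";
```

    An empty list would need separate handling, since an empty alternation matches everywhere.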

      Thank you very much, gods! Just what I was looking for!

      With this I close the topic!