newbio has asked for the wisdom of the Perl Monks concerning the following question:

Dear Monks,

I want to include a space character in the map function below, so that each phrase pattern to be matched in the sentence also includes a space character on either side. However, it does not work in the form below (using \b in place of \s works, but it does not solve the actual problem). Any suggestions?

my $string = join '|', map { "\\s$_\\s" } map { quotemeta } @phrases;
$sentence =~ s/($string)/\#$1\#/g;

Thanks.

#Corrected: $phrases replaced by $string

Replies are listed 'Best First'.
Re: map function use
by ikegami (Patriarch) on Aug 13, 2009 at 19:01 UTC

    The problem has nothing to do with map.

    First of all, you store the pattern in $string, but you use $phrases in the substitution. I think the code you posted isn't the code you actually used.

    The first part of the problem you mention is that you want to match the space between "foo" and "bar" in " foo bar" twice. If that was it, you could use (?=) and/or (?<=) to solve the problem.

    However, you also want to replace that space twice. I don't see how you can do that entirely in one substitution.
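To make the shared-space issue concrete, here is a minimal sketch using the same toy phrases. The variable names `$consuming`, `$first_try`, and `$with_lookaround` are illustrative and not from the original post. It shows that the pattern from the question consumes the only space between the two phrases, while lookarounds merely assert it (which covers the matching half of the problem, not the double replacement).

```perl
use strict;
use warnings;

my @phrases  = qw( foo bar );
my $sentence = " foo bar ";

# Pattern from the question: every alternative consumes its surrounding spaces.
my $consuming = join '|', map { "\\s$_\\s" } map { quotemeta } @phrases;
(my $first_try = $sentence) =~ s/($consuming)/#$1#/g;
print "[$first_try]\n";          # [# foo #bar ]  ("bar" has lost its leading space)

# Lookarounds check for the spaces without consuming them,
# so the single space between "foo" and "bar" can serve both matches.
my $alternation = join '|', map { quotemeta } @phrases;
(my $with_lookaround = $sentence) =~ s/(?<=\s)($alternation)(?=\s)/#$1#/g;
print "[$with_lookaround]\n";    # [ #foo# #bar# ]
```

Note that with lookarounds the spaces end up outside the # markers, which may or may not be what is wanted.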

    One simple solution is to do a prep pass to double up the space:

    my @phrases = qw( foo bar );
    my $sentence = " foo bar ";
    my ($phrases_pat) = map qr/$_/, join '|', map quotemeta, @phrases;
    $sentence =~ s/($phrases_pat)(\s)($phrases_pat)/$1$2$2$3/g;
    $sentence =~ s/(\s$phrases_pat\s)/#$1#/g;
    print("$sentence\n"); # "# foo ## bar #"

    Or maybe you want

    ...
    $sentence =~ s/($phrases_pat)(\s)($phrases_pat)/$1$2$2$3/g;
    $sentence =~ s/\s($phrases_pat)\s/#$1#/g;
    print("$sentence\n"); # "#foo##bar#"

    If you consider the start and end of the string to be equivalent to spaces, you'll need to handle those specially.
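One possible way to treat the string boundaries like spaces, sketched here as an assumption rather than as the approach intended above, is to use negative lookarounds on \S. The variable names are illustrative. Unlike the two variants above, this keeps the spaces outside the # markers.

```perl
use strict;
use warnings;

my @phrases  = qw( foo bar );
my $sentence = "foo bar";    # no leading or trailing space this time

my $alternation = join '|', map { quotemeta } @phrases;

# (?<!\S) : not preceded by a non-space (true at string start or after whitespace)
# (?!\S)  : not followed by a non-space (true at string end or before whitespace)
(my $tagged = $sentence) =~ s/(?<!\S)($alternation)(?!\S)/#$1#/g;
print "[$tagged]\n";         # [#foo# #bar#]
```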

Re: map function use
by moritz (Cardinal) on Aug 13, 2009 at 18:50 UTC
    my @phrases = <foo bar>;
    print join "|", map { "\\s$_\\s" } map { quotemeta } @phrases;
    __END__
    \sfoo\s|\sbar\s

    The regex looks fine to me - but without knowing what you match against it's impossible to tell if the regex is appropriate in your case.

      Thanks Moritz. Sure, here is the problem:

      Input sentence: (arf)71 p65(91) H:223 Phospholipase A2 inhibitor ( PLI ) , purified from the blood plasma of the Habu snake ( Trimeresurus flavoviridis ) , was separated into two distinct subunits , PLI-A and PLI-B

      Output tagged sentence: #(arf)71# #p65(91)# #H:223# #Phospholipase A2 inhibitor# ( #PLI# ) , purified from the blood plasma of the Habu snake ( Trimeresurus flavoviridis ) , was separated into two distinct subunits , #PLI-A# and #PLI-B#

      @phrases=('(arf)71', 'p65(91)', 'H:223', 'Phospholipase A2 inhibitor', 'Phospholipase A2', 'Phospholipase', 'PLI-A', 'PLI-B', 'PLI');

        The following works:

        use strict;
        use warnings;

        my @phrases=('(arf)71', 'p65(91)', 'H:223', 'Phospholipase A2 inhibitor', 'Phospholipase A2', 'Phospholipase', 'PLI-A', 'PLI-B', 'PLI');
        my $string = join '|', map { quotemeta } @phrases;

        my $in = "(arf)71 p65(91) H:223 Phospholipase A2 inhibitor ( PLI ) , purified from the blood plasma of the Habu snake ( Trimeresurus flavoviridis ) , was separated into two distinct subunits , PLI-A and PLI-B";
        my $out = "#(arf)71# #p65(91)# #H:223# #Phospholipase A2 inhibitor# ( #PLI# ) , purified from the blood plasma of the Habu snake ( Trimeresurus flavoviridis ) , was separated into two distinct subunits , #PLI-A# and #PLI-B#";

        #while($in =~ s/(^|\s)($string)(\s|$)/$1#$2#$3/) {};
        while($in =~ s/(^|\s)($string)(\s|$)/$1#$2#$3/g) { pos() = pos() - 1; };

        print "success!!\n" if($in eq $out);
        print "in: $in\n";
        print "out: $out\n";
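The loop above works because the phrase list happens to put longer phrases before their prefixes ('Phospholipase A2 inhibitor' before 'Phospholipase'). As a hedged alternative sketch, the following sorts the phrases by descending length so that the order of the input list no longer matters, and uses lookarounds so a single pass is enough. The deliberately shuffled list and the names `$alternation`, `$tagged`, and `$expected` are assumptions for illustration.

```perl
use strict;
use warnings;

# Same phrases as in the thread, deliberately listed shortest-first.
my @phrases = ('PLI', 'Phospholipase', 'Phospholipase A2', 'PLI-A', 'PLI-B',
               'H:223', 'p65(91)', '(arf)71', 'Phospholipase A2 inhibitor');

# Longest phrases first, so a short phrase cannot win over a longer one
# that starts at the same position.
my $alternation = join '|',
    map  { quotemeta }
    sort { length($b) <=> length($a) } @phrases;

my $in = "(arf)71 p65(91) H:223 Phospholipase A2 inhibitor ( PLI ) , purified from the blood plasma of the Habu snake ( Trimeresurus flavoviridis ) , was separated into two distinct subunits , PLI-A and PLI-B";
my $expected = "#(arf)71# #p65(91)# #H:223# #Phospholipase A2 inhibitor# ( #PLI# ) , purified from the blood plasma of the Habu snake ( Trimeresurus flavoviridis ) , was separated into two distinct subunits , #PLI-A# and #PLI-B#";

# The lookarounds assert "whitespace or string boundary" without consuming
# anything, so adjacent phrases can share the single space between them.
(my $tagged = $in) =~ s/(?<!\S)($alternation)(?!\S)/#$1#/g;

print $tagged eq $expected ? "success!!\n" : "mismatch\n";
print "tagged: $tagged\n";
```

Without the sort, the shuffled list would tag only "Phospholipase" in "Phospholipase A2 inhibitor", because the first alternative that satisfies the pattern wins.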
Re: map function use
by jwkrahn (Abbot) on Aug 13, 2009 at 19:01 UTC

    Perhaps you want something like this:

    my $string = '(?<=\s)(' . join( '|', map "\Q$_", @phrases ) . ')(?=\s)';
    $sentence =~ s/$string/#$1#/g;
Re: map function use
by ig (Vicar) on Aug 13, 2009 at 19:31 UTC
    Any suggestions?
    1. Use $string in your RE instead of $phrases
    2. Explain what you mean by "it does not work"
    3. Explain what your "actual problem" is
Re: map function use
by JavaFan (Canon) on Aug 13, 2009 at 19:03 UTC
    Your resulting regexp, and the code you use to create it, are simpler if you factor out the common \s.
    my $string = join '|', map {quotemeta} @phrases;
    $sentence =~ s/(\s$string\s)/#$1#/g;
    But I guess your biggest problem is that the regexp uses $phrases, while the join creates $string.
      ... if you factor out the common \s.
      my $string = join '|', map {quotemeta} @phrases;
      $sentence =~ s/(\s$string\s)/#$1#/g;
      But if @phrases were something like qw(foo bar baz), then the final substitution regex would be equivalent to
          $sentence =~ s/(\sfoo|bar|baz\s)/#$1#/g;
      in which a leading space is associated only with foo, a trailing space only with baz, and nothing with bar.

      Adding a non-capturing sub-grouping should do the trick:
          $sentence =~ s/(\s(?:$string)\s)/#$1#/g;
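A small sketch of the difference, using an assumed test sentence "x bar y" and illustrative variable names: without the sub-grouping, the middle alternative matches "bar" with no spaces at all, whereas the grouped version requires a space on both sides.

```perl
use strict;
use warnings;

my @phrases  = qw( foo bar baz );
my $string   = join '|', map { quotemeta } @phrases;
my $sentence = "x bar y";

# Ungrouped: equivalent to /(\sfoo|bar|baz\s)/, so "bar" matches bare.
(my $ungrouped = $sentence) =~ s/(\s$string\s)/#$1#/g;
print "[$ungrouped]\n";    # [x #bar# y]

# Grouped: the \s on each side now applies to every alternative.
(my $grouped = $sentence) =~ s/(\s(?:$string)\s)/#$1#/g;
print "[$grouped]\n";      # [x# bar #y]
```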