Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hi, being new to Perl I'm struggling to find a simple explanation of how something like the following code would work.
$text = "The dog is black cat is white and the fox does not like the cow or pig";
if ($text =~ /(dog|cow|pig)/) {
    print "match found";
}
Does the match quit once it finds the first occurrence of one of the alternatives? E.g. in this case it finds a match on dog, so it exits.
Then, if I add /g:
/(dog|cat|fox|cow|pig)/g
Will that then try and match each alternative in the string and then return?
What happens if I want to build a regular expression but have it continue to search the string for the other alternatives even after the first is matched? E.g. I may wish to split the above string up based on the alternatives provided, say by putting a new line in just before each place a match is found, so I end up with the following output:
The
dog is black
cat is white and the
fox does not like the
cow or pig
Can I do that using the alternation (dog|cat|fox|cow|pig)?

Re: Understanding alternation
by g0n (Priest) on Feb 03, 2006 at 15:06 UTC
    'Does the match quit once it finds the first occurrence of one of the alternatives?'
    Yes.

    'Will that then try and match each alternative in the string and then return?'
    Sort of. With /g in list context, or in a substitution, it will keep looking through the string, even after it's found a match, until it doesn't find any more matches. It's a fine distinction, but it doesn't look through the string for the first, then look through again for the second etc, AFAIK. Note that in the scalar context you are using it in, each evaluation finds just the next match and returns true, not a list of matches.
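
To make the distinction above concrete, here is a minimal sketch using the string from the question (the variable names $first and @found are only for illustration). It shows that a match without /g stops at the leftmost match in the string regardless of the order of the alternatives, and that /g in scalar context advances one match per evaluation:

```perl
use strict;
use warnings;

my $text = "The dog is black cat is white and the fox does not like the cow or pig";

# Without /g: one match only. The leftmost match in the string wins,
# even though "dog" is listed last in the alternation.
my ($first) = $text =~ /(pig|cow|dog)/;
print "first: $first\n";

# With /g in scalar context: each evaluation resumes where the
# previous match ended, so a while loop walks through every match.
my @found;
while ($text =~ /(dog|cat|fox|cow|pig)/g) {
    push @found, $1;
}
print "all: @found\n";
```

This should print "first: dog" and then "all: dog cat fox cow pig".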

    'Can I do that using the alternation ..'
    Yes, like this:

    $text = "The dog is black cat is white and the fox does not like the cow or pig";
    $text =~ s/(dog|cat|fox|cow|pig)/\n$1/g;
    print $text;

    ($1 contains the value matched by the pattern, so you are replacing each match with itself, prepended by \n).

    --------------------------------------------------------------

    "If there is such a phenomenon as absolute evil, it consists in treating another human being as a thing."
    John Brunner, "The Shockwave Rider".

      I'd throw a couple of \b's in there, so it only matches on word boundaries. eg:
      $text =~ s/\b(dog|cat|fox|cow|pig)\b/\n$1/g;
      Otherwise, you'd match stuff like "cowardly" and "pigheaded" ;)
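
A small sketch of the difference the word boundaries make, using a made-up sentence (not from the original question) that contains "cowardly" and "pigheaded":

```perl
use strict;
use warnings;

# Hypothetical test string chosen to contain words that merely start with an alternative.
my $text = "The cowardly dog chased a pigheaded cow";

# Copy the string, then substitute on the copy.
(my $loose   = $text) =~ s/(dog|cow|pig)/\n$1/g;
(my $bounded = $text) =~ s/\b(dog|cow|pig)\b/\n$1/g;

print "without \\b:\n$loose\n\n";
print "with \\b:\n$bounded\n";
```

Without \b the newline is also inserted before "cowardly" and "pigheaded"; with \b only the standalone "dog" and "cow" are affected.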
      OK, seems fairly straightforward. Thanks for the help.
Re: Understanding alternation
by wfsp (Abbot) on Feb 03, 2006 at 15:57 UTC
    At a slight tangent from your question.

    In list context the regex returns all the matches.

    #!/bin/perl5
    use strict;
    use warnings;

    my $text = "The dog is black cat is white and the fox does not like the cow or pig";
    my (@array) = $text =~ /(cow|dog|pig)/g;
    print "@array\n";

    __DATA__
    ---------- Capture Output ----------
    > "C:\Perl\bin\perl.exe" _new.pl
    dog cow pig
    > Terminated with exit code 0.
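
As a possible follow-on to the list-context behaviour, this sketch (variable names are illustrative) shows two related idioms: with no capturing parentheses, /g in list context returns each whole match, and assigning the match to an empty list lets you read off the number of matches:

```perl
use strict;
use warnings;

my $text = "The dog is black cat is white and the fox does not like the cow or pig";

# No capturing parentheses: list-context /g returns each whole match.
my @words = $text =~ /dog|cow|pig/g;

# Assigning to an empty list forces list context; reading that
# assignment in scalar context yields the number of matches.
my $count = () = $text =~ /dog|cow|pig/g;

print "@words ($count matches)\n";
```

This should print "dog cow pig (3 matches)".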
Re: Understanding alternation
by glasswalk3r (Friar) on Feb 03, 2006 at 15:47 UTC

    For the sake of performance, you should switch REGEX alternation with short-circuit alternation:

    my $text = 'The dog is black cat is white and the fox does not like the cow or pig';
    print 'match found', "\n" if ( $text =~ /dog/ || /cow/ || /pig/ );

    This works fine for simple patterns like the ones you're using. The Camel book has a nice explanation of this in the Common Practices chapter.

    Alceu Rodrigues de Freitas Junior
    ---------------------------------
    "You have enemies? Good. That means you've stood up for something, sometime in your life." - Sir Winston Churchill
      glasswalk3r,
      Your code doesn't do what you think it does. You are saying if $text contains dog or $_ contains cow or $_ contains pig. This can be solved 1 of 2 ways.
      local $_ = 'The dog is black cat is white and the fox does not like the cow or pig';
      print "match found\n" if /dog/ || /cow/ || /pig/;
      # or by explicitly stating
      print "match found\n" if $text =~ /dog/ || $text =~ /cow/ || $text =~ /pig/;
      It is worth noting that alternation in regexen can be expensive but demerphq's patch to bleadperl (and hopefully the recently released 5.8.8) can make it much less so.

      Update: At bobf's request, I am adding the following note as FYI.
      The binding operator =~ has a higher precedence than ||, so $text =~ /dog/ is evaluated as a unit and /cow/ and /pig/ are each seen by themselves. Perl's do what you mean (DWYM) attitude allows that to be a valid expression by assuming you meant the binding operator with the default variable $_. See perlop for more information on operator precedence and perlvar for more info on variables.
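
A minimal sketch of the precedence point, with made-up values for $text and $_ so that the two forms give different answers (the names $buggy and $fixed are only for illustration):

```perl
use strict;
use warnings;

my $text = 'The dog is black';    # contains neither "pig" nor "cow"
local $_ = 'a cow in the default variable';

# Parsed as ($text =~ /pig/) || ($_ =~ /cow/): the second pattern
# is silently matched against $_, not against $text.
my $buggy = ($text =~ /pig/ || /cow/) ? 1 : 0;

# Binding each pattern explicitly tests $text both times.
my $fixed = ($text =~ /pig/ || $text =~ /cow/) ? 1 : 0;

print "buggy: $buggy, fixed: $fixed\n";
```

This should print "buggy: 1, fixed: 0": the unbound /cow/ succeeds only because $_ happens to contain "cow".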

      Additionally, the order in which you place the alternatives should reflect how likely each one is to match, most likely first. This is because || short-circuits as soon as it knows at least 1 condition is true. Use && when you want all conditions to be met. Since perl, like math, uses precedence to decide the order in which things are evaluated, be sure to read up in perlop. There are lower precedence versions of || and && (or, and) but nothing higher than that of the binding operator.

      Cheers - L~R