Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Dear Monks,

I am trying to implement a scientific term extraction pattern in Perl, but so far without success.

This is the pattern:
((A|N)+|((A|N)*(NP)?)(A|N)*)N

It should return matches with sequences like: NNNN, NAAN, ANPN etc.

It partially works. My Perl code looks like this:

    my @can = ('NNAN', 'BPPAN', 'ANPN', 'NNAPN');
    foreach my $value (@can) {
        if ( $value =~ m/^((A|N)+|((A|N)*(NP)?)(A|N)*)N/ ) {
            print "match: $value\n";
        }
    }

However, it shouldn't return a match for NNAPN, since P should be accepted only as part of an NP sequence. How can I change it to make it work?

Replies are listed 'Best First'.
Re: matching patterns
by sovixi (Novice) on Jun 08, 2008 at 14:41 UTC
    I got it, I just needed to add $ at the end of the pattern :)

        if ( $value =~ m/^((A|N)+|((A|N)*(NP)?)(A|N)*)N$/ ) {
            print "match: $value\n";
        }

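A small sketch of why the $ makes the difference, assuming the four test strings from the question. The helper name matched_part is hypothetical; it returns the portion of the string the pattern actually consumed, which makes a prefix-only match visible.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the substring consumed by the pattern,
# or undef if the pattern does not match at all.
sub matched_part {
    my ($re, $str) = @_;
    return $str =~ $re ? $& : undef;
}

# The pattern from the question, without and with the trailing $.
my $prefix_only = qr/^((A|N)+|((A|N)*(NP)?)(A|N)*)N/;
my $whole       = qr/^((A|N)+|((A|N)*(NP)?)(A|N)*)N$/;

for my $value ('NNAN', 'BPPAN', 'ANPN', 'NNAPN') {
    my $p = matched_part($prefix_only, $value);
    my $w = matched_part($whole,       $value);
    printf "%-6s without \$: %-6s with \$: %s\n", $value, $p // '-', $w // '-';
}
```

Without the $, NNAPN is reported as a match because the pattern is already satisfied by the leading NN; with the $, the whole string has to fit, and the P that does not follow an N rules it out.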
      Gentle Sovixi,

      Your pattern matches; good. May I offer some tips?

      1. Use the <code> tags to make your code easier to read. (I see you've updated it to do so; good.)
      2. Your pattern now would match the 1-char string 'N'. Is that really what you want?
      3. You're using capturing parentheses, but not using the captured values.
      4. When matching single letters, square brackets read more easily than '|' in parens.
      5. Use the 'x' modifier to make your code more readable and commentable. Viz.:

          if ( $value =~ m/^(?: [AN]+       # Either at least one A|N
                              | [AN]*       # or an A|N run
                                (?: NP)?    # Possibly interrupted by NP
                                [AN]*
                            )
                            N$/x ) {        # With a terminal N.
              print "match: $value\n";
          }

      Something still looks suspicious to me. The group at the start of your pattern is a choice between [AN]+ and [AN]*(NP)?[AN]*. That second choice can match the null string, and it already matches everything the first choice does. Seems odd.

      throop
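
Following up on the last point, here is a minimal sketch of how the pattern could be collapsed. Since [AN]*(?:NP)?[AN]* already matches everything [AN]+ does, the alternation can be dropped without changing which strings are accepted. The sub names is_term and is_multi_term are hypothetical, and the two-character minimum in the second one is only an assumption about what the original poster wants, not something stated in the question.

```perl
use strict;
use warnings;

# Hypothetical helper: accepts the same strings as the anchored pattern
# discussed in this thread, with the redundant [AN]+ alternative removed.
sub is_term {
    my ($str) = @_;
    return $str =~ m/^ [AN]*       # a run of A or N, possibly empty
                       (?: NP )?   # at most one NP
                       [AN]*       # another run of A or N, possibly empty
                       N$          # with a terminal N
                    /x ? 1 : 0;
}

# Assumption: a lone 'N' should not count as a term.
sub is_multi_term {
    my ($str) = @_;
    return ( length($str) >= 2 && is_term($str) ) ? 1 : 0;
}

for my $value ('NNAN', 'BPPAN', 'ANPN', 'NNAPN', 'N') {
    printf "%-6s term: %d  multi: %d\n",
        $value, is_term($value), is_multi_term($value);
}
```

If the single-character case is actually wanted, is_term alone should be enough; the length check is kept separate so it is easy to drop.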