madbee has asked for the wisdom of the Perl Monks concerning the following question:

Hello! I am trying to extract the protocol number which is alpha-numeric from a given string. My string is:

my $str = "Study Protocol No. NBXF317N2201";

For this string, I'm trying to extract the term NBXF317N2201 using the following regular expression:

my ($prot) = $str =~ /Protocol (?:No\.? )?([A-Z0-9]{12})/;

This works fine. However, for some docs, I have the string "Number" instead of No. or No

For this: I tried to use matching within the expression like this:

my ($term) = $str =~ /Protocol (?:(No\.|Number)? )?([A-Z0-9]{12})/;

This just returns "Number" instead of: NBXF317N2201

So, given a string:"Study Protocol (No.|No|Number) NBXF317N2201", how can I modify my expression in order to extract NBXF317N2201 which is always a combination of alpha-and numbers and is always 12 characters long?

Thanks a lot in advance. Regards,madbee

Replies are listed 'Best First'.
Re: Extract pattern from string
by kcott (Archbishop) on Jul 04, 2013 at 06:06 UTC

    G'day madbee,

    "So, given a string:"Study Protocol (No.|No|Number) NBXF317N2201", how can I modify my expression in order to extract NBXF317N2201 which is always a combination of alpha-and numbers and is always 12 characters long?"

    If that's really all you need to do, just anchor your regex to the end of the string:

    $ perl -Mstrict -Mwarnings -E '
        my $str = "Study Protocol (No.|No|Number) NBXF317N2201";
        my ($term) = $str =~ /([A-Z0-9]{12})$/;
        say $term;
    '
    NBXF317N2201

    -- Ken

Re: Extract pattern from string
by hdb (Monsignor) on Jul 04, 2013 at 05:54 UTC

    Your inner parentheses around (No\.|Number) are capturing. If you make them non-capturing, (?:No\.|Number), you should be fine.
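
    To illustrate the point about capturing groups, here is a minimal sketch (not part of the original reply) that runs both variants in list context. The array names @with_capture and @without_capture are made up for this example. A list-context match returns every capture in order, so my ($term) = ... picks up whatever group 1 holds.

```perl
use strict;
use warnings;

my $str = "Study Protocol Number NBXF317N2201";

# Inner group is capturing: the match returns two captures, prefix first.
my @with_capture = $str =~ /Protocol (?:(No\.|Number)? )?([A-Z0-9]{12})/;

# Inner group is non-capturing: only the protocol number is captured.
my @without_capture = $str =~ /Protocol (?:(?:No\.|Number)? )?([A-Z0-9]{12})/;

print "capturing:     @with_capture\n";
print "non-capturing: @without_capture\n";
```

    The first line should show "Number NBXF317N2201" and the second only "NBXF317N2201", which is why the original expression handed "Number" to $term.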

      Thanks for your response. That actually did not work. I modified the expression to this:

      $str =~ /Protocol ((?:No\.|Number?) )?([A-Z0-9]{12})/;

      The output I am now getting is "No.". Did I get your suggestion right? Thanks!

        I tried this:

        my $str = "Study Protocol Number NBXF317N2201";
        my ($term) = $str =~ /Protocol (?:(?:No\.|Number)? )?([A-Z0-9]{12})/;
        print $term;

        Another question: do you mean the . after No to be optional? Then it has to be

        my $str = "Study Protocol Number NBXF317N2201";
        my ($term) = $str =~ /Protocol (?:(?:No\.?|Number) )?([A-Z0-9]{12})/;
        print $term;

        UPDATE: Having thought about it, I would do the following which IMHO is simpler to read:

        use strict;
        use warnings;

        for my $str (
            "Study Protocol Number NBXF317N2201",
            "Study Protocol No. NBXF317N2201",
            "Study Protocol NBXF317N2201",
            "Study Protocol No NBXF317N2201",
        ) {
            # The empty last alternative lets the prefix be absent altogether.
            my ($term) = $str =~ /Protocol (?:No\. |No |Number |)([A-Z0-9]{12})/;
            print "$term\n";
        }

        "That actually did not work. I modified the expression to this:
            $str =~ /Protocol ((?:No\.|Number?) )?([A-Z0-9]{12})/;"

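        BTW: The reason the modified regex didn't work is that you just moved the extraneous capturing parentheses from the inside to the outside of the group instead of making them non-capturing: (?:(No\.|Number)? )? became ((?:No\.|Number?) )?. Group 1 is therefore still the prefix, and that is what my ($term) receives. Note also that Number? only makes the final "r" optional; it does not make the whole word optional.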
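
        As a rough sketch of that explanation (the variable names @captures and $prot are invented for this example), the snippet below runs the modified regex and prints both groups. It then shows one possible alternative that is not proposed anywhere in the thread: a named capture, available from Perl 5.10, which is read back through %+ and so does not depend on the group's position.

```perl
use strict;
use warnings;

my $str = "Study Protocol No. NBXF317N2201";

# The modified regex from the thread: the outer parentheses form group 1,
# so the prefix (with its trailing space) is captured before the protocol.
my @captures = $str =~ /Protocol ((?:No\.|Number?) )?([A-Z0-9]{12})/;
print "group 1: '$captures[0]'\n";
print "group 2: '$captures[1]'\n";

# Alternative sketch (an assumption, not from the thread): name the group
# and fetch it from %+ instead of relying on capture order.
my $prot;
if ($str =~ /Protocol (?:(?:No\.?|Number) )?(?<prot>[A-Z0-9]{12})/) {
    $prot = $+{prot};
    print "named:   '$prot'\n";
}
```

        Group 1 comes back as 'No. ' and group 2 as 'NBXF317N2201', while the named capture yields the protocol number directly.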