RuneK has asked for the wisdom of the Perl Monks concerning the following question:

Hi all,
I need to do a pattern match on a string. The string contains the following:
CCIIIIIIIII
Where C is a character and I is a digit. So the string starts with 2 characters followed by 9 digits. The rest of the string can contain any combination of characters, digits and/or anything else. The string could also consist of just the pattern above. So my question is: how do I compose an expression to extract this substring from the string?
Thanks, Rune

Replies are listed 'Best First'.
Re: Newbie reg.exp question!
by broquaint (Abbot) on Nov 13, 2002 at 12:57 UTC
    How about
    my $string = "PM123456789some more text";
    my($chars, $nums, $rest) = $string =~ m{
        ^               # start of the string
        ( [a-z]{2} )    # capture 2 chars
        ( \d{9} )       # capture 9 digits
        ( .* )          # capture the rest
        \z              # end of the string
    }ix;
    print "chars: $chars", $/, "nums: $nums", $/, "rest: $rest", $/;
    __output__
    chars: PM
    nums: 123456789
    rest: some more text
    See man perlretut and man perlre for more info.
    HTH

    _________
    broquaint

Re: Newbie reg.exp question!
by robartes (Priest) on Nov 13, 2002 at 12:58 UTC
    Well, if it's always the first 11 characters you want, there's no need to use a regexp. Use substr instead:
    use strict;
    my $string="AB123456789this_code_is_untested";
    my $match=substr $string, 0, 11;
    If you really want to use a regexp (which is less efficient), use something like:
    $string=~/^([a-zA-Z]{2}\d{9}).*/; print $1;
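
One difference between the two approaches above is worth a sketch: substr returns the first 11 characters whether or not they fit the 2-letters-plus-9-digits shape, and $1 keeps its value from an earlier successful match when the current match fails. The following is a minimal, hedged example that only uses the capture after checking that the match succeeded. The helper name leading_code is hypothetical and not from the thread.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): return the leading
# 2-letter + 9-digit code, or undef if the string does not start with one.
sub leading_code {
    my ($string) = @_;
    # Only trust $1 when the match itself succeeded.
    return $string =~ /^([a-zA-Z]{2}\d{9})/ ? $1 : undef;
}

for my $string ("AB123456789this_code_is_untested", "no code here") {
    my $code = leading_code($string);
    print defined $code ? "match: $code\n" : "no match in '$string'\n";
}
```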

    CU
    Robartes-

      Hi again,
      The 2 chars and 9 digits code I got last time has taken on a life of its own. It now seems that there is not only one possible instance of the substring in a string, but possibly more. The code I'm using to locate the substring is the following:

      ($number) = $line =~ m{ ( [A-Z]{2}\d{9} ) }ix;

      The $line can now contain several such substrings. How do I ensure that the regexp gets all the instances? I cannot do a split because the data is entered by mortal users who tend to have no sense of order. So a string could look like:

      "So I found SP236645744 in reference ED331234576, but not
      in IJ558897567."

      Hope you Gurus can help,
      Rune
        In that case, you would use a slightly modified version of the regexp:
        use strict;
        my $string="this_PM123456789_code_is_untested";
        $string=~/([a-zA-Z]{2}\d{9})/;
        print $1;
        This will find the first occurrence of your string.
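
To go beyond the first occurrence, the same pattern can be run with the /g modifier in a while loop; in scalar context each iteration resumes where the previous match ended. This is a minimal sketch using the sample sentence from the question. The @found array and the offset reporting via $-[1] are illustrative additions, not part of the original replies.

```perl
use strict;
use warnings;

my $string = "So I found SP236645744 in reference ED331234576, but not in IJ558897567.";

# Collect every match together with the offset where it starts.
my @found;
while ($string =~ /([a-zA-Z]{2}\d{9})/g) {
    # $-[1] holds the start offset of capture group 1.
    push @found, [ $1, $-[1] ];
}

printf "found %s at offset %d\n", @$_ for @found;
```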

        CU
        Robartes-

        almost the same as above:

        my($chars_nums) = $string =~ m{
            ( [a-z]{2}\d{9} )   # capture 2 chars + 9 digits
        }ix;
        Hi again, thanks anyway, but I found a working solution myself. Like this:

        (@instances) = $line =~ m/[A-Z]{2}\d{9}/g;
        foreach $elem (@instances) {
            print $elem . ",";
        }

        It gives me every occurrence of the substring in the string.
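
Since the data is typed in by users, a slightly stricter variant may be worth considering. The sketch below assumes (this is not stated in the thread) that lowercase codes should also be accepted and that a code embedded in a longer run of letters or digits should be skipped. It uses /i plus lookarounds, and join to avoid a trailing comma. The sample line is a modified, hypothetical version of the one in the question.

```perl
use strict;
use warnings;

# Hypothetical input: one lowercase code and one candidate with 10 digits.
my $line = "So I found SP236645744 in reference ed331234576, but not in IJ5588975670.";

# (?<![A-Za-z]) : the code is not preceded by another letter
# (?!\d)        : the code is not followed by a tenth digit
# In list context, /g without capture groups returns every whole match.
my @instances = $line =~ /(?<![A-Za-z])[A-Z]{2}\d{9}(?!\d)/gi;

print join(",", @instances), "\n";
```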
        BR,
        Rune