avim1968 has asked for the wisdom of the Perl Monks concerning the following question:

Hello all

I have some code that uses a pattern to check whether a string is in the correct form. If it is, it prints yes.

if ($port111 =~ /^(\d{1,2})\/(\d{1,2})\/(\d{1,2})$/) {print "yes port +111=$port111\n";}

Allowed strings are 1/1/2, 2/3/15, 6/10/1, etc. All others are rejected.

I also have strings like these:
ME170-5/2/8-ME172-2/2/6-ME4028
ME172-2/1/2-ME196-1/1/3-ME4002
and I would like to extract the substrings that match the pattern. For example:
ME170-5/2/8-ME172-2/2/6-ME4028
would extract 5/2/8 and 2/2/6.
I have tried using this code, but it does not work :-(.

$string = 'ME170-5/2/8-ME172-2/2/6-ME4028';
@ports = $string =~ /^(\d{1,2})\/(\d{1,2})\/(\d{1,2})$/;

Can anyone point me in the right direction?
Avi

P.S. Also, is there a way to get an array with the index locations where the matching patterns were found in the string?

Replies are listed 'Best First'.
Re: Extract a pattern from a string
by NetWallah (Canon) on Jun 10, 2012 at 07:14 UTC
    ^ and $ anchor the regex to the beginning and end of the string, so if what you are searching for is NOT at the beginning, or the pattern does not terminate at the end of the string, those anchors should not be used.

    What you are looking for is something like this:

    $ perl -E 'my $s="ME170-5/2/8-ME172-2/2/6-ME4028"; while ($s=~m|(\d{1,2})/(\d{1,2})/(\d{1,2})|g){++$x;say "$x Found $1-$2-$3:" . pos($s)}'
    __OUTPUT__
    1 Found 5-2-8:11
    2 Found 2-2-6:23
    I used the "|" as regex delimiter to avoid the "leaning toothpicks" syndrome, i.e. to avoid having to escape the "/".
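For completeness, here is a minimal sketch of how the original list assignment from the question could be repaired, assuming all that is wanted is the list of port substrings. It drops the ^ and $ anchors, wraps the whole port in a single capture group and adds /g, so that the match in list context returns every captured substring. The variable names follow the question; the m{...} delimiter is just one choice.

```perl
use strict;
use warnings;

my $string = 'ME170-5/2/8-ME172-2/2/6-ME4028';

# No ^ or $ anchors, one capture group around the whole port, and /g:
# in list context the match returns every captured substring.
my @ports = $string =~ m{(\d{1,2}/\d{1,2}/\d{1,2})}g;

print "$_\n" for @ports;    # prints 5/2/8, then 2/2/6
```

If the positions are needed as well, the while loop with pos() is the better fit, because a list-context match only hands back the captured text.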

                 I hope life isn't a big joke, because I don't get it.
                       -SNL

      I used the "|" as regex delimiter to avoid the "leaning toothpicks" syndrome, i.e. to avoid having to escape the "/".

      But doing that to avoid LTS puts you in danger of succumbing to STD (Straight Toothpick Distemper) the first time you use an | alternation in your regex. Why not just use a pair of nesting delimiters, e.g. { }, and be immunized against many of these pathologies?

      $s =~ m{ (\d{1,2}) / (\d{1,2}) / (\d{1,2}) }xmsg
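A small sketch of why the nesting delimiters help once an alternation turns up. The ":" separator is purely hypothetical (the data in this thread only ever uses "/"); it is there to put both "/" and "|" into the same pattern without a single backslash in front of either.

```perl
use strict;
use warnings;

my $s = 'ME170-5/2/8-ME172-2/2/6-ME4028';

# Hypothetical separator: either "/" or ":". Inside qr{...} and m{...}
# neither "/" nor "|" needs escaping, and /x allows whitespace for layout.
my $sep     = qr{ / | : }x;
my $port_re = qr{ \d{1,2} $sep \d{1,2} $sep \d{1,2} }x;

my @ports = $s =~ m{ ($port_re) }xg;               # 5/2/8 and 2/2/6
my @mixed = 'X1-3:4/5-Y' =~ m{ ($port_re) }xg;     # 3:4/5

print "@ports\n";
print "@mixed\n";
```

The quantifier braces such as {1,2} are balanced, so they nest inside the m{ } delimiters without any escaping.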
        Agreed. (++)

        vaccines - your best shot at good health. (Immunization slogan).

                     I hope life isn't a big joke, because I don't get it.
                           -SNL

      $ perl -E 'my $s="ME170-5/2/8-ME172-2/2/6-ME4028"; while ($s=~m|(\d{1,2})/(\d{1,2})/(\d{1,2})|g){++$x;say "$x Found $1-$2-$3:" . pos($s)}'
      I have changed the code and now I get the index I need:
      $ perl -E 'my $s="ME170-5/2/8-ME172-2/2/6-ME4028"; while ($s=~m|(\d{1,2}/\d{1,2}/\d{1,2})|g){++$x;say "$x Found $1:" . (index($s,$1)+1)}'
      thank you
      Avi
        This version is slightly more efficient:
        perl -E 'my $s="ME170-5/2/8-ME172-2/2/6-ME4028"; while ($s=~m|(\d{1,2}/\d{1,2}/\d{1,2})|g){++$x;say "$x Found $1:" . " POS=" . (pos($s)-length($1)+1)}'
        The reason is that the 'index' operator re-scans the string, while 'pos' uses an existing value, and the 'length' computation is significantly faster than scanning.
        Update: It also does not suffer from the bug ambrus (++) points out below.

                     I hope life isn't a big joke, because I don't get it.
                           -SNL

        Beware of using index this way: it is incorrect, because it won't give you the offset you want if the substring appears more than once in the input. E.g.:

        $ perl -E 'my $s="ME170-5/2/10-ME172-5/2/10-ME4028"; while ($s=~m|(\d{1,2}/\d{1,2}/\d{1,2})|g){++$x;say "$x Found $1:" . (index($s,$1)+1)}'
        1 Found 5/2/10:7
        2 Found 5/2/10:7
        $ perl -E 'my $s="ME170-5/2/10-ME172-5/2/1-ME4028"; while ($s=~m|(\d{1,2}/\d{1,2}/\d{1,2})|g){++$x;say "$x Found $1:" . (index($s,$1)+1)}'
        1 Found 5/2/10:7
        2 Found 5/2/1:7

        Instead, if you really want to know the offsets, then use either the pos or the @- match variable to find where the regular expression has matched:

        $ perl -E 'my $s="ME170-5/2/10-ME172-5/2/10-ME4028"; while ($s=~m|(\d{1,2}/\d{1,2}/\d{1,2})|g){++$x;say "$x Found $1:" . $-[1]}'
        1 Found 5/2/10:6
        2 Found 5/2/10:19
        $ perl -E 'my $s="ME170-5/2/10-ME172-5/2/1-ME4028"; while ($s=~m|(\d{1,2}/\d{1,2}/\d{1,2})|g){++$x;say "$x Found $1:" . $-[1]}'
        1 Found 5/2/10:6
        2 Found 5/2/1:19
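As a sketch of how the P.S. in the question might be answered, the offsets from @- and @+ can be collected into an array while looping over the matches. The hash keys port, start and end are names made up for this illustration.

```perl
use strict;
use warnings;

my $s = 'ME170-5/2/10-ME172-5/2/10-ME4028';

my @found;
while ($s =~ m{(\d{1,2}/\d{1,2}/\d{1,2})}g) {
    # $-[1] is the 0-based offset where capture group 1 starts,
    # $+[1] is the offset just past its end.
    push @found, { port => $1, start => $-[1], end => $+[1] };
}

printf "%s at %d..%d\n", @{$_}{qw(port start end)} for @found;
```

Because the offsets come from the match itself, the repeated 5/2/10 gets two different start positions (6 and 19), which index cannot provide.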

        However, maybe you don't want to know the positions at all, but instead match the port numbers and dates with a single regular expression that has two captures.

        Also, those newlines and plus signs inside the braces are just a mistake you made when pasting here, right?

      Hi,
      Thank you very much, I understand now where I went wrong.
      I took your code as a base, modified it a bit, and it works great,
      except that the position is off by a few characters.
      Where do you index it from?

      I do have a question: why do the matches come out
      as $1=4 $2=2 $3=5 and not as a single substring "4/2/5"?
      Is there a way to get it like that?
      Thank you,
      Avi
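A short sketch of the difference, assuming a made-up input string that contains the 4/2/5 from the question: every pair of capturing parentheses fills its own variable, so three pairs give $1, $2 and $3, while one pair around the whole pattern gives a single substring.

```perl
use strict;
use warnings;

my $s = 'ME170-4/2/5-ME172';    # hypothetical input for illustration

# Three pairs of parentheses: three separate captures.
my @parts = $s =~ m{(\d{1,2})/(\d{1,2})/(\d{1,2})};

# One pair around the whole pattern: one captured substring.
my ($port) = $s =~ m{(\d{1,2}/\d{1,2}/\d{1,2})};

print "parts: @parts\n";    # parts: 4 2 5
print "port:  $port\n";     # port:  4/2/5
```

As for the position: pos() returns the 0-based offset just past the end of the last match, not the offset where the match starts, which would explain a value that is off by the length of the match.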

Re: Extract a pattern from a string
by davido (Cardinal) on Jun 10, 2012 at 07:25 UTC

    My snippet reads the lines of text from the <DATA> filehandle. You didn't mention where the strings are coming from, so you'll have to adjust accordingly. The regexp needs to be free to match anywhere in the string, so the ^ and $ anchors were hurting you.

    my $re = qr{ ( (?: \d{1,2}/ ){2} \d{1,2} ) }x;

    while( my $line = <DATA> ) {
        while( $line =~ m/ $re /gx ) {
            print "Line: $.\t Column: ", $-[0], "\tMatched: $1\n";
        }
    }

    __DATA__
    ME170-5/2/8-ME172-2/2/6-ME4028
    ME172-2/1/2-ME196-1/1/3-ME4002

    It's up to you to decide what information you would like to push onto an array. Perhaps instead of the print, you could use, push @array, [ $., $-[0], $1 ];

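    There are a lot of funny-looking variables there: $. tells you the line number most recently read from a file. $-[0] tells you the offset where the whole match starts, which here is also where $1 starts because the capture spans the entire match (strictly speaking, $-[1] is the start of $1). And $1 holds what matched inside the first set of capturing parens within the regular expression.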
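A sketch of the push variant, under the assumption that the lines come from an in-memory filehandle rather than <DATA> (that only keeps the example self-contained; $. behaves the same way). It stores $-[1], the start of capture group 1, which in this pattern equals $-[0].

```perl
use strict;
use warnings;

# In-memory "file" standing in for the real input source (an assumption).
my $text = "ME170-5/2/8-ME172-2/2/6-ME4028\nME172-2/1/2-ME196-1/1/3-ME4002\n";
open my $fh, '<', \$text or die "Cannot open in-memory file: $!";

my $re = qr{ ( (?: \d{1,2} / ){2} \d{1,2} ) }x;

my @matches;
while (my $line = <$fh>) {
    while ($line =~ m/$re/g) {
        # [ line number, 0-based column of the capture, matched text ]
        push @matches, [ $., $-[1], $1 ];
    }
}
close $fh;

printf "Line: %d\tColumn: %d\tMatched: %s\n", @$_ for @matches;
```

Each element of @matches is an array reference, so $matches[1][2] would be the text of the second match.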


    Dave

Re: Extract a pattern from a string
by GrandFather (Saint) on Jun 10, 2012 at 08:13 UTC

    In his reply davido mentions the special arrays @- and @+ which store the start and end character indexes for various matched portions of the string being matched. However, most often those indexes are a means to an end and most often Perl provides better means. It's kind of hard for us to help you find those means however because you've not told us what you want to do. If you take a step back and describe the bigger picture we may be able to offer more help.

    True laziness is hard work

      Hi,
      The BIG picture is very simple.
      I have a list of strings like the ones I showed:
      ME170-5/2/8-ME172-2/2/6-ME4028
      ME172-2/1/2-ME196-1/1/3-ME4002
      I need to extract the port numbers from each string, i.e. 5/2/8 or 2/2/6 etc.
      Then, according to an index number found in the string,
      I need to select the port that goes with that index number.
      For example, with the strings above:
      string 1 = ME170-5/2/8-ME172-2/2/6-ME4028
      string 2 = ME172-2/1/2-ME196-1/1/3-ME4002
      if my search number is 172, then for
      string 1 I would get 2/2/6
      and for string 2 I would get 2/1/2.
      Thank you,
      Avi

        After reading all the answers in this topic I have managed
        to create my Perl code to do just what I need.
        Thank you ALL :-)
        Avi

        If knowing the position of the port number in the string is necessary in order to be able to work your way back to and extract the index number, there's a more direct way:

        >perl -wMstrict -le
        "my $dd = qr{ \d{1,2} }xms;
         ;;
         my %indices;
         for my $s (
           'ME170-5/2/8-ME172-2/2/6-ME4028-FOO172-222/2/666-BAR',
           'ME172-2/1/2-ME196-1/1/33-ME4002-11/2/3-ME1-1/22/3-ME22',
           '1-1/2/3',
           'ME-9/9/9-ME',
           ) {
           push @{ $indices{$1} }, $2
               while $s =~ m{ (\d+) - ($dd (?: / $dd){2}) }xmsg;
           }
         ;;
         for my $i (sort { $a <=> $b } keys %indices) {
           printf qq{index '$i': port(s) '%s' \n},
               join q{' '}, @{ $indices{$i} };
           }
        "
        index '1': port(s) '1/22/3' '1/2/3'
        index '170': port(s) '5/2/8'
        index '172': port(s) '2/2/6' '2/1/2'
        index '196': port(s) '1/1/33'
        index '4002': port(s) '11/2/3'
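If the lookup is wanted per string rather than pooled into one hash, a minimal sketch could look like the following. The helper name port_for is hypothetical, and the (?<!\d) lookbehind is an added assumption so that a search for 172 does not match inside a longer number such as 4172.

```perl
use strict;
use warnings;

# port_for() is a hypothetical helper, not something from the thread.
# It returns the port that directly follows "<number>-" in the string,
# or undef when the number is absent or not followed by a port.
sub port_for {
    my ($string, $number) = @_;
    my ($port) = $string =~ m{
        (?<!\d) \Q$number\E -               # the index number, then a hyphen
        ( \d{1,2} (?: / \d{1,2} ){2} )      # the port, e.g. 2/2/6
    }x;
    return $port;
}

for my $s ('ME170-5/2/8-ME172-2/2/6-ME4028', 'ME172-2/1/2-ME196-1/1/3-ME4002') {
    my $port = port_for($s, 172);
    print "$s => ", (defined $port ? $port : 'no port found'), "\n";
}
```

With the two strings from the thread and a search number of 172 this gives 2/2/6 and 2/1/2, the results asked for.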