in reply to Extract a pattern from a string

^ and $ anchor the regex to the beginning and end of the string, so if what you are searching for is NOT at the beginning, or the pattern does not terminate at the end of the string, those anchors should not be used.
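To make the anchoring point concrete, here is a minimal sketch using the sample string from this thread. The variable names `$anchored` and `$unanchored` are only illustrative, and the pattern is an assumption about what the original poster was matching.

```perl
use strict;
use warnings;

my $s = "ME170-5/2/8-ME172-2/2/6-ME4028";

# Anchored: the whole string would have to be exactly one d/d/d triple.
my $anchored   = $s =~ m{ ^ \d{1,2} / \d{1,2} / \d{1,2} $ }x ? 1 : 0;

# Unanchored: the triple may sit anywhere inside the string.
my $unanchored = $s =~ m{ \d{1,2} / \d{1,2} / \d{1,2} }x ? 1 : 0;

print "anchored: $anchored, unanchored: $unanchored\n";
```

The anchored pattern fails because the triple is surrounded by other text; the unanchored one finds it.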

What you are looking for is something like this:

$ perl -E 'my $s="ME170-5/2/8-ME172-2/2/6-ME4028"; while ($s=~m|(\d{1,2})/(\d{1,2})/(\d{1,2})|g){++$x;say "$x Found $1-$2-$3:" . pos($s)}'
__OUTPUT__
1 Found 5-2-8:11
2 Found 2-2-6:23
I used the "|" as regex delimiter, to avoid the "leaning toothpicks" syndrome - ie to avoid having to escape the "/".

             I hope life isn't a big joke, because I don't get it.
                   -SNL

Re^2: Extract a pattern from a string
by AnomalousMonk (Archbishop) on Jun 10, 2012 at 17:42 UTC
    I used the "|" as regex delimiter, to avoid the "leaning toothpicks" syndrome - ie to avoid having to escape the "/".

    But doing that to avoid LTS puts you in danger of succumbing to STD (Straight Toothpick Distemper) the first time you use an | alternation in your regex. Why not just use a pair of nesting delimiters, { } for example, and be immunized against many of these pathologies?

    $s =~ m{ (\d{1,2}) / (\d{1,2}) / (\d{1,2}) }xmsg
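As a rough illustration of why brace delimiters help, the sketch below uses both "/" and "|" unescaped inside one pattern. The `ME\d+` alternative is a hypothetical addition for the sake of the example and is not part of the original question.

```perl
use strict;
use warnings;

my $s = "ME170-5/2/8-ME172-2/2/6-ME4028";

# With m{ }, neither "/" nor "|" needs escaping; /x allows the spacing.
my @tokens = $s =~ m{ ( ME\d+ | \d{1,2} / \d{1,2} / \d{1,2} ) }xg;

print join(",", @tokens), "\n";
```

In list context with /g, the match returns every capture, so `@tokens` holds the device names and the triples in the order they appear.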
      Agreed. (++)

      vaccines - your best shot at good health. (Immunization slogan).


Re^2: Extract a pattern from a string
by avim1968 (Acolyte) on Jun 10, 2012 at 11:34 UTC
    $ perl -E 'my $s="ME170-5/2/8-ME172-2/2/6-ME4028"; while ($s=~m|(\d{1, +2})/(\d{1,2})/(\d{1,2})|g){++$x;say "$x Found $1-$2-$3:" . pos($s)}'
    I have changed the code and now I get the index I need:
    $ perl -E 'my $s="ME170-5/2/8-ME172-2/2/6-ME4028"; while ($s=~m|(\d{1, +2}/\d{1,2}/\d{1,2})|g){++$x;say "$x Found $1:" . (index($s,$1)+1)}'
    thank you
    Avi
      This version is slightly more efficient:
      perl -E 'my $s="ME170-5/2/8-ME172-2/2/6-ME4028"; while ($s=~m|(\d{1,2}/\d{1,2}/\d{1,2})|g){++$x;say "$x Found $1:" . " POS=" . (pos($s)-length($1)+1)}'
      The reason is that the 'index' operator re-scans the string, while 'pos' uses an existing value, and the 'length' computation is significantly faster than scanning.
      Update: It also does not suffer from the bug ambrus (++) points out below.


        Hi
        I have written these two snippets based on what you wrote,
        for the case of multiple identical port numbers in the string.
        I need to know which would be faster/wiser to use?
        $s="ME170-5/2/8-ME172-5/2/8-ME4028ME172-5/2/8-ME196-5/2/8-ME4002"; while ($s=~m/(\d{1,2}\/\d{1,2}\/\d{1,2})/g) {++$r;print "$r Found $1:" .(pos($s)-length($1)+1) .$nl;}
        _OUTPUT_
        1 Found 5/2/8:7
        2 Found 5/2/8:19
        3 Found 5/2/8:37
        4 Found 5/2/8:49
        $s="ME170-5/2/8-ME172-5/2/8-ME4028ME172-5/2/8-ME196-5/2/8-ME4002"; while ($s=~m/(\d{1,2}\/\d{1,2}\/\d{1,2})/g) {++$e;print "$e Found $1:" .($-[1]+1) .$nl;}
        _OUTPUT_
        1 Found 5/2/8:7
        2 Found 5/2/8:19
        3 Found 5/2/8:37
        4 Found 5/2/8:49
        Thank you
        Avi

      Beware of using index this way: it is incorrect, because it won't give you the offset you want if the substring appears more than once in the input. E.g.

      $ perl -E 'my $s="ME170-5/2/10-ME172-5/2/10-ME4028"; while ($s=~m|(\d{1,2}/\d{1,2}/\d{1,2})|g){++$x;say "$x Found $1:" . (index($s,$1)+1)}'
      1 Found 5/2/10:7
      2 Found 5/2/10:7
      $ perl -E 'my $s="ME170-5/2/10-ME172-5/2/1-ME4028"; while ($s=~m|(\d{1,2}/\d{1,2}/\d{1,2})|g){++$x;say "$x Found $1:" . (index($s,$1)+1)}'
      1 Found 5/2/10:7
      2 Found 5/2/1:7

      Instead, if you really want to know the offsets, then use either the pos or the @- match variable to find where the regular expression has matched:

      $ perl -E 'my $s="ME170-5/2/10-ME172-5/2/10-ME4028"; while ($s=~m|(\d{1,2}/\d{1,2}/\d{1,2})|g){++$x;say "$x Found $1:" . $-[1]}'
      1 Found 5/2/10:6
      2 Found 5/2/10:19
      $ perl -E 'my $s="ME170-5/2/10-ME172-5/2/1-ME4028"; while ($s=~m|(\d{1,2}/\d{1,2}/\d{1,2})|g){++$x;say "$x Found $1:" . $-[1]}'
      1 Found 5/2/10:6
      2 Found 5/2/1:19
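The same comparison can be written as a short script that collects both sets of offsets side by side. This is only a sketch; the array names `@by_index` and `@by_match` are made up for the illustration.

```perl
use strict;
use warnings;

my $s = "ME170-5/2/10-ME172-5/2/10-ME4028";

my (@by_index, @by_match);
while ($s =~ m{ (\d{1,2} / \d{1,2} / \d{1,2}) }xg) {
    push @by_index, index($s, $1);   # rescans from offset 0 every time
    push @by_match, $-[1];           # start of capture group 1 in this match
}

print "index():      @by_index\n";
print "match offset: @by_match\n";
```

Because index always finds the first occurrence, it reports the same offset for both matches, while `$-[1]` follows the regex engine's actual position.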

      However, maybe you don't want to know the positions at all, but instead match the port numbers and dates with a single regular expression that has two captures.

      Also, those newlines and plus signs inside the braces are just a mistake you made when pasting here, right?

        Hi
        Thank you for the warning. I just hit that problem during the run,
        when I got the same port number in a line.
        I'll modify my script again, with your input.
        Thank you
        Avi
        P.S.
        All those newlines and plus signs inside the braces are not
        mine; they were placed there after pasting the code.
Re^2: Extract a pattern from a string
by avim1968 (Acolyte) on Jun 10, 2012 at 08:53 UTC

    Hi
    Thank you very much, I understand now where I was wrong.
    I took your code as a base and modified it a bit and it works great,
    except that the position is off by a few chars??
    From where do you index it?

    I do have a question: why do the matches come out
    as $1=4 $2=2 $3=5 and not as a single substring "4/2/5"?
    Is there a way to get it like this?
    thank you
    Avi
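Both questions can be illustrated with one small sketch, assuming the sample string from earlier in the thread. Putting a single pair of parentheses around the whole triple yields it as one substring in `$1`; offsets are counted from 0, with `$-[1]` giving the start of the capture and pos($s) the offset just past the end of the match. The array name `@found` is illustrative only.

```perl
use strict;
use warnings;

my $s = "ME170-5/2/8-ME172-2/2/6-ME4028";

my @found;
while ($s =~ m{ (\d{1,2} / \d{1,2} / \d{1,2}) }xg) {
    # $1 is the whole triple; $-[1] is its 0-based start, pos($s) its end.
    push @found, [ $1, $-[1], pos($s) ];
}

printf "%s starts at %d, ends before %d\n", @$_ for @found;
```

If a 1-based position is wanted, add 1 to the start offset, as the later replies in this thread do.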

        Hi
        Thank you for your answer, I have made the needed changes
        and now I get the full pattern.
        Avi

        Hi, except that the position of the substring is off by a few chars??
        From where do you index it?
        Avi