Extract a pattern from a string

avim1968 has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: Extract a pattern from a string by NetWallah (Canon) on Jun 10, 2012 at 07:14 UTC
^ and $ anchor the regex to the beginning and end of the string, so if what you are searching for is NOT at the beginning, or the pattern does not terminate at the end of the string, those anchors should not be used. What you are looking for is something like this:
`$ perl -E 'my $s="ME170-5/2/8-ME172-2/2/6-ME4028"; while ($s=~m|(\d{1,2})/(\d{1,2})/(\d{1,2})|g){++$x;say "$x Found $1-$2-$3:" . pos($s)}'`
Output:
`1 Found 5-2-8:11`
`2 Found 2-2-6:23`
I used "|" as the regex delimiter to avoid the "leaning toothpicks" syndrome, i.e. to avoid having to escape the "/".
I hope life isn't a big joke, because I don't get it. -SNL
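For anyone who would rather try the unanchored `while (... /g)` loop as a script than as a one-liner, here is a minimal sketch. The helper name `find_ports` is made up for illustration and is not part of the original reply; the pattern and the input string are the ones quoted above.

```perl
use strict;
use warnings;

# Hypothetical helper (name invented for this sketch): collect every
# digit/digit/digit group found anywhere in the string, with no ^ or $.
sub find_ports {
    my ($s) = @_;
    my @found;
    while ( $s =~ m|(\d{1,2})/(\d{1,2})/(\d{1,2})|g ) {
        # pos($s) is the offset just past the end of the current match
        push @found, [ "$1-$2-$3", pos($s) ];
    }
    return @found;
}

for my $hit ( find_ports('ME170-5/2/8-ME172-2/2/6-ME4028') ) {
    print "Found $hit->[0], match ends at offset $hit->[1]\n";
}
```

Note that `pos` reports where the match ended, not where it began, which is worth keeping in mind when comparing offsets.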
Re^2: Extract a pattern from a string by AnomalousMonk (Archbishop) on Jun 10, 2012 at 17:42 UTC
Quoting NetWallah: I used the "|" as regex delimiter, to avoid the "leaning toothpicks" syndrome - ie to avoid having to escape the "/".
But doing that to avoid LTS puts you in danger of succumbing to STD (Straight Toothpick Distemper) the first time you use an `|` alternation in your regex. Why not just use a pair of nesting delimiters, e.g. `{ }`, and be immunized against many of these pathologies?
`$s =~ m{ (\d{1,2}) / (\d{1,2}) / (\d{1,2}) }xmsg`
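A small sketch of the point about nesting delimiters: with `m{ }` both `/` and `|` can appear in the pattern unescaped. The `ME | FOO` prefix alternation and the helper name `tagged_ports` are assumptions invented for this illustration; they do not come from the original question.

```perl
use strict;
use warnings;

# Brace delimiters leave both / and | free to use inside the pattern,
# and /x allows whitespace for readability.
# Hypothetical helper; the "ME | FOO" alternation is only for illustration.
sub tagged_ports {
    my ($s) = @_;
    my @ports;
    while ( $s =~ m{ (?: ME | FOO ) \d+ - ( \d{1,2} / \d{1,2} / \d{1,2} ) }xg ) {
        push @ports, $1;
    }
    return @ports;
}

print join( ', ', tagged_ports('ME170-5/2/8-FOO172-2/2/6-ME4028') ), "\n";
```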
Re^3: Extract a pattern from a string by NetWallah (Canon) on Jun 11, 2012 at 14:01 UTC
Agreed. (++) vaccines - your best shot at good health. (Immunization slogan).
I hope life isn't a big joke, because I don't get it. -SNL
Re^2: Extract a pattern from a string by avim1968 (Acolyte) on Jun 10, 2012 at 11:34 UTC
`$ perl -E 'my $s="ME170-5/2/8-ME172-2/2/6-ME4028"; while ($s=~m|(\d{1, +2})/(\d{1,2})/(\d{1,2})|g){++$x;say "$x Found $1-$2-$3:" . pos($s)}'`
I have changed the code and now I get the index I need:
`$ perl -E 'my $s="ME170-5/2/8-ME172-2/2/6-ME4028"; while ($s=~m|(\d{1, +2}/\d{1,2}/\d{1,2})|g){++$x;say "$x Found $1:" . (index($s,$1)+1)}'`
Thank you,
Avi
Re^3: Extract a pattern from a string by NetWallah (Canon) on Jun 10, 2012 at 14:46 UTC
This version is slightly more efficient:
`perl -E 'my $s="ME170-5/2/8-ME172-2/2/6-ME4028"; while ($s=~m|(\d{1,2}/\d{1,2}/\d{1,2})|g){++$x;say "$x Found $1:" . " POS=" . (pos($s)-length($1)+1)}'`
The reason is that the 'index' operator re-scans the string, while 'pos' uses an existing value, and the 'length' computation is significantly faster than scanning.
Update: It also does not suffer from the bug ambrus (++) points out below.
I hope life isn't a big joke, because I don't get it. -SNL
Re^4: Extract a pattern from a string by avim1968 (Acolyte) on Jun 11, 2012 at 05:35 UTC
Re^5: Extract a pattern from a string by NetWallah (Canon) on Jun 11, 2012 at 13:54 UTC
Re^3: Extract a pattern from a string by ambrus (Abbot) on Jun 10, 2012 at 22:14 UTC
Beware of using `index` this way: it is incorrect, because it won't give you the offset you want if the substring appears more than once in the input. E.g.
`$ perl -E 'my $s="ME170-5/2/10-ME172-5/2/10-ME4028"; while ($s=~m|(\d{1,2}/\d{1,2}/\d{1,2})|g){++$x;say "$x Found $1:" . (index($s,$1)+1)}'`
`1 Found 5/2/10:7`
`2 Found 5/2/10:7`
`$ perl -E 'my $s="ME170-5/2/10-ME172-5/2/1-ME4028"; while ($s=~m|(\d{1,2}/\d{1,2}/\d{1,2})|g){++$x;say "$x Found $1:" . (index($s,$1)+1)}'`
`1 Found 5/2/10:7`
`2 Found 5/2/1:7`
Instead, if you really want to know the offsets, then use either `pos` or the `@-` match variable to find where the regular expression has matched:
`$ perl -E 'my $s="ME170-5/2/10-ME172-5/2/10-ME4028"; while ($s=~m|(\d{1,2}/\d{1,2}/\d{1,2})|g){++$x;say "$x Found $1:" . $-[1]}'`
`1 Found 5/2/10:6`
`2 Found 5/2/10:19`
`$ perl -E 'my $s="ME170-5/2/10-ME172-5/2/1-ME4028"; while ($s=~m|(\d{1,2}/\d{1,2}/\d{1,2})|g){++$x;say "$x Found $1:" . $-[1]}'`
`1 Found 5/2/10:6`
`2 Found 5/2/1:19`
However, maybe you don't want to know the positions at all, but instead match the port numbers and dates with a single regular expression that has two captures.
Also, those newlines and plus signs inside the braces are just a mistake you made when pasting here, right?
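To see the two approaches side by side in one run, here is a minimal sketch that records both offsets for each match. The helper name `offsets` is invented for illustration; the input is the first example string from the reply above, and offsets here are zero-based (no `+1`).

```perl
use strict;
use warnings;

# Hypothetical helper: for each port found, return
# [ matched text, offset according to index(), offset according to @- ].
sub offsets {
    my ($s) = @_;
    my @rows;
    while ( $s =~ m{(\d{1,2}/\d{1,2}/\d{1,2})}g ) {
        # index() rescans from the start, so a repeated substring always
        # reports its first occurrence; $-[1] is where capture group 1
        # began in this particular match.
        push @rows, [ $1, index( $s, $1 ), $-[1] ];
    }
    return @rows;
}

for my $r ( offsets('ME170-5/2/10-ME172-5/2/10-ME4028') ) {
    printf "%-7s index=%d  \$-[1]=%d\n", @$r;
}
```

The second row is where the two columns disagree: `index` still reports 6 while `$-[1]` reports 19.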
Re^4: Extract a pattern from a string by avim1968 (Acolyte) on Jun 11, 2012 at 04:03 UTC
Re^2: Extract a pattern from a string by avim1968 (Acolyte) on Jun 10, 2012 at 08:53 UTC
Hi, thank you very much, I understand now where I was wrong. I took your code as a base and modified it a bit and it works great, except that the position is off by a few chars. From where do you index it? I do have a question: why do the matches come out as $1=4 $2=2 $3=5 and not as a single substring "4/2/5"? Is there a way to get it like this?
Thank you,
Avi
Re^3: Extract a pattern from a string by Anonymous Monk on Jun 10, 2012 at 09:00 UTC
Because that is how the pattern was written: each () corresponds to a $n, so the first () is $1, the second is $2, and so on. Read http://perldoc.perl.org/perlintro.html#Parentheses-for-capturing, perlrequick and/or something from Tutorials.
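A short sketch of the difference, assuming an example string containing "4/2/5" (the string itself is made up to match the values mentioned in the question): three pairs of parentheses give three captures, one pair around the whole group gives a single capture.

```perl
use strict;
use warnings;

# Example input invented for this sketch.
my $s = 'ME170-4/2/5-ME172';

# Three pairs of parentheses: three separate captures ($1, $2, $3).
my @parts = $s =~ m{(\d{1,2})/(\d{1,2})/(\d{1,2})};

# One pair around the whole thing: a single capture ($1).
my ($whole) = $s =~ m{(\d{1,2}/\d{1,2}/\d{1,2})};

print "parts: @parts\n";
print "whole: $whole\n";
```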
Re^4: Extract a pattern from a string by avim1968 (Acolyte) on Jun 10, 2012 at 10:01 UTC
Re^4: Extract a pattern from a string by avim1968 (Acolyte) on Jun 10, 2012 at 09:28 UTC
Re^5: Extract a pattern from a string by Anonymous Monk on Jun 10, 2012 at 09:38 UTC
Re: Extract a pattern from a string by davido (Cardinal) on Jun 10, 2012 at 07:25 UTC
My snippet reads the lines of text from the `<DATA>` filehandle. You didn't mention where the strings are coming from, so you'll have to adjust accordingly. The regexp needs to be free to match anywhere in the string, so the `^` and `$` anchors were hurting you.
`my $re = qr{ ( (?: \d{1,2}/ ){2} \d{1,2} ) }x;
while( my $line = <DATA> ) {
    while( $line =~ m/ $re /gx ) {
        print "Line: $.\t Column: ", $-[0], "\tMatched: $1\n";
    }
}
__DATA__
ME170-5/2/8-ME172-2/2/6-ME4028
ME172-2/1/2-ME196-1/1/3-ME4002`
It's up to you to decide what information you would like to push onto an array. Perhaps instead of the print, you could use `push @array, [ $., $-[0], $1 ];`
There are a lot of funny looking variables there: `$.` tells you the line number most recently read from a file. `$-[0]` tells you the offset where the whole match started (here that is also where `$1` starts, because the capture spans the entire match). And `$1` holds what matched inside the first set of capturing parens within the regular expression.
Dave
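The `push` suggestion could be sketched roughly as follows. An in-memory filehandle stands in for `<DATA>` here so the example is self-contained; that substitution, and the variable names `$text`, `$fh` and `@records`, are assumptions made for this sketch rather than part of the original snippet.

```perl
use strict;
use warnings;

my $re = qr{ ( (?: \d{1,2} / ){2} \d{1,2} ) }x;

# In-memory filehandle standing in for whatever file holds the strings.
my $text = "ME170-5/2/8-ME172-2/2/6-ME4028\nME172-2/1/2-ME196-1/1/3-ME4002\n";
open my $fh, '<', \$text or die "Cannot open in-memory file: $!";

my @records;
while ( my $line = <$fh> ) {
    while ( $line =~ m/$re/g ) {
        # one record per match: line number, column of match start, match
        push @records, [ $., $-[0], $1 ];
    }
}
close $fh;

printf "line %d, column %2d: %s\n", @$_ for @records;
```

Each element of `@records` is an array reference, so later code can pick out whichever of the three fields it needs.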
Re: Extract a pattern from a string by GrandFather (Saint) on Jun 10, 2012 at 08:13 UTC
In his reply davido mentions the special arrays @- and @+, which store the start and end character indexes for various matched portions of the string being matched. However, most often those indexes are a means to an end, and most often Perl provides better means. It's kind of hard for us to help you find those means, however, because you've not told us what you want to do. If you take a step back and describe the bigger picture we may be able to offer more help.
True laziness is hard work
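For reference, a minimal sketch of what `@-` and `@+` hold after a successful match, using a shortened version of the sample string. The variable names are invented for illustration. Element 0 describes the whole match and element n describes capture group n.

```perl
use strict;
use warnings;

my $s = 'ME170-5/2/8-ME172';
my ( $whole, $third, $start, $end );

if ( $s =~ m{(\d{1,2})/(\d{1,2})/(\d{1,2})} ) {
    # $-[0] and $+[0] bound the whole match; $-[n] and $+[n] bound capture n.
    ( $start, $end ) = ( $-[0], $+[0] );
    $whole = substr $s, $-[0], $+[0] - $-[0];
    $third = substr $s, $-[3], $+[3] - $-[3];
    printf "whole match '%s' spans offsets %d to %d\n", $whole, $start, $end;
    printf "third capture '%s'\n", $third;
}
```

In practice the captured text is already available in `$1`, `$2`, `$3`, which is usually the more direct route when the offsets themselves are not needed.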
Re^2: Extract a pattern from a string by avim1968 (Acolyte) on Jun 10, 2012 at 09:41 UTC
Hi,
The BIG picture is very simple. I have a list of strings like the ones I showed:
ME170-5/2/8-ME172-2/2/6-ME4028
ME172-2/1/2-ME196-1/1/3-ME4002
I need to extract the port numbers from each string, i.e. 5/2/8 or 2/2/6 etc. Then, according to an index number found in the string, I need to select the port with the correct index number. For example, with the strings above:
string 1 = ME170-5/2/8-ME172-2/2/6-ME4028
string 2 = ME172-2/1/2-ME196-1/1/3-ME4002
If my search number is 172, then for string 1 I would get 2/2/6 and for string 2 I would get 2/1/2.
Thank you,
Avi
Re^3: Extract a pattern from a string by avim1968 (Acolyte) on Jun 10, 2012 at 11:56 UTC
After reading all the answers in this topic I have managed to create my Perl code to do just what I need. Thank you ALL :-)
Avi
Re^3: Extract a pattern from a string by AnomalousMonk (Archbishop) on Jun 10, 2012 at 17:15 UTC
If knowing the position of the port number in the string is necessary in order to be able to work your way back to and extract the index number, there's a more direct way:
`>perl -wMstrict -le
"my $dd = qr{ \d{1,2} }xms;
 ;;
 my %indices;
 for my $s (
   'ME170-5/2/8-ME172-2/2/6-ME4028-FOO172-222/2/666-BAR',
   'ME172-2/1/2-ME196-1/1/33-ME4002-11/2/3-ME1-1/22/3-ME22',
   '1-1/2/3', 'ME-9/9/9-ME',
   ) {
   push @{ $indices{$1} }, $2
       while $s =~ m{ (\d+) - ($dd (?: / $dd){2}) }xmsg;
   }
 ;;
 for my $i (sort { $a <=> $b } keys %indices) {
   printf qq{index '$i': port(s) '%s' \n},
     join q{' '}, @{ $indices{$i} };
   }
"`
`index '1': port(s) '1/22/3' '1/2/3'`
`index '170': port(s) '5/2/8'`
`index '172': port(s) '2/2/6' '2/1/2'`
`index '196': port(s) '1/1/33'`
`index '4002': port(s) '11/2/3'`
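If only one search number matters at a time, as in the "172" example from the question, the same two-capture idea can be wrapped in a small function. This is a sketch under that assumption; the name `ports_for` and its argument order are invented for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: return every port that directly follows the
# wanted index number, using one regex with two captures.
sub ports_for {
    my ( $s, $wanted ) = @_;
    my @ports;
    while ( $s =~ m{ (\d+) - ( \d{1,2} (?: / \d{1,2} ){2} ) }xg ) {
        # $1 is the index number, $2 is the port that follows it
        push @ports, $2 if $1 == $wanted;
    }
    return @ports;
}

for my $s ( 'ME170-5/2/8-ME172-2/2/6-ME4028', 'ME172-2/1/2-ME196-1/1/3-ME4002' ) {
    print "$s => ", join( ' ', ports_for( $s, 172 ) ), "\n";
}
```

Returning a list rather than a single value leaves room for strings in which the same index number appears more than once.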