Re: Extract a pattern from a string
by NetWallah (Canon) on Jun 10, 2012 at 07:14 UTC
$ perl -E 'my $s="ME170-5/2/8-ME172-2/2/6-ME4028"; while ($s=~m|(\d{1,2})/(\d{1,2})/(\d{1,2})|g){++$x;say "$x Found $1-$2-$3:" . pos($s)}'
__OUTPUT__
1 Found 5-2-8:11
2 Found 2-2-6:23
I used "|" as the regex delimiter to avoid the "leaning toothpicks" syndrome, i.e. to avoid having to escape the "/".
I hope life isn't a big joke, because I don't get it.
-SNL
I used the "|" as regex delimiter, to avoid the "leaning toothpicks" syndrome - i.e. to avoid having to escape the "/".
But doing that to avoid LTS puts you in danger of succumbing to STD (Straight Toothpick Distemper) the first time you use an | alternation in your regex. Why not just use a pair of nesting delimiters, e.g. { }, and be immunized against many of these pathologies?
$s =~ m{ (\d{1,2}) / (\d{1,2}) / (\d{1,2}) }xmsg
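To make the point concrete, here is a minimal sketch. The sub name find_ports and the ME/FOO alternation are made up for illustration and are not from the original question. With brace delimiters and /x, both a literal / and an | alternation can sit in the pattern unescaped:

```perl
use strict;
use warnings;

# Hypothetical helper, for illustration only: returns every d/d/d triple
# that follows an "ME" or "FOO" device name and a dash.
sub find_ports {
    my ($s) = @_;
    my @ports;
    # Brace delimiters: neither "/" nor the "|" alternation needs escaping.
    while ( $s =~ m{ (?: ME | FOO ) \d+ - ( \d{1,2} / \d{1,2} / \d{1,2} ) }xmsg ) {
        push @ports, $1;
    }
    return @ports;
}

# prints: 5/2/8 2/2/6
print join( ' ', find_ports('ME170-5/2/8-ME172-2/2/6-ME4028') ), "\n";
```

Had "|" been the delimiter, the alternation would have had to be escaped or rewritten.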
$ perl -E 'my $s="ME170-5/2/8-ME172-2/2/6-ME4028"; while ($s=~m|(\d{1,2})/(\d{1,2})/(\d{1,2})|g){++$x;say "$x Found $1-$2-$3:" . pos($s)}'
I have changed the code, and now I get the index I need:
$ perl -E 'my $s="ME170-5/2/8-ME172-2/2/6-ME4028"; while ($s=~m|(\d{1,2}/\d{1,2}/\d{1,2})|g){++$x;say "$x Found $1:" . (index($s,$1)+1)}'
thank you
Avi
This version is slightly more efficient:
perl -E 'my $s="ME170-5/2/8-ME172-2/2/6-ME4028"; while ($s=~m|(\d{1,2}/\d{1,2}/\d{1,2})|g){++$x;say "$x Found $1:" . " POS=" . (pos($s)-length($1)+1)}'
The reason is that the 'index' operator re-scans the string, while 'pos' uses an existing value, and the 'length' computation is significantly faster than scanning.
Update: It also does not suffer from the bug ambrus (++) points out below.
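The same idea written out as a small script, as a sketch only (the sub name port_columns is invented here). It uses an input where the same port appears twice, which is the case where index() would report the first position both times:

```perl
use strict;
use warnings;

# Hypothetical helper: returns [matched text, 1-based start column] pairs.
sub port_columns {
    my ($s) = @_;
    my @found;
    while ( $s =~ m{(\d{1,2}/\d{1,2}/\d{1,2})}g ) {
        # pos() is the offset just past the match, so subtracting the
        # match length gives the 0-based start; +1 makes it 1-based.
        push @found, [ $1, pos($s) - length($1) + 1 ];
    }
    return @found;
}

# prints:
#   5/2/10 at column 7
#   5/2/10 at column 20
printf "%s at column %d\n", @$_
    for port_columns('ME170-5/2/10-ME172-5/2/10-ME4028');
```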
I hope life isn't a big joke, because I don't get it.
-SNL
Using index is incorrect, for it won't give you the offset you want if the substring appears more than once in the input. E.g.
$ perl -E 'my $s="ME170-5/2/10-ME172-5/2/10-ME4028"; while ($s=~m|(\d{1,2}/\d{1,2}/\d{1,2})|g){++$x;say "$x Found $1:" . (index($s,$1)+1)}'
1 Found 5/2/10:7
2 Found 5/2/10:7
$ perl -E 'my $s="ME170-5/2/10-ME172-5/2/1-ME4028"; while ($s=~m|(\d{1,2}/\d{1,2}/\d{1,2})|g){++$x;say "$x Found $1:" . (index($s,$1)+1)}'
1 Found 5/2/10:7
2 Found 5/2/1:7
$
Instead, if you really want to know the offsets, then use either the pos or the @- match variable to find where the regular expression has matched:
$ perl -E 'my $s="ME170-5/2/10-ME172-5/2/10-ME4028"; while ($s=~m|(\d{1,2}/\d{1,2}/\d{1,2})|g){++$x;say "$x Found $1:" . $-[1]}'
1 Found 5/2/10:6
2 Found 5/2/10:19
$ perl -E 'my $s="ME170-5/2/10-ME172-5/2/1-ME4028"; while ($s=~m|(\d{1,2}/\d{1,2}/\d{1,2})|g){++$x;say "$x Found $1:" . $-[1]}'
1 Found 5/2/10:6
2 Found 5/2/1:19
$
However, maybe you don't want to know the positions at all, but instead match the port numbers and dates with a single regular expression that has two captures.
Also, those newlines and plus signs inside the braces are just a mistake you made when pasting here, right?
Hi
Thank you very much, I understand now where I was wrong.
I took your code as a base and modified it a bit, and it works great,
except that the position is off by a few chars.
Where do you index it from?
I do have a question: why do the matches come out
as $1=4 $2=2 $3=5 and not as a single substring "4/2/5"?
Is there a way to get it like this?
thank you
Avi
Re: Extract a pattern from a string
by davido (Cardinal) on Jun 10, 2012 at 07:25 UTC
My snippet reads the lines of text from the <DATA> filehandle. You didn't mention where the strings are coming from, so you'll have to adjust accordingly. The regexp needs to be free to match anywhere in the string, so the ^ and $ anchors were hurting you.
my $re = qr{
    (
        (?: \d{1,2}/ ){2}
        \d{1,2}
    )
}x;
while( my $line = <DATA> ) {
    while( $line =~ m/ $re /gx ) {
        print "Line: $.\t Column: ", $-[0], "\tMatched: $1\n";
    }
}
__DATA__
ME170-5/2/8-ME172-2/2/6-ME4028
ME172-2/1/2-ME196-1/1/3-ME4002
It's up to you to decide what information you would like to push onto an array. Perhaps instead of the print, you could use, push @array, [ $., $-[0], $1 ];
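There are a lot of funny looking variables there: $. tells you the line number most recently read from a file. $-[0] tells you the offset at which the whole match starts; since the entire pattern sits inside the capturing parens, that is also where $1 starts ($-[1] would give that directly). And $1 holds what matched inside the first set of capturing parens within the regular expression.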
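As a rough sketch of the push variant: the sub name collect_matches and the manual $line_no counter are stand-ins I made up so the example runs without a filehandle, and the regexp is the same as above. Each match becomes one [line, column, text] record:

```perl
use strict;
use warnings;

# Hypothetical helper: scans a list of lines and returns one
# [line number, 0-based column, matched text] record per match.
sub collect_matches {
    my @lines = @_;
    my $re = qr{ ( (?: \d{1,2} / ){2} \d{1,2} ) }x;
    my @records;
    my $line_no = 0;    # stands in for $. since no file is being read
    for my $line (@lines) {
        $line_no++;
        while ( $line =~ m/$re/g ) {
            # $-[1] is the offset where capture group 1 begins.
            push @records, [ $line_no, $-[1], $1 ];
        }
    }
    return @records;
}

my @records = collect_matches(
    'ME170-5/2/8-ME172-2/2/6-ME4028',
    'ME172-2/1/2-ME196-1/1/3-ME4002',
);
print join( "\t", @$_ ), "\n" for @records;
```

Each record can later be unpacked with my ($line, $col, $text) = @$record;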
Re: Extract a pattern from a string
by GrandFather (Saint) on Jun 10, 2012 at 08:13 UTC
In his reply davido mentions the special arrays @- and @+, which store the start and end character indexes for the various matched portions of the string being matched. However, most often those indexes are a means to an end, and most often Perl provides better means. It's kind of hard for us to help you find those means, however, because you've not told us what you want to do. If you take a step back and describe the bigger picture, we may be able to offer more help.
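For reference, a small sketch of what those two arrays hold. The sub name match_spans is invented for this example, and the pattern is borrowed from the other replies:

```perl
use strict;
use warnings;

# Hypothetical helper: for each d/d/d match, returns
# [text, start offset, end offset], taken from @- and @+.
sub match_spans {
    my ($s) = @_;
    my @spans;
    while ( $s =~ m{(\d{1,2}/\d{1,2}/\d{1,2})}g ) {
        # $-[1] is where capture group 1 starts; $+[1] is one past its end.
        push @spans, [ $1, $-[1], $+[1] ];
    }
    return @spans;
}

# prints:
#   5/2/8 spans offsets 6 to 11
#   2/2/6 spans offsets 18 to 23
for my $span ( match_spans('ME170-5/2/8-ME172-2/2/6-ME4028') ) {
    my ( $text, $start, $end ) = @$span;
    print "$text spans offsets $start to $end\n";
}
```

Note that the text itself is already in $1, which is usually all that is needed.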
True laziness is hard work
After reading all the answers in this topic, I have managed
to write my Perl code to do just what I need.
Thank you ALL :-)
Avi
>perl -wMstrict -le
"my $dd = qr{ \d{1,2} }xms;
 ;;
 my %indices;
 for my $s (
   'ME170-5/2/8-ME172-2/2/6-ME4028-FOO172-222/2/666-BAR',
   'ME172-2/1/2-ME196-1/1/33-ME4002-11/2/3-ME1-1/22/3-ME22',
   '1-1/2/3', 'ME-9/9/9-ME',
   ) {
   push @{ $indices{$1} }, $2
       while $s =~ m{ (\d+) - ($dd (?: / $dd){2}) }xmsg;
   }
 ;;
 for my $i (sort { $a <=> $b } keys %indices) {
   printf qq{index '$i': port(s) '%s' \n},
       join q{' '}, @{ $indices{$i} };
   }
"
index '1': port(s) '1/22/3' '1/2/3'
index '170': port(s) '5/2/8'
index '172': port(s) '2/2/6' '2/1/2'
index '196': port(s) '1/1/33'
index '4002': port(s) '11/2/3'