Re: RegEx Headaches
by 5mi11er (Deacon) on Jun 19, 2013 at 18:01 UTC
|
x /(\d+)\.(\d+)\.xml/
-Scott
|
Yes, you answered the question correctly. Unfortunately, it was the wrong question. (That's my fault, not yours.)
I should have added: the filename can have _one or more_ digit groups separated by periods in the middle. I want to extract them ALL!
So, for example:
ActionLogs.1.2.3.4.5.6.7.8.9.xml
Should yield
0 1
1 2
2 3
3 4
4 5
5 6
6 7
7 8
8 9
I thought the /g modifier would do that, but I guess not.
|
DB<1> $_ = 'ActionLogs.1.2.3.4.5.6.7.8.9.xml'
DB<2> x split ' ', tr/.[a-zA-Z]/ /dr
0 1
1 2
2 3
3 4
4 5
5 6
6 7
7 8
8 9
DB<3>
Update: The brackets are unnecessary (inside tr/// they are literal characters, not a character class); tr/.a-zA-Z/ /dr works just as well.
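For anyone who wants the tr trick outside the debugger, here is a minimal sketch as a standalone script. The variable names are mine, not from the post, and it assumes Perl 5.14 or later because of the /r flag.

```perl
use strict;
use warnings;
use 5.014;    # tr///r (return a modified copy) needs Perl 5.14 or later

my $name = 'ActionLogs.1.2.3.4.5.6.7.8.9.xml';

# In tr///, the search list is a plain list of characters and ranges.
# '.' pairs with the single space in the replacement list; the letters have
# no counterpart, so /d deletes them. /r returns the result and leaves
# $name untouched.
my $spaced = $name =~ tr/.a-zA-Z/ /dr;

# split ' ' splits on runs of whitespace and drops the leading empty field.
my @groups = split ' ', $spaced;

print "@groups\n";    # 1 2 3 4 5 6 7 8 9
```

Note that this only works when the name contains nothing but letters, digits and periods; any other character (an underscore, say) would pass through into the output.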
Re: RegEx Headaches
by bart (Canon) on Jun 19, 2013 at 18:41 UTC
|
@numbers = /(\d+)(?=\.)(?=.*\.xml$)/g
This uses lookaheads, which check the text that follows (first a period, then a rest of the string that ends in .xml) without consuming it.
If you don't need the extra check, you can just do
@numbers = /(\d+)/g
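To see what the extra check buys you, here is a small sketch that wraps the lookahead pattern in a sub and runs it on two made-up names. The sub name and the .txt file name are hypothetical, chosen only for illustration.

```perl
use strict;
use warnings;

# Return every digit group that is followed by a period, but only when the
# name ends in ".xml". Both lookaheads inspect text without consuming it,
# so the next match can start right after the digits.
sub digit_groups {
    my ($name) = @_;
    return $name =~ /(\d+)(?=\.)(?=.*\.xml$)/g;
}

my @xml = digit_groups('ActionLogs.1.2.3.xml');
my @txt = digit_groups('ActionLogs.1.2.3.txt');    # hypothetical non-XML name

print "xml: @xml\n";                      # xml: 1 2 3
print "txt: ", scalar(@txt), "\n";        # txt: 0
```

The sub must be called in list context, as it is here; in scalar context a //g match behaves differently and returns only success or failure.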
|
DB<23> x $_
0 'ActionLogs.1.2.3.4.5.6.7.8.9.xml'
DB<24> x /(\d+\.)/g
0 1.
1 2.
2 3.
3 4.
4 5.
5 6.
6 7.
7 8.
8 9.
but:
DB<25> x /(\d+\.)+/g
0 9.
Can anyone tell me why the quantifier makes it match only the last group?
|
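/((?:\d+\.)+)/ or die "no digit groups found";   # capture the whole run, e.g. "1.2.3."
my @res = split /(?<=\.)/, $1;                   # split after each period: "1.", "2.", ...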
Alternatively, [i]n list context, //g returns a list of matched groupings, or if there are no groupings, a list of matches to the whole regexp
(see Global matching in perlretut) so you could try
my @res = /(?<=\.)\d+\./g;
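On the "why" itself: a capture group has a single slot, and when the group is quantified, each repetition overwrites that slot. So (\d+\.)+ swallows the whole run of digit groups in one match and leaves only the final repetition in $1. A minimal sketch comparing the two patterns from the debugger session above (the array names are mine):

```perl
use strict;
use warnings;

my $name = 'ActionLogs.1.2.3.4.5.6.7.8.9.xml';

# (\d+\.)+ consumes "1.2.3.4.5.6.7.8.9." in ONE match. The single capture
# group is overwritten on every repetition, so only the last value survives.
my @last_only = $name =~ /(\d+\.)+/g;

# Without the quantifier on the group, //g restarts after each match and
# contributes one capture per match.
my @each = $name =~ /(\d+\.)/g;

print "quantified group: @last_only\n";    # quantified group: 9.
print "repeated match:   @each\n";         # repeated match:   1. 2. 3. 4. 5. 6. 7. 8. 9.
```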
#11929 First ask yourself `How would I do this without a computer?' Then have the computer do it the same way.
Re: RegEx Headaches
by kcott (Archbishop) on Jun 22, 2013 at 21:06 UTC
|
G'day oryx3,
Welcome to the monastery.
I see a number of solutions that appear to be more complicated than necessary.
You've provided two pieces of sample input with expected output for each.
In both cases, either of these will achieve what you want:
/\.(\d+)/g
/(\d+)\./g
Here's my test:
$ perl -Mstrict -Mwarnings -de 1
Loading DB routines from perl5db.pl version 1.39_09
Editor support available.
Enter h or 'h h' for help, or 'man perldebug' for more help.
main::(-e:1): 1
DB<1> $_ = 'ActionLogs.1.1998.xml'
DB<2> x /\.(\d+)/g
0 1
1 1998
DB<3> x /(\d+)\./g
0 1
1 1998
DB<4> $_ = 'ActionLogs.1.2.3.4.5.6.7.8.9.xml'
DB<5> x /\.(\d+)/g
0 1
1 2
2 3
3 4
4 5
5 6
6 7
7 8
8 9
DB<6> x /(\d+)\./g
0 1
1 2
2 3
3 4
4 5
5 6
6 7
7 8
8 9
DB<7> q
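One caveat worth hedging: the two patterns agree on both sample names, but they are not interchangeable in general. The sketch below uses a hypothetical file name of my own (digits at the end of the base name), which is not one of the samples in this thread, to show where they diverge.

```perl
use strict;
use warnings;

# Hypothetical name: the base name itself ends in a digit.
my $name = 'ActionLogs2.10.20.xml';

# Digits that FOLLOW a period: the "2" in "ActionLogs2" is skipped.
my @after_dot = $name =~ /\.(\d+)/g;

# Digits that PRECEDE a period: the "2" in "ActionLogs2" is picked up too.
my @before_dot = $name =~ /(\d+)\./g;

print "after a period:  @after_dot\n";     # after a period:  10 20
print "before a period: @before_dot\n";    # before a period: 2 10 20
```

If names like that can occur in your data, /\.(\d+)/g is probably the safer of the two.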
|
>perl -wMstrict -le
"$_ = 'ActionLogs.1.22.333.4.5.6.7.8.987.xml';
;;
my @digit_groups = m{ \d+ }xmsg;
printf qq{'$_' } for @digit_groups;
"
'1' '22' '333' '4' '5' '6' '7' '8' '987'
|
++ Yes, that works and is less complicated still. :-)
It wouldn't have occurred to me not to use a capture group.
I checked the online docs and found in perlretut - Using regular expressions in Perl - Global matching (after following links from perlre):
In list context, //g returns a list of matched groupings, or if there are no groupings, a list of matches to the whole regexp. [my emphasis]
That seemed new to me (those links are for 5.16.2), so I checked back to the earliest online perldoc version (5.8.8) and, while in a different manpage (http://perldoc.perl.org/5.8.8/perlop.html#Regexp-Quote-Like-Operators) with different wording, that behaviour was current back then:
In list context, it returns a list of the substrings matched by any capturing parentheses in the regular expression. If there are no parentheses, it returns a list of all the matched strings, as if there were parentheses around the whole pattern. [my emphasis, again]
I've learned something new. Thank you.
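The documented list-context behaviour quoted above can be checked with a short sketch. The file name and array names are made up for illustration.

```perl
use strict;
use warnings;

my $name = 'ActionLogs.1.22.333.xml';    # hypothetical sample name

# No capture groups: //g in list context returns each whole match.
my @whole = $name =~ /\d+/g;

# One capture group: //g returns what the group captured in each match.
my @captured = $name =~ /\.(\d+)/g;

# Two capture groups: the captures of every match are flattened into one list.
my @pairs = $name =~ /(\.)(\d+)/g;

print "whole:    @whole\n";       # whole:    1 22 333
print "captured: @captured\n";    # captured: 1 22 333
print "pairs:    @pairs\n";       # pairs:    . 1 . 22 . 333
```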