Capturing unknown number of matches

awygle has asked for the wisdom of the Perl Monks concerning the following question:

Hello, thanks for taking the time to read my question... So I'm trying to parse some iCal, specifically getting the BYDAY section of a weekly repeating RRULE. I tried the pattern
@days = /BYDAY=([A-Z]+)(?:,([A-Z]+))*;/
on the line
RRULE:FREQ=WEEKLY;BYDAY=TU,TH;UNTIL=20110429T000000;WKST=SU
and it worked fine ($days[0] eq 'TU', $days[1] eq 'TH'). However, if I change TU,TH to MO,WE,FR, I only get MO and FR, and if I change it to just TU, then I get TU but also a warning about using the uninitialized value $days[1].
My guess is that the * on the nongrouping parentheses returns an undefined or null value if it does not match, and the last match if it matches more than once. Is there a way to do what I want besides knowing that only 5 days exist and cheating?
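A minimal sketch that checks the guess above, assuming nothing beyond core Perl: a capture group inside a quantified group keeps only its last iteration, and stays undefined if the quantified group never matches. The helper name `capture_days` and the shortened test lines are hypothetical, introduced only for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: applies the pattern from the question and
# returns the raw capture list ($1, $2) in list context.
sub capture_days {
    my ($line) = @_;
    return $line =~ /BYDAY=([A-Z]+)(?:,([A-Z]+))*;/;
}

for my $byday ('TU', 'TU,TH', 'MO,WE,FR') {
    my @days = capture_days("RRULE:FREQ=WEEKLY;BYDAY=$byday;WKST=SU");
    # Show undefined captures explicitly instead of triggering a warning.
    print "$byday => ", join(' ', map { defined $_ ? $_ : 'undef' } @days), "\n";
}
```

The pattern always yields exactly two captures, however many days the field lists: `TU` gives `TU undef`, `TU,TH` gives `TU TH`, and `MO,WE,FR` gives `MO FR`.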

Replies are listed 'Best First'.
Re: Capturing unknown number of matches by ikegami (Patriarch) on Jan 21, 2011 at 01:55 UTC
You have two captures, so you'll get two results. You need two steps: `my ($days) = /BYDAY=([A-Z]+(?:,[A-Z]+)*);/; my @days = split(/,/, $days);` The same, as one expression: `my @days = split(/,/, ( /BYDAY=([A-Z]+(?:,[A-Z]+)*);/ )[0]);` Going the other way, a more generic parser: `$_ = 'FREQ=WEEKLY;BYDAY=TU,TH;UNTIL=20110429T000000;WKST=SU'; my %attrs; for (split /;/) { my ($k,$v) = split /=/; $attrs{$k} = $v; } my @days = split /,/, $attrs{BYDAY};`
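The generic parser above starts from a string without the `RRULE:` prefix. A hedged sketch of the same idea wrapped in a sub that accepts the full line from the question; the name `rrule_days`, the prefix stripping, and the empty-list result for a missing BYDAY are assumptions added here, not part of the reply.

```perl
use strict;
use warnings;

# Hypothetical helper: split an RRULE line into key=value attributes
# and return the BYDAY values as a list (empty if BYDAY is absent).
sub rrule_days {
    my ($line) = @_;
    (my $body = $line) =~ s/^RRULE://;      # drop the property name, if present
    my %attrs;
    for my $pair (split /;/, $body) {
        my ($k, $v) = split /=/, $pair, 2;  # limit 2: keep any '=' inside the value
        next unless defined $k && defined $v;
        $attrs{$k} = $v;
    }
    return defined $attrs{BYDAY} ? split(/,/, $attrs{BYDAY}) : ();
}

my @days = rrule_days('RRULE:FREQ=WEEKLY;BYDAY=MO,WE,FR;UNTIL=20110429T000000;WKST=SU');
print "@days\n";
```

Because the day list is split rather than captured, the number of days does not need to be known in advance.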
Re: Capturing unknown number of matches by umasuresh (Hermit) on Jan 21, 2011 at 02:08 UTC
Another solution: `use strict; my $string = "RRULE:FREQ=WEEKLY;BYDAY=TU,TH;UNTIL=20110429T000000;WKST=SU"; my ($days) = $string =~ /BYDAY=([^;]+);/; my @days = split(/,/, $days); print join("\t", @days);` I see ikegami already provided neat solutions!
Re: Capturing unknown number of matches by ahmad (Hermit) on Jan 21, 2011 at 03:09 UTC
It's clear that you have fields separated by a semicolon ';' ... So I would recommend using the 'generic solution' provided by ikegami over a regex solution, as it's cleaner and makes much more sense IMHO.
Re: Capturing unknown number of matches by AnomalousMonk (Archbishop) on Jan 21, 2011 at 22:59 UTC
I agree that a generic parser as suggested by ikegami is probably the best approach. However, here's a regex solution to the specific problem. Note this assumes only one BYDAY field per line/record; if there are more, all field values are extracted indiscriminately. `>perl -wMstrict -le "my @tests = ( 'RRULE:FREQ=WEEKLY;BYDAY=TU,TH;UNTIL=20110429T000000;WKST=SU', 'RRULE:FREQ=WEEKLY;BYDAY=TU;UNTIL=20110429T000000;WKST=SU', 'RRULE:FREQ=WEEKLY;BYDAY=TU,TH,FR;UNTIL=20110429T000000;WKST=SU', 'BYDAY=MO,WE;FOO;BYDAY=TU,TH;BAR', ); ;; for my $test (@tests) { my @bydays = $test =~ m{ (?: BYDAY= | \G ,) ([A-Z]+) }xmsg; print qq{@bydays}; } "` Output, one line per test: `TU TH`, `TU`, `TU TH FR`, `MO WE TU TH`.
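If only the first BYDAY field is wanted, one possible variation is to locate the field once and then walk its values with `\G` in scalar context. This is a sketch under that assumption, not the method from the reply; the name `first_byday` is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: return the values of the first BYDAY field only.
sub first_byday {
    my ($line) = @_;
    my @days;
    # Scalar-context /g leaves pos($line) just after the first "BYDAY=".
    if ($line =~ /BYDAY=/g) {
        # \G anchors each attempt at the previous match end, so the loop
        # stops at the first character that is not a comma or day code.
        while ($line =~ /\G,?([A-Z]+)/g) {
            push @days, $1;
        }
    }
    return @days;
}

print join(' ', first_byday('BYDAY=MO,WE;FOO;BYDAY=TU,TH;BAR')), "\n";
```

For the last test line above, this yields only `MO WE`, because the walk ends at the semicolon that closes the first field.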