awygle has asked for the wisdom of the Perl Monks concerning the following question:

Hello, thanks for taking the time to read my question... So I'm trying to parse some iCal, specifically getting the BYDAY section of a weekly repeating RRULE. I tried the pattern
@days = /BYDAY=([A-Z]+)(?:,([A-Z]+))*;/
on the line
RRULE:FREQ=WEEKLY;BYDAY=TU,TH;UNTIL=20110429T000000;WKST=SU
and it worked fine ($days[0] eq 'TU', $days[1] eq 'TH'). However, if I change TU,TH to MO,WE,FR, I only get MO and FR, and if I change it to just TU, then I get TU but also a warning about using the uninitialized value $days[1].
My guess is that the * on the nongrouping parentheses returns an undefined or null value if it does not match, and the last match if it matches more than once. Is there a way to do what I want besides knowing that only 5 days exist and cheating?

Replies are listed 'Best First'.
Re: Capturing unknown number of matches
by ikegami (Patriarch) on Jan 21, 2011 at 01:55 UTC
    You have two captures, so you'll get two results. You need two steps.

    my ($days) = /BYDAY=([A-Z]+(?:,[A-Z]+)*);/;
    my @days = split(/,/, $days);

    Same, as one expression:

    my @days = split(/,/, ( /BYDAY=([A-Z]+(?:,[A-Z]+)*);/ )[0]);
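To see why the original pattern behaves as described in the question, here is a minimal sketch. A list-context match returns one value per capture group, and a quantified group keeps only its last iteration. The helper name original_captures is made up for this illustration.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): apply the pattern from the
# question and return the raw list of captures.
sub original_captures {
    my ($line) = @_;
    return $line =~ /BYDAY=([A-Z]+)(?:,([A-Z]+))*;/;
}

my @three = original_captures('RRULE:FREQ=WEEKLY;BYDAY=MO,WE,FR;UNTIL=20110429T000000;WKST=SU');
my @one   = original_captures('RRULE:FREQ=WEEKLY;BYDAY=TU;UNTIL=20110429T000000;WKST=SU');

# Two groups always yield two values: $2 holds only the last repetition.
print scalar(@three), " captures: @three\n";    # 2 captures: MO FR

# With a single day the second group never matches, so its slot is undef.
print scalar(@one), " captures, second is ",
    (defined $one[1] ? $one[1] : 'undef'), "\n"; # 2 captures, second is undef
```

Capturing the whole comma-separated list once and splitting it afterwards avoids both effects.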

    Going the other way, a more generic parser:

    $_ = 'FREQ=WEEKLY;BYDAY=TU,TH;UNTIL=20110429T000000;WKST=SU';
    my %attrs;
    for (split /;/) {
        my ($k,$v) = split /=/;
        $attrs{$k} = $v;
    }
    my @days = split /,/, $attrs{BYDAY};
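The generic parser could be wrapped up roughly as follows. This is only a sketch: the name parse_rrule, the stripping of the leading "RRULE:", and the guard for a line without BYDAY are assumptions added for illustration, not part of the code above.

```perl
use strict;
use warnings;

# parse_rrule is a hypothetical name; it wraps the split-based parser.
sub parse_rrule {
    my ($line) = @_;
    (my $body = $line) =~ s/^RRULE://;      # drop the property name if present
    my %attrs;
    for my $pair (split /;/, $body) {
        next unless length $pair;           # skip empty fields such as ';;'
        my ($k, $v) = split /=/, $pair, 2;  # limit 2 keeps any '=' inside the value
        $attrs{$k} = $v;
    }
    return \%attrs;
}

my $attrs = parse_rrule('RRULE:FREQ=WEEKLY;BYDAY=MO,WE,FR;UNTIL=20110429T000000;WKST=SU');

# Guard against a rule with no BYDAY part to avoid an undef warning.
my @days = defined $attrs->{BYDAY} ? split(/,/, $attrs->{BYDAY}) : ();
print "@days\n";    # MO WE FR
```

The same hash then also gives FREQ, UNTIL and WKST without writing another regex.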
Re: Capturing unknown number of matches
by umasuresh (Hermit) on Jan 21, 2011 at 02:08 UTC
    Another solution:
    use strict;
    my $string ="RRULE:FREQ=WEEKLY;BYDAY=TU,TH;UNTIL=20110429T000000;WKST=SU";
    my($days) = $string =~ /BYDAY=([^;]+);/;
    my @days = split(/,/, $days);
    print join("\t",@days);
    I see Ikegami already provided neat solutions!
Re: Capturing unknown number of matches
by ahmad (Hermit) on Jan 21, 2011 at 03:09 UTC

    It's clear that you have fields separated by a semicolon ';' ... So I would recommend using the 'generic solution' provided by ikegami over a regex solution, as it's cleaner and makes much more sense IMHO.

Re: Capturing unknown number of matches
by AnomalousMonk (Archbishop) on Jan 21, 2011 at 22:59 UTC

    I agree that a generic parser as suggested by ikegami is probably the best approach.

    However, here's a regex solution to the specific problem. Note this assumes only one BYDAY field per line/record; if there are more, all field values are extracted indiscriminately.

    >perl -wMstrict -le
    "my @tests = (
       'RRULE:FREQ=WEEKLY;BYDAY=TU,TH;UNTIL=20110429T000000;WKST=SU',
       'RRULE:FREQ=WEEKLY;BYDAY=TU;UNTIL=20110429T000000;WKST=SU',
       'RRULE:FREQ=WEEKLY;BYDAY=TU,TH,FR;UNTIL=20110429T000000;WKST=SU',
       'BYDAY=MO,WE;FOO;BYDAY=TU,TH;BAR',
       );
     ;;
     for my $test (@tests) {
       my @bydays = $test =~ m{ (?: BYDAY= | \G ,) ([A-Z]+) }xmsg;
       print qq{@bydays};
       }
    "
    TU TH
    TU
    TU TH FR
    MO WE TU TH
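If only the first BYDAY field is wanted, one possible variation (a sketch, not part of the reply above) is to find "BYDAY=" once with a scalar-context /g match and then follow commas from that position with \G. The name first_byday is hypothetical.

```perl
use strict;
use warnings;

# first_byday is a hypothetical helper: return the days of the first
# BYDAY field only, ignoring any later BYDAY fields on the line.
sub first_byday {
    my ($line) = @_;
    my @bydays;
    # Scalar-context /g remembers where the match ended (pos).
    if ($line =~ m{ BYDAY= ([A-Z]+) }xmsgc) {
        push @bydays, $1;
        # \G anchors at that position, so only ',DAY' directly after
        # the previous day is accepted; the loop stops at ';'.
        push @bydays, $1 while $line =~ m{ \G , ([A-Z]+) }xmsgc;
    }
    return @bydays;
}

my @first = first_byday('BYDAY=MO,WE;FOO;BYDAY=TU,TH;BAR');
print "@first\n";    # MO WE
```

Because \G must match exactly where the previous match stopped, the second BYDAY field is never reached.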