Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Excuse a newbie question, but I'm baffled. Assuming I have a single line of XML, how do I pull out all values of the same tag name? For example:
my $s = '<blah>red</blah><blah>white</blah><blah>blue</blah>';
print $s, $/;
while (/<blah>(.*)<\/blah>/) { # this RE isn't right...
    print $1, $/;
}
Thanks for any help you can provide.

Replies are listed 'Best First'.
Re: non-greedy incremental regular expression?
by virtualsue (Vicar) on May 14, 2005 at 08:03 UTC
    If you intend to do much with XML, you will probably find the XML modules on the CPAN very useful. Here is a little sample of what XML::Simple can do:
    #!/usr/bin/perl
    use warnings;
    use strict;
    use XML::Simple;
    use Data::Dumper;

    my $s = '<foo><color>red</color><color>white</color><color>blue</color></foo>';
    print $s, "\n";
    my $parsed = XMLin($s);
    print Dumper($parsed);
    Voilà:
    <foo><color>red</color><color>white</color><color>blue</color></foo>
    $VAR1 = {
              'color' => [
                           'red',
                           'white',
                           'blue'
                         ]
            };
    In English, $parsed is a reference to a hash with one key, 'color', whose value is a reference to an array containing the values 'red', 'white', and 'blue'. For example, to print each value on its own line:
    print $_, "\n" for @{ $parsed->{color} };
Re: non-greedy incremental regular expression?
by gopalr (Priest) on May 14, 2005 at 04:11 UTC

    Use .*? for a non-greedy match, which stops at the first closing tag instead of the last.

    See the changes to your code below:

    my $s = '<blah>red</blah><blah>white</blah><blah>blue</blah>';
    while ($s =~ /<blah>(.*?)<\/blah>/g) {
        print "\n$1";
    }

    Update: the /g modifier has to stay. Without it, every iteration matches from the start of $s again, so the loop prints "red" forever.

Re: non-greedy incremental regular expression?
by eibwen (Friar) on May 14, 2005 at 04:02 UTC

    You are right to suspect that your regex is greedy: /<blah>(.*)<\/blah>/ matches the entire string, capturing:

    red</blah><blah>white</blah><blah>blue

    Read perlre for more about greedy regexes, but suffice it to say, I think you want /<blah>(.*?)<\/blah>/
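To make the difference concrete, here is a minimal sketch that runs both patterns against the string from the question. The variable names $greedy and $lazy are mine, chosen only for illustration.

```perl
use strict;
use warnings;

my $s = '<blah>red</blah><blah>white</blah><blah>blue</blah>';

# Greedy: .* runs to the end of the string, then backs off only
# as far as the last </blah>.
my ($greedy) = $s =~ /<blah>(.*)<\/blah>/;

# Non-greedy: .*? stops at the first </blah> it can reach.
my ($lazy) = $s =~ /<blah>(.*?)<\/blah>/;

print "greedy: $greedy\n";
print "lazy:   $lazy\n";
```

Note that a single match like this only ever yields the first element; collecting all three values still needs a loop or a list-context match with /g.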

Re: non-greedy incremental regular expression?
by prasadbabu (Prior) on May 14, 2005 at 04:17 UTC

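    TIMTOWTDI: instead of a non-greedy quantifier, a negative lookahead can refuse to step over the closing tag. Note the /g, which lets each iteration resume where the previous match ended.

    while ($s =~ /<blah>((?:(?!<\/blah>).)*)<\/blah>/g) {
        print $1, $/;
    }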

    Prasad
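The lookahead pattern above is dense, so here is a sketch of the same idea spelled out with the /x modifier, which allows whitespace and comments inside the regex. The names $re and @found are assumptions of mine, not part of the original reply.

```perl
use strict;
use warnings;

my $s = '<blah>red</blah><blah>white</blah><blah>blue</blah>';

my $re = qr{
    <blah>
    (                       # capture the element content
        (?:
            (?! </blah> )   # only proceed if no closing tag starts here
            .               # then consume one character
        )*
    )
    </blah>
}x;

my @found;
while ($s =~ /$re/g) {      # /g resumes after the previous match
    push @found, $1;
}
print "$_\n" for @found;
```

For a string this simple the result is the same as with .*?; the lookahead form mainly earns its keep when the pattern continues after the closing tag and backtracking could otherwise stretch a non-greedy match across several elements.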

Re: non-greedy incremental regular expression?
by TedPride (Priest) on May 14, 2005 at 15:41 UTC
    my $s = '<blah>red</blah><blah>white</blah><blah>blue</blah>';
    print $s, $/;
    print $1, $/ while $s =~ /<blah>(.*?)<\/blah>/g;
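If the values are wanted as a list rather than printed one at a time, the same pattern can be used in list context, where a /g match returns every capture at once. This is a small sketch assuming markup as regular as in the question; @colors is a name I made up for the example.

```perl
use strict;
use warnings;

my $s = '<blah>red</blah><blah>white</blah><blah>blue</blah>';

# In list context, m//g returns all captured substrings in order.
my @colors = $s =~ /<blah>(.*?)<\/blah>/g;

print "$_\n" for @colors;
print scalar(@colors), " values found\n";
```

For input with nesting, attributes, or line breaks inside elements, a real XML parser is likely the safer choice.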