
Getting a list of captures in Perl 5.10

by amir_e_a (Hermit)
on May 24, 2008 at 14:47 UTC (node #688303)

amir_e_a has asked for the wisdom of the Perl Monks concerning the following question:

I am trying to parse inter-language links in Wikipedia. I need to know which languages a given page links to. Let's say that $string is the page's text:
$string = "bla bla bla [[en:English]][[de:German]][[ga:Irish]] bla bla bla";
And I need the list:
qw(en de ga);
I can do this:
@matches = ( $string =~ m/\[\[(en|de|ga):.+?\]\]/g );
Then I get qw(en de ga) in @matches, but only because I have a single pair of capturing brackets, which is a limitation. If I add a second pair, for example:
@matches = ( $string =~ m/\[\[(en|de|ga):(.+?)\]\]/g );
Then I get qw(en English de German ga Irish). Is there a clever way to get a list of all the results from one pair of capturing brackets? I tried using Perl 5.10's named captures and %-, but either it can't be done this way or I am doing it incorrectly. I tried this:
use 5.010;          # enables say
use Data::Dumper;

my $string = 'bla bla [[en:English]][[de:German]][[ga:Irish]] bla bla';
if (my @matches = ($string =~ m/\[\[(?<lang>en|de|ga):(.+?)\]\]/g)) {
    say 'matches';
    say 'matches: ', Dumper(\@matches);
    say 'minus : ', Dumper(\%-);
    say 'plus : ', Dumper(\%+);
}
I get this output:
matches
matches: $VAR1 = [ 'en', 'English', 'de', 'German', 'ga', 'Irish' ];
minus : $VAR1 = { 'lang' => [ 'ga' ] };
plus : $VAR1 = { 'lang' => 'ga' };
As you can see, %- contains only ('ga'). Is there some way to get:
$VAR1 = { 'lang' => [ 'en', 'de', 'ga' ] };
Thanks in advance for any help.
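The reason %- and %+ show only 'ga' is that they describe the most recent successful match, and a list-context m//g leaves the last match in place. A minimal sketch of one way around that, assuming Perl 5.10 or later: match in scalar context inside a while loop and copy %+ after every individual match. The hash name %captures is only illustrative.

```perl
use strict;
use warnings;
use Data::Dumper;

my $string = 'bla bla [[en:English]][[de:German]][[ga:Irish]] bla bla';

# %+ only reflects the latest successful match, so copy it after each one.
my %captures;
while ($string =~ m/\[\[(?<lang>en|de|ga):(?<name>.+?)\]\]/g) {
    # Works for any number of named groups; %+ only has keys for groups
    # that actually captured something in this match.
    push @{ $captures{$_} }, $+{$_} for keys %+;
}

$Data::Dumper::Sortkeys = 1;    # stable key order in the dump
print Dumper(\%captures);
```

For the sample string, $captures{lang} should end up as ['en', 'de', 'ga'] and $captures{name} as ['English', 'German', 'Irish'].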

Re: Getting a list of captures in Perl 5.10
by BrowserUk (Patriarch) on May 24, 2008 at 15:52 UTC

    Without using any of the 5.10 features you can do:

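        use Data::Dump qw(pp);   # pp pretty-prints a data structure

        $string = 'bla bla [[en:English]][[de:German]][[ga:Irish]] bla bla';
        %matches = $string =~ m/\[ \[ ( en|de|ga ) : (.+?) \] \]/gx;
        $var = { lang => [ keys %matches ] };
        pp $var;

    Output (hash key order is not guaranteed, so the languages may come out in a different order):

        { lang => ["en", "ga", "de"] }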
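    The hash version returns the language codes in hash order rather than match order, and a page that repeats a code keeps only one copy. A sketch that keeps both, assuming List::Util 1.29 or later (newer than the copy bundled with Perl 5.10), runs pairkeys over the flat list instead; like the hash trick, it still depends on there being exactly two capture groups.

```perl
use strict;
use warnings;
use List::Util qw(pairkeys);   # pairkeys needs List::Util 1.29 or later

my $string = 'bla bla [[en:English]][[de:German]][[ga:Irish]] bla bla';

# In list context m//g returns (lang, name, lang, name, ...).
# pairkeys keeps the first element of each pair, in match order.
my @langs = pairkeys($string =~ m/\[\[(en|de|ga):(.+?)\]\]/g);

print "@langs\n";   # en de ga
```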

    But that makes me wonder why you are capturing the long names just to throw them away.

    If you remove the capture for those you can do:

        my $string = 'bla bla [[en:English]][[de:German]][[ga:Irish]] bla bla';
        my $var = { lang => [ $string =~ m/\[ \[ ( en|de|ga ) : .+? \] \]/gx ] };
        pp $var;

    Output:

        { lang => ["en", "de", "ga"] }

      pp $var;
      Somewhat off-topic, but what is this pp?
      Thanks for the replies. The example with the hash only works with two sets of capturing parens. It's a nice hack, but I was hoping to find something generic that works with any number of captures.
Re: Getting a list of captures in Perl 5.10
by moritz (Cardinal) on May 24, 2008 at 15:17 UTC
    I don't think you can get exactly what you want out of 5.10's regex engine. (In Perl 6 you can get it, and PGE already implements that btw).

    Apart from the things you tried, there is another hack that uses the experimental (?{...}) code assertions:

    our @matches; m/(?>(something)(?{ push @matches, $^N }))/;
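    Applied to the string from the question, the code-assertion hack might look like the sketch below. It relies on $^N, the text of the most recently closed group, so the assertion sits right after (en|de|ga), and it keeps the package variable declared with our. One caveat: a push inside (?{ }) is not undone if the engine later backtracks out of that path, so a pattern that can fail after the assertion may record extra entries; perlre describes using local inside the block for that situation.

```perl
use strict;
use warnings;

my $string = 'bla bla [[en:English]][[de:German]][[ga:Irish]] bla bla';

# Package variable, as in the original one-liner.
our @langs;

# (?{ ... }) runs while the engine matches; $^N is the text of the
# capture group that closed most recently, i.e. the language code.
# Assigning to () forces list context so m//g walks over every match.
() = $string =~ m/\[\[(en|de|ga)(?{ push @langs, $^N }):.+?\]\]/g;

print "@langs\n";   # en de ga
```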
Re: Getting a list of captures in Perl 5.10
by duelafn (Parson) on May 25, 2008 at 12:49 UTC


        use Data::Dumper;

        my $string = 'bla bla [[en:English]][[de:German]][[ga:Irish]] bla bla';

        # Two capture groups: cut the flat match list into [lang, name] pairs.
        my $groups = 2;
        my @tmp = ($string =~ m/\[\[(en|de|ga):(.+?)\]\]/g);
        my @matches;
        push @matches, [splice @tmp, 0, $groups] while @tmp;
        print Dumper \@matches;

        # Three capture groups: same technique, giving [lang, first letter, rest] triples.
        $groups = 3;
        @tmp = ($string =~ m/\[\[(en|de|ga):(.)(.+?)\]\]/g);
        @matches = ();
        push @matches, [splice @tmp, 0, $groups] while @tmp;
        print Dumper \@matches;
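    This version needs $groups to be kept in step with the pattern by hand. A sketch that avoids that, assuming a column-per-group layout is what is wanted: read the group count from $#+ after each match and pull every group out of @- and @+. The array name @columns is illustrative, and the pattern is the three-group one from above.

```perl
use strict;
use warnings;
use Data::Dumper;

my $string = 'bla bla [[en:English]][[de:German]][[ga:Irish]] bla bla';

# $columns[0] collects every value of group 1, $columns[1] of group 2, etc.
my @columns;
while ($string =~ m/\[\[(en|de|ga):(.)(.+?)\]\]/g) {
    # After a successful match, $#+ is the number of capture groups in the
    # pattern, so the group count is never hard-coded.
    for my $i (1 .. $#+) {
        # @- and @+ hold each group's start and end offsets; a group that
        # did not take part in this match has undefined offsets.
        my $value = defined $-[$i]
            ? substr($string, $-[$i], $+[$i] - $-[$i])
            : undef;
        push @{ $columns[$i - 1] }, $value;
    }
}

print Dumper(\@columns);
```

    On the sample string, $columns[0] should hold en, de, ga; $columns[1] the first letters E, G, I; and $columns[2] the rest of each name.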

    Good Day,

Node Type: perlquestion [id://688303]
Approved by Corion
Front-paged by Corion