
I am trying to parse inter-language links in Wikipedia. I need to know which languages a given page links to. Let's say that $string is the page's text:

$string = "bla bla bla [[en:English]][[de:German]][[ga:Irish]] bla bla bla";

And I need the list:

qw(en de ga);

I can do this:

@matches = ( $string =~ m/\[\[(en|de|ga):.+?\]\]/g );

And then I get qw(en de ga) in @matches, but that's only because I have a single pair of capturing brackets, which is a limitation. If I do, for example:

@matches = ( $string =~ m/\[\[(en|de|ga):(.+?)\]\]/g );

Then I'll get qw(en English de German ga Irish). Is there a clever way to get a list of all the results from just one pair of capturing brackets? I tried using Perl 5.10's named captures and %-. Either it can't be done this way or I am doing it incorrectly. I tried this:

use 5.010;          # enables say and named captures
use Data::Dumper;   # provides Dumper

my $string = 'bla bla [[en:English]][[de:German]][[ga:Irish]] bla bla';

if (my @matches = ($string =~ m/\[\[(?<lang>en|de|ga):(.+?)\]\]/g)) {
    say 'matches';
    say 'matches: ', Dumper(\@matches);
    say 'minus  : ', Dumper(\%-);
    say 'plus   : ', Dumper(\%+);
}

I get this output:

matches
matches: $VAR1 = [
          'en',
          'English',
          'de',
          'German',
          'ga',
          'Irish'
        ];

minus  : $VAR1 = {
          'lang' => [
                      'ga'
                    ]
        };

plus   : $VAR1 = {
          'lang' => 'ga'
        };

As you can see, both %- and %+ contain only 'ga', the last match. Is there some way to get this instead:

$VAR1 = {
          'lang' => [
                      'en',
                      'de',
                      'ga'
                    ]
        };
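
One approach that may produce this structure, offered as a sketch rather than a definitive answer: run the match in scalar context inside a while loop, so that %+ is refreshed on every iteration, and copy each named capture into a hash of arrays before the next match overwrites it. The names %captures and the second group name title are assumptions made for illustration; neither appears in the code above.

```perl
use strict;
use warnings;
use 5.010;

my $string = 'bla bla [[en:English]][[de:German]][[ga:Irish]] bla bla';

# %captures is a hypothetical accumulator, not a built-in like %+ or %-.
my %captures;

# In scalar context, m//g finds one match per iteration, so %+ holds the
# captures of the current match only and can be copied out each time.
while ($string =~ m/\[\[(?<lang>en|de|ga):(?<title>.+?)\]\]/g) {
    push @{ $captures{$_} }, $+{$_} for keys %+;
}

say join ' ', @{ $captures{lang} };    # en de ga
```

With this sketch, $captures{lang} ends up holding the list for a single named group regardless of how many other groups the pattern contains.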

Thanks in advance for any help.

In reply to Getting a list of captures in Perl 5.10 by amir_e_a
