I am trying to parse inter-language links in Wikipedia. I need to know to what languages a given page has links. Let's say that $string is the page's text:
$string = "bla bla bla [[en:English]][[de:German]][[ga:Irish]] bla bla bla";
And I need the list:
qw(en de ga);
I can do this:
@matches = ( $string =~ m/\[\[(en|de|ga):.+?\]\]/g );
And then I get qw(en de ga) in @matches, but that's only because I have a single pair of capturing brackets, which is a limitation. If I do, for example:
@matches = ( $string =~ m/\[\[(en|de|ga):(.+?)\]\]/g );
Then I'll get qw(en English de German ga Irish), because in list context m//g returns every capture from every match. Is there a clever way to get a list of all the results from just one pair of capturing brackets? I tried using Perl 5.10's named captures and %-. Either it can't be done this way or I am doing it incorrectly. I tried this:
    use feature 'say';
    use Data::Dumper;

    my $string = 'bla bla [[en:English]][[de:German]][[ga:Irish]] bla bla';
    if (my @matches = ($string =~ m/\[\[(?<lang>en|de|ga):(.+?)\]\]/g)) {
        say 'matches';
        say 'matches: ', Dumper(\@matches);
        say 'minus : ', Dumper(\%-);
        say 'plus : ', Dumper(\%+);
    }
I get this output:
    matches
    matches: $VAR1 = [
      'en',
      'English',
      'de',
      'German',
      'ga',
      'Irish'
    ];
    minus : $VAR1 = {
      'lang' => [
        'ga'
      ]
    };
    plus : $VAR1 = {
      'lang' => 'ga'
    };
As you can see, %- holds only ('ga'), the value from the last match. Is there some way to get:
$VAR1 = { 'lang' => [ 'en', 'de', 'ga' ] };
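For reference, here is a minimal sketch of one way that structure might be built. It assumes the behaviour seen above: %+ and %- describe only the most recent successful match, so the values have to be collected one match at a time. The %captures hash and the second group name "title" are names made up for this illustration.

```perl
use strict;
use warnings;
use feature 'say';

my $string = 'bla bla [[en:English]][[de:German]][[ga:Irish]] bla bla';

# In scalar context, m//g finds one match per loop iteration, so %+
# is refreshed each time and the named capture can be saved as we go.
my %captures;    # hypothetical container, not a Perl built-in
while ($string =~ m/\[\[(?<lang>en|de|ga):(?<title>.+?)\]\]/g) {
    push @{ $captures{lang} }, $+{lang};
}

say join ' ', @{ $captures{lang} };    # prints "en de ga"
```

The same loop could also push $+{title} into a second list if the link targets are wanted too.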
Thanks in advance for any help.

In reply to Getting a list of captures in Perl 5.10 by amir_e_a
