Thanks for your comment.

I ran a quick benchmark, and the nested-loop solution turns out to be the fastest in most cases. In the remaining cases (I believe these are the ones where the regular expressions converge into a nice tree-like structure) named captures and unnamed captures perform about the same.

Here's my code sample:

#!/usr/bin/perl -w

use 5.010;
use strict;
use warnings;
use Test::More tests => 3;

use Benchmark qw(cmpthese);

my @reglist = ( qr/food?/, qr/b[a4]rd?/, qr/baz(?:o+ka)?/, 100..999);
my @lines = (qw(foobarbaz b4rd perl bazooooka football));

my @expect = ("r0", "r1", "", "r2", "r0");

is_deeply(which_reg_loop(\@reglist, \@lines), \@expect, "which_reg_loop")
    and
is_deeply(which_reg_capt(\@reglist, \@lines), \@expect, "which_reg_capt")
    and
is_deeply(which_reg_named(\@reglist, \@lines), \@expect, "which_reg_named")
    or die "Results differ, no bench";

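# Parentheses make x repeat the list; a bare @lines would be taken in
# scalar context and yield a single string instead of 5000 lines.
@lines = (@lines) x 1000;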

cmpthese ( -1, {
    loop => sub {
        which_reg_loop(\@reglist, \@lines);
    },
    capt => sub {
        which_reg_capt(\@reglist, \@lines);
    },
    named => sub {
        which_reg_named(\@reglist, \@lines);
    },
});

sub which_reg_loop {
    my ($reglist, $lines) = @_;

    my @ret;
    LINE: foreach my $str (@$lines) {
        for (my $i = 0; $i < @$reglist; $i++) {
            $str =~ $reglist->[$i] or next;
            push @ret, "r$i";
            next LINE;
        };
        push @ret, '';
    };
    return \@ret;
};

sub which_reg_capt {
    my ($reglist, $lines) = @_;

    my $giant = join "|", map { "($_)" } @$reglist;
    $giant = qr($giant);

    my @ret;
    LINE: foreach (@$lines) {
        my @hits = $_ =~ $giant;
        for (my $i = 0; $i < @hits; $i++) {
            # Test definedness so a capture of "0" or "" still counts as a hit.
            defined $hits[$i] or next;
            push @ret, "r$i";
            next LINE;
        };
        push @ret, '';
    };
    return \@ret;
};

sub which_reg_named {
    my ($reglist, $lines) = @_;
    my $giant = join "|", map { "(?<r$_>$reglist->[$_])" } 0..$#$reglist;
    $giant = qr($giant);

    my @ret = map { $_ =~ $giant ? (keys %+) : '' } @$lines;
    return \@ret;
};
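
For anyone skimming which_reg_named: it relies on %+ listing only the named groups that actually captured something, so a single key comes back per matching line. Here is a minimal standalone sketch of that idea. The shortened pattern list and the names @patterns, $alternation and @found are just for illustration and are not part of the benchmark.

```perl
use 5.010;
use strict;
use warnings;

# Hypothetical pattern list, shortened for illustration.
my @patterns = (qr/food?/, qr/b[a4]rd?/);

# Wrap each pattern in a named group r0, r1, ... and join them into one alternation.
my $alternation = join "|", map { "(?<r$_>$patterns[$_])" } 0 .. $#patterns;
my $giant = qr/$alternation/;

my @found;
for my $str (qw(b4rd football perl)) {
    if ($str =~ $giant) {
        # %+ holds only the named groups that captured in the last match.
        my ($name) = keys %+;
        push @found, "$str=$name";
    }
    else {
        # No alternative matched; record an empty name.
        push @found, "$str=";
    }
}
say join ", ", @found;
```

Since only one alternative can succeed per match, taking the first key is enough in this setup. It would not be enough if the individual patterns contained named groups of their own.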
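
A side note on enlarging the input for the benchmark: as far as I understand perlop, the x operator repeats a list only when its left operand is in parentheses (or is a qw// list). Otherwise the operand is evaluated in scalar context. A small sketch of the difference, with made-up data and variable names:

```perl
use strict;
use warnings;

my @lines = qw(foo bar baz);

# Bare array on the left: scalar context, so the element count (3)
# is repeated as a string and we get a single element.
my @scalar_rep = @lines x 2;

# Parenthesized list on the left: the list itself is repeated.
my @list_rep = (@lines) x 2;

printf "%d element(s): %s\n", scalar @scalar_rep, "@scalar_rep";
printf "%d element(s): %s\n", scalar @list_rep,   "@list_rep";
```

This matters for timing, because a one-element input makes every variant look much faster than it really is.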

In reply to Re^2: Find out which subpattern matched in regex by Dallaylaen
in thread Find out which subpattern matched in regex by Dallaylaen
