Depending on the complexity of match_names (and the size of the arrays being tested) you could also use a regular expression to find the matches.

But for clarity you are probably best off with the nested loops and hash approach offered by rodion.

sub test_regex {
    my @array1 = ( "test", "test2", "test2", "test3", "test4", "test4" );
    my @array2 = ( "test", "test", "test2.1", "test4.1", "test4.2", "test4.3" );

    # Create a list of the items in array 1 separated by |
    # The reverse sort keeps longer names ahead of their own prefixes in
    # the list, in order to avoid matching 'test' when we could match 'test4'
    my $match_array1 = join('|',reverse sort @array1);

    # Using the 'x' option with regular expressions helps readability,
    # even in simple cases like this
    my $regex        = qr{
                         ^
                         ($match_array1)
                       }x;


    my %found_one = ();
    foreach my $item2 (@array2) {
        next unless defined $item2;
        if ($item2 =~ /$regex/) {
            $found_one{$1}++;
        }
    }
    return keys %found_one;
}
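
The alternation above interpolates the names straight into the pattern, which is fine for this test data. If the names could contain regex metacharacters such as '.', a more defensive variant might look like the sketch below. It escapes each name with quotemeta and sorts by length explicitly instead of relying on reverse lexical order. The helper name build_prefix_regex and the extra 'a.b' entry are made up for illustration and are not part of the benchmark.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): build an anchored
# alternation from a list of literal names.
sub build_prefix_regex {
    my @names = @_;
    my %seen;
    # Drop duplicates, then put the longest names first so that
    # 'test4' is tried before 'test'.
    my @unique = grep { !$seen{$_}++ } @names;
    my @sorted = sort { length($b) <=> length($a) or $a cmp $b } @unique;
    # quotemeta escapes characters such as '.' so they match literally.
    my $alternation = join '|', map { quotemeta } @sorted;
    return qr/^($alternation)/;
}

my $regex = build_prefix_regex(qw(test test2 test2 test3 test4 test4 a.b));
for my $item (qw(test4.1 a.b.c axb)) {
    if ($item =~ $regex) {
        print "$item matched $1\n";
    }
    else {
        print "$item did not match\n";
    }
}
# Prints:
#   test4.1 matched test4
#   a.b.c matched a.b
#   axb did not match
```

Without quotemeta, 'a.b' would also match 'axb', because the dot would be treated as "any character".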

use strict;
use warnings;
use Benchmark qw(cmpthese);
if (0) {
    print "regex: $_\n" for test_regex();
    print "loop:  $_\n" for test_loop();
    exit;
}
cmpthese(1000,{
    loop => \&test_loop,
    regex => \&test_regex,
});


sub test_regex {
    my @array1 = ( "test", "test2", "test2", "test3", "test4", "test4" );
    my @array2 = ( "test", "test", "test2.1", "test4.1", "test4.2", "test4.3" );

    # Create a list of the items in array 1 separated by |
    # The reverse sort keeps longer names ahead of their own prefixes in
    # the list, in order to avoid matching 'test' when we could match 'test4'
    my $match_array1 = join('|',reverse sort @array1);

    # Using the 'x' option with regular expressions helps readability,
    # even in simple cases like this
    my $regex        = qr{
                         ^
                         ($match_array1)
                       }x;


    my %found_one = ();
    foreach my $item2 (@array2) {
        next unless defined $item2;
        if ($item2 =~ /$regex/) {
            $found_one{$1}++;
        }   
    }   
    return keys %found_one;
}
sub test_loop {
    my @array1 = ( "test", "test2", "test2", "test3", "test4", "test4" );
    my @array2 = ( "test", "test", "test2.1", "test4.1", "test4.2", "test4.3" );
    my %found_one = ();

    foreach my $item1 (@array1) {
        foreach my $item2 (@array2) {
            next unless defined $item2;
            if (match_names($item1, $item2)) {
                $found_one{$item1}++;
                last;
            }
        }
    }
    return keys %found_one;

}

sub match_names {
  my ($x,$y) = @_;
  return 1 if ($y =~ /$x/);
  return;
}

The results on my machine are:

        Rate  loop regex
loop  1136/s    --  -60%
regex 2857/s  151%    --
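
One thing to keep in mind when reading these numbers: the two subs return the same keys for this test data, but they are not strictly equivalent. match_names does an unanchored match, while the regex is anchored with ^ and records only the first alternative that matches. A small sketch with different, made-up data (the @names and @items lists below are assumptions chosen to show the difference) illustrates where they can diverge:

```perl
use strict;
use warnings;

# Illustrative data, not the arrays used in the benchmark.
my @names = ("test", "test4");
my @items = ("test4.1", "mytest4");

# Loop style: unanchored match of every name against every item.
my %loop_found;
for my $name (@names) {
    for my $item (@items) {
        if ($item =~ /$name/) {
            $loop_found{$name}++;
            last;
        }
    }
}

# Regex style: one anchored alternation, only the captured name is counted.
my $alternation = join '|', reverse sort @names;
my $regex       = qr/^($alternation)/;
my %regex_found;
for my $item (@items) {
    $regex_found{$1}++ if $item =~ $regex;
}

print "loop:  ", join(", ", sort keys %loop_found),  "\n";
print "regex: ", join(", ", sort keys %regex_found), "\n";
# Prints:
#   loop:  test, test4
#   regex: test4
```

Here the loop reports 'test' because it occurs inside 'test4.1', whereas the regex credits that item to 'test4' only. Which behaviour is wanted depends on what match_names is really meant to do.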

In reply to Re: multiple matching in arrays by imp
in thread multiple matching in arrays by rsiedl
