
Great question Chris!

This sounds like a good candidate for profiling. I'm fairly new to profiling myself, but here is what I found with Devel::SmallProf. I populated @array with about 1400 elements; I don't know whether that's enough to get meaningful results, but I did see a difference (below).

If you're able to, run your real data through one of the profiling modules. Real data yields more meaningful results when you profile. Then you'll have a better handle on what is slowing things down, and what might be worth writing differently.

First, a quick hack of your code to get a working version...

use strict;
use warnings;

my @array   = ("x y.g z 123", "a b.f c 456", "a b.f c 456"); # added lots more on the runs through smallprof.
my @include = ("b","q");
my @keep;


my %testHash = map { $_,1} @include;

for my $tmp (@array) {
    my @tokens = split /\s+/, $tmp;
    my ($test, $rubbish) = split /\./, $tokens[1], 2;
    if (exists $testHash{$test}) {
        push(@keep, $tmp);
    }
}

print keys %testHash , "\n";
print "@keep";

Results

           ================ SmallProf version 2.02 ================
                              Profile of main.pl                       Page 1
       =================================================================
    count wall tm  cpu time line
        0   0.00000   0.00000     1:use strict;
        0   0.00000   0.00000     2:use warnings;
        0   0.00000   0.00000     3:
        1   0.00091   0.00000     4:my @array   = ("x y.g z 123", "a b.f c 456",
        1   0.00000   0.00000     5:my @include = ("b","q");
        1   0.00000   0.00000     6:my @keep;
        0   0.00000   0.00000     7:
        0   0.00000   0.00000     8:
        1   0.00001   0.00000     9:my %testHash = map { $_,1} @include;
        0   0.00000   0.00000    10:
        1   0.00000   0.00000    11:for my $tmp (@array){
     1404   0.00507   0.07800    12:        my @tokens = split /\s+/,$tmp;
     1404   0.00034   0.03200    13:      my ($test,$rubbish) = split /\./,
     1404   0.00282   0.01500    14:        if (exists $testHash{$test}) {
        0   0.00000   0.00000    15:            push (@keep, $tmp);
        0   0.00000   0.00000    16:        };
        0   0.00000   0.00000    17:}
        0   0.00000   0.00000    18:
        1   0.00013   0.00000    19:print keys %testHash , "\n";
        1   0.10416   0.00000    20:print "@keep";

From this it appears that the two split calls account for most of the time spent in this loop. I also found out that split takes an optional third argument, LIMIT: a positive integer giving the maximum number of fields to return. I suspect split was needlessly splitting the rest of the string and simply throwing the extra fields into the void. However, this is only really beneficial if you're dealing with millions of elements rather than the 1400 in this profile run. Otherwise it'll save barely a couple of milliseconds...
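Here is a small sketch of what LIMIT changes. The field value `b.f.extra.parts` is made up for illustration and is not from the original data:

```perl
use strict;
use warnings;

# Hypothetical field with several dots, to make the effect of LIMIT visible.
my $field = "b.f.extra.parts";

# No LIMIT: every dot-separated piece is returned.
my @all = split /\./, $field;       # ("b", "f", "extra", "parts")

# LIMIT of 2: split stops after the first dot and leaves the rest in one piece.
my @two = split /\./, $field, 2;    # ("b", "f.extra.parts")

# Only the first piece is needed for the hash lookup.
my ($test) = split /\./, $field, 2;

print scalar(@all), " fields without LIMIT\n";
print scalar(@two), " fields with LIMIT 2\n";
print "test = $test\n";
```

One caveat from `perldoc -f split`: when the result is assigned to a list of scalars and LIMIT is omitted, Perl already behaves as if LIMIT were one more than the number of variables in the list. So for `my ($test,$rubbish) = split ...` the saving from an explicit LIMIT may be smaller than the profile suggests.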

Modified version, with my ($test,$rubbish) = split /\./, $tokens[1],2; having its LIMIT set to 2.

           ================ SmallProf version 2.02 ================
                              Profile of main.pl                       Page 1
       =================================================================
    count wall tm  cpu time line
        0   0.00000   0.00000     1:use strict;
        0   0.00000   0.00000     2:use warnings;
        0   0.00000   0.00000     3:
        1   0.00092   0.00000     4:my @array   = ("x y.g z 123", "a b.f c 456",
        1   0.00000   0.00000     5:my @include = ("b","q");
        1   0.00000   0.00000     6:my @keep;
        0   0.00000   0.00000     7:
        0   0.00000   0.00000     8:
        1   0.00001   0.00000     9:my %testHash = map { $_,1} @include;
        0   0.00000   0.00000    10:
        1   0.00000   0.00000    11:for my $tmp (@array){
     1404   0.00347   0.06300    12:        my @tokens = split /\s+/,$tmp;
     1404   0.00007   0.03000    13:      my ($test,$rubbish) = split /\./,
     1404   0.00046   0.00000    14:        if (exists $testHash{$test}) {
        0   0.00000   0.00000    15:            push (@keep, $tmp);
        0   0.00000   0.00000    16:        };
        0   0.00000   0.00000    17:}
        0   0.00000   0.00000    18:
        1   0.00019   0.00000    19:print keys %testHash , "\n";
        1   0.11720   0.00000    20:print "@keep";
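With timings this small, a single profile run can easily be noise. As a cross-check, here is a rough sketch using the core Benchmark module instead of SmallProf. The input is synthetic (the three sample lines repeated to 1404 elements), the sub names are my own, and the `both_limits` variant, which also puts a LIMIT on the outer split, is my addition rather than something from the original code:

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Synthetic input (an assumption): three sample lines repeated 468 times = 1404 elements.
my @array    = ("x y.g z 123", "a b.f c 456", "a b.f c 456") x 468;
my @include  = ("b", "q");
my %testHash = map { $_ => 1 } @include;

# Original shape: no explicit LIMIT on either split.
sub no_limit {
    my @keep;
    for my $tmp (@array) {
        my @tokens = split /\s+/, $tmp;
        my ($test, $rubbish) = split /\./, $tokens[1];
        push @keep, $tmp if exists $testHash{$test};
    }
    return \@keep;
}

# The change from the post: LIMIT 2 on the inner split only.
sub inner_limit {
    my @keep;
    for my $tmp (@array) {
        my @tokens = split /\s+/, $tmp;
        my ($test, $rubbish) = split /\./, $tokens[1], 2;
        push @keep, $tmp if exists $testHash{$test};
    }
    return \@keep;
}

# Hypothetical extra variant: also stop the outer split once the second field is isolated.
sub both_limits {
    my @keep;
    for my $tmp (@array) {
        my @tokens = split /\s+/, $tmp, 3;
        my ($test) = split /\./, $tokens[1], 2;
        push @keep, $tmp if exists $testHash{$test};
    }
    return \@keep;
}

# A negative count tells Benchmark to run each variant for at least that many CPU seconds.
cmpthese(-1, {
    no_limit    => \&no_limit,
    inner_limit => \&inner_limit,
    both_limits => \&both_limits,
});
```

cmpthese prints a rate for each variant plus a table of relative differences. The numbers will depend on your machine and, more importantly, on your real data, so treat this only as a template for plugging in the actual input.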

In reply to Re: filter an array with consecutive splits by desemondo
in thread filter an array with consecutive splits by coldy
