split is pretty darn fast. But if in doubt, a benchmark confirms this:

#!/usr/bin/env perl
use warnings;
use strict;
use List::Util qw/max/;
use Benchmark qw/cmpthese/;
use constant WITHTEST => 0;

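# Test data: 100 identical lines of 32 tab-separated columns,
# each column 32 characters wide, read via an in-memory filehandle.
my $cols = 32;
my $row = join "\t", map { sprintf("%02d",$_) x 16 } 0..($cols-1);
my $data = ( $row . "\n" ) x 100;
open my $fh, '<', \$data or die $!;

# 0-based indices of the columns to extract
my @wanted = (2,3,12..18,25..28,31);
#my @wanted = (2,3,10..15);
my $wanted_max = max @wanted;
# Lookup table: $wanted2[$i] is true if column $i is wanted
my @wanted2 = (0) x $cols;
@wanted2[@wanted] = (1) x @wanted;
# One regex that matches a whole line and captures only the wanted columns
my ($wanted_re) = map { qr/\A$_\n?\z/ } join '\t',
    map { $_?'([^\t\n]++)':'[^\t\n]++' } @wanted2;
# Expected selection for every line (only checked if WITHTEST is true)
my $expect = join "\t", map { sprintf("%02d",$_) x 16 } @wanted;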

cmpthese(-2, {
    split => sub {
        seek $fh, 0, 0 or die;
        while (<$fh>) {
            chomp;
            my @sel = (split /\t/, $_, $cols)[@wanted];
            if (WITHTEST) { die "@sel\n$expect\n"
                unless join("\t",@sel) eq $expect }
        }
    },
    scan => sub {
        seek $fh, 0, 0 or die;
        while (<$fh>) {
            chomp;
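            # Caveat: because $pos starts at 0, column 0 would lose its
            # first character; harmless here since 0 is not in @wanted.
            my ($pos,$i,$prevpos,@sel)=(0,0);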
            while ( $pos>=0 && $i<=$wanted_max ) {
                $prevpos = $pos;
                $pos = index($_, "\t", $pos+1);
                push @sel, substr($_, $prevpos+1,
                    ($pos<0 ? length : $pos)-$prevpos-1 )
                        if $wanted2[$i++];
            }
            if (WITHTEST) { die "@sel\n$expect\n"
                unless join("\t",@sel) eq $expect }
        }
    },
    regex => sub {
        seek $fh, 0, 0 or die;
        while (<$fh>) {
            my @sel = /$wanted_re/ or die $_;
            if (WITHTEST) { die "@sel\n$expect\n"
                unless join("\t",@sel) eq $expect }
        }
    },
    fh => sub {
        seek $fh, 0, 0 or die;
        while ( my $line = <$fh> ) {
            chomp($line);
            open my $fh2, '<', \$line or die $!;
            local $/ = "\t";
            my @sel;
            for my $i (0..$wanted_max) {
                my $d = <$fh2>;
                next unless $wanted2[$i];
                chomp $d;
                push @sel, $d;
            }
            close $fh2;
            if (WITHTEST) { die "@sel\n$expect\n"
                unless join("\t",@sel) eq $expect }
        }
    },
});

__END__

        Rate regex    fh  scan split
regex 1456/s    --  -11%  -13%  -68%
fh    1643/s   13%    --   -1%  -64%
scan  1665/s   14%    1%    --  -64%
split 4586/s  215%  179%  175%    --

Remember that split is implemented internally in C, while the scan above is implemented in Perl. You could probably beat split if you implemented something like scan in C, but whether that's worth the effort depends on how much more speed you need.

Update: For the sake of completeness: I'm not surprised that the regex solution is slower. Regexes are very powerful, but working with fixed strings often outperforms them. I added the fh solution because it was the fastest in the thread I linked to above.
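
As a side note, split's LIMIT argument can borrow the "stop after the last wanted column" idea from scan while the work still happens in C. The following is only a minimal sketch: select_columns is a hypothetical helper that is not part of the benchmark above, and I have not timed it against the numbers shown.

```perl
#!/usr/bin/env perl
use warnings;
use strict;

# Hypothetical helper (not part of the benchmark above): return the
# fields of a tab-separated line at the given 0-based column indices.
sub select_columns {
    my ($line, @wanted) = @_;
    my $max = -1;
    for my $i (@wanted) { $max = $i if $i > $max }
    # LIMIT = $max + 2: fields 0..$max are split off individually and
    # everything after them is left unsplit in one final element.
    return ( split /\t/, $line, $max + 2 )[@wanted];
}

my $line = join "\t", 'a' .. 'j';    # ten columns: a b c ... j
my @sel  = select_columns( $line, 1, 3, 4 );
print "@sel\n";                      # prints "b d e"
```

Whether this helps depends on the data: if the last wanted column is also the last column of the line, as with the @wanted list in the benchmark, there is nothing left to skip.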

In reply to Re: What is the most efficient way to split a long string (see body for details/constraints)? by haukex
in thread What is the most efficient way to split a long string (see body for details/constraints)? by mikegold10
