split is pretty darn fast. But if in doubt, a benchmark confirms this:

#!/usr/bin/env perl
use warnings;
use strict;
use List::Util qw/max/;
use Benchmark qw/cmpthese/;
use constant WITHTEST => 0;

my $cols = 32;
my $row = join "\t", map { sprintf("%02d",$_) x 16 } 0..($cols-1);
my $data = ( $row . "\n" ) x 100;
open my $fh, '<', \$data or die $!;

my @wanted = (2,3,12..18,25..28,31);
#my @wanted = (2,3,10..15);
my $wanted_max = max @wanted;
my @wanted2 = (0) x $cols;
@wanted2[@wanted] = (1) x @wanted;
my ($wanted_re) = map { qr/\A$_\n?\z/ } join '\t',
    map { $_?'([^\t\n]++)':'[^\t\n]++' } @wanted2;
my $expect = join "\t", map { sprintf("%02d",$_) x 16 } @wanted;

cmpthese(-2, {
    split => sub {
        seek $fh, 0, 0 or die;
        while (<$fh>) {
            chomp;
            my @sel = (split /\t/, $_, $cols)[@wanted];
            if (WITHTEST) { die "@sel\n$expect\n" unless join("\t",@sel) eq $expect }
        }
    },
    scan => sub {
        seek $fh, 0, 0 or die;
        while (<$fh>) {
            chomp;
            my ($pos,$i,$prevpos,@sel)=(0,0);
            while ( $pos>=0 && $i<=$wanted_max ) {
                $prevpos = $pos;
                $pos = index($_, "\t", $pos+1);
                push @sel, substr($_, $prevpos+1, ($pos<0 ? length : $pos)-$prevpos-1 )
                    if $wanted2[$i++];
            }
            if (WITHTEST) { die "@sel\n$expect\n" unless join("\t",@sel) eq $expect }
        }
    },
    regex => sub {
        seek $fh, 0, 0 or die;
        while (<$fh>) {
            my @sel = /$wanted_re/ or die $_;
            if (WITHTEST) { die "@sel\n$expect\n" unless join("\t",@sel) eq $expect }
        }
    },
    fh => sub {
        seek $fh, 0, 0 or die;
        while ( my $line = <$fh> ) {
            chomp($line);
            open my $fh2, '<', \$line or die $!;
            local $/ = "\t";
            my @sel;
            for my $i (0..$wanted_max) {
                my $d = <$fh2>;
                next unless $wanted2[$i];
                chomp $d;
                push @sel, $d;
            }
            close $fh2;
            if (WITHTEST) { die "@sel\n$expect\n" unless join("\t",@sel) eq $expect }
        }
    },
});

__END__
        Rate regex    fh  scan split
regex 1456/s    --  -11%  -13%  -68%
fh    1643/s   13%    --   -1%  -64%
scan  1665/s   14%    1%    --  -64%
split 4586/s  215%  179%  175%    --
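For reference, here is the split-plus-list-slice idiom from the split entry, pulled out into a standalone sketch. The helper name select_fields is hypothetical. One difference from the benchmark, which passes $cols as the LIMIT: this passes the highest wanted index plus 2, so split may stop once it has produced the last wanted field (the final element then holds the unsplit remainder of the line). I haven't benchmarked that variant here, so treat any speedup as an assumption.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use List::Util qw/max/;

# Hypothetical helper: return the fields at the (0-based) indices in @$wanted
# from one tab-separated line, using split plus a list slice.
sub select_fields {
    my ($line, $wanted) = @_;
    my $max = max @$wanted;
    # A LIMIT of $max + 2 yields fields 0..$max as separate elements and
    # lumps everything after field $max into one final element.
    return (split /\t/, $line, $max + 2)[@$wanted];
}

my $line = join "\t", 'a' .. 'h';    # "a\tb\tc\td\te\tf\tg\th"
my @sel  = select_fields($line, [2, 3, 5]);
print "@sel\n";                      # prints "c d f"
```

Because it is a list slice, the fields come back in the order the indices are listed, and indices past the end of the line simply yield undef.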

Remember that split is implemented internally in C, while the above scan is implemented in Perl. You could probably beat split by implementing something like this in C, but whether that's worth the effort depends on how much more speed you need.

Update: For the sake of completeness: I'm not surprised that the regex solution is slower; regexes are very powerful, but plain fixed-string operations often outperform them. I added the fh solution because it was the fastest in the thread I linked to above.
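To make the "fixed strings instead of a regex" idea concrete, here is the scan approach as a standalone sketch using only index and substr. The name scan_fields is hypothetical. Unlike the benchmark's scan, it starts at position 0 rather than skipping the first character, so field 0 is extracted correctly too (the benchmark never selects field 0, so it isn't affected there). It returns fields in line order, so it assumes @$wanted is sorted ascending. It is still Perl-level code, so going by the numbers above I wouldn't expect it to beat split.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use List::Util qw/max/;

# Hypothetical helper: extract the wanted (0-based) tab-separated fields
# with index/substr only, no regex. Assumes @$wanted is sorted ascending.
sub scan_fields {
    my ($line, $wanted) = @_;
    my @want;
    $want[$_] = 1 for @$wanted;               # lookup table, like @wanted2
    my $max = max @$wanted;
    my ($start, @sel) = (0);
    for my $i (0 .. $max) {
        my $tab = index($line, "\t", $start);  # -1 if there are no more tabs
        my $end = $tab < 0 ? length($line) : $tab;
        push @sel, substr($line, $start, $end - $start) if $want[$i];
        last if $tab < 0;                      # this was the last field
        $start = $tab + 1;                     # next field starts after the tab
    }
    return @sel;
}

my $line   = join "\t", 'a' .. 'h';
my @wanted = (2, 3, 5);
my @scan   = scan_fields($line, \@wanted);
my @split  = (split /\t/, $line)[@wanted];
print "@scan\n";                               # prints "c d f"
print "same as split\n" if "@scan" eq "@split";
```

If you want to time it against split, dropping it into the cmpthese hash above as another entry should work.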


In reply to Re: What is the most efficient way to split a long string (see body for details/constraints)? by haukex
in thread What is the most efficient way to split a long string (see body for details/constraints)? by mikegold10
