in reply to What is the most efficient way to split a long string (see body for details/constraints)?
split is pretty darn fast. But if in doubt, a benchmark confirms this:
```perl
#!/usr/bin/env perl
use warnings;
use strict;
use List::Util qw/max/;
use Benchmark qw/cmpthese/;
use constant WITHTEST => 0;   # set to 1 to verify each variant's output

# Test data: 100 lines of 32 tab-separated columns
my $cols = 32;
my $row = join "\t", map { sprintf("%02d",$_) x 16 } 0..($cols-1);
my $data = ( $row . "\n" ) x 100;
open my $fh, '<', \$data or die $!;

# Columns to extract, plus a per-column lookup table of "wanted" flags
my @wanted = (2,3,12..18,25..28,31);
#my @wanted = (2,3,10..15);
my $wanted_max = max @wanted;
my @wanted2 = (0) x $cols;
@wanted2[@wanted] = (1) x @wanted;

# One regex with a capture group for each wanted column
my ($wanted_re) = map { qr/\A$_\n?\z/ } join '\t',
    map { $_?'([^\t\n]++)':'[^\t\n]++' } @wanted2;
my $expect = join "\t", map { sprintf("%02d",$_) x 16 } @wanted;

cmpthese(-2, {
    split => sub {
        seek $fh, 0, 0 or die;
        while (<$fh>) {
            chomp;
            my @sel = (split /\t/, $_, $cols)[@wanted];
            if (WITHTEST) { die "@sel\n$expect\n" unless join("\t",@sel) eq $expect }
        }
    },
    scan => sub {
        seek $fh, 0, 0 or die;
        while (<$fh>) {
            chomp;
            # Walk the tabs with index(); $pos starts at -1 so that
            # column 0 is extracted correctly as well
            my ($pos,$i,$prevpos,@sel)=(-1,0);
            while ( $i<=$wanted_max ) {
                $prevpos = $pos;
                $pos = index($_, "\t", $pos+1);
                push @sel, substr($_, $prevpos+1,
                    ($pos<0 ? length($_) : $pos)-$prevpos-1 ) if $wanted2[$i++];
                last if $pos<0;   # no more tabs: that was the last column
            }
            if (WITHTEST) { die "@sel\n$expect\n" unless join("\t",@sel) eq $expect }
        }
    },
    regex => sub {
        seek $fh, 0, 0 or die;
        while (<$fh>) {
            # no chomp needed: the regex allows an optional trailing "\n"
            my @sel = /$wanted_re/ or die $_;
            if (WITHTEST) { die "@sel\n$expect\n" unless join("\t",@sel) eq $expect }
        }
    },
    fh => sub {
        seek $fh, 0, 0 or die;
        while ( my $line = <$fh> ) {
            chomp($line);
            # Re-read the line through an in-memory filehandle,
            # treating each tab-terminated column as one "record"
            open my $fh2, '<', \$line or die $!;
            local $/ = "\t";
            my @sel;
            for my $i (0..$wanted_max) {
                my $d = <$fh2>;
                next unless $wanted2[$i];
                chomp $d;
                push @sel, $d;
            }
            close $fh2;
            if (WITHTEST) { die "@sel\n$expect\n" unless join("\t",@sel) eq $expect }
        }
    },
});
__END__
         Rate regex    fh  scan split
regex  1456/s    --  -11%  -13%  -68%
fh     1643/s   13%    --   -1%  -64%
scan   1665/s   14%    1%    --  -64%
split  4586/s  215%  179%  175%    --
```
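One thing the `split` variant leaves on the table: it passes `$cols` as the LIMIT, so every column gets split even when only the first few are needed. Below is a minimal sketch, assuming the same sample data and using the commented-out `@wanted` list from the benchmark, of passing `$wanted_max + 2` instead. With a positive LIMIT, `split` returns at most that many fields, so columns `0 .. $wanted_max` come out complete and the last element just holds the unsplit remainder. This variant was not part of the benchmark, so any speedup would need to be measured.

```perl
#!/usr/bin/env perl
use warnings;
use strict;
use List::Util qw/max/;

# Sample line shaped like the benchmark data: 32 tab-separated columns
my $cols = 32;
my $line = join "\t", map { sprintf("%02d", $_) x 16 } 0 .. ($cols - 1);

# The alternative column list from the benchmark (commented out there)
my @wanted     = (2, 3, 10 .. 15);
my $wanted_max = max @wanted;

# Split everything, as in the benchmark...
my @full = (split /\t/, $line, $cols)[@wanted];

# ...versus stopping after column $wanted_max: the extra (last) field
# receives the rest of the line, which the slice then ignores
my @limited = (split /\t/, $line, $wanted_max + 2)[@wanted];

print join("\t", @limited) eq join("\t", @full) ? "same\n" : "different\n";
```

With this `@wanted`, `$wanted_max` is 15, so `split` produces 17 fields instead of 32, and the script prints `same`. With the benchmark's main `@wanted` list (which includes column 31) the limit would be no smaller than `$cols`, so there would be nothing to gain.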
Remember that `split` is implemented internally in C, while the `scan` variant above is implemented in Perl. You could probably beat `split` by implementing something like `scan` in C, but whether that's worth the effort depends on how much more speed you need.

Update: For the sake of completeness: I'm not surprised the regex solution is slower. Regexes are very powerful, but code that works with fixed strings often outperforms them. I added the `fh` solution because it was the fastest in the thread I linked to above.
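To look at the fixed-string versus regex difference in isolation, here is a small sketch that times only the step of locating the tabs: `index` (a plain substring search) against a global `/\t/g` match that reads each match offset from `@-`. The sample line and the sub names `tabs_index` and `tabs_regex` are hypothetical, and the relative numbers will depend on your perl version and data, so no results are claimed here.

```perl
#!/usr/bin/env perl
use warnings;
use strict;
use Benchmark qw/cmpthese/;

# Hypothetical test line: 32 tab-separated columns
my $line = join "\t", map { sprintf("%02d", $_) x 16 } 0 .. 31;

# Collect tab offsets with index(), a fixed-string search
sub tabs_index {
    my ($s) = @_;
    my @pos;
    my $p = -1;
    push @pos, $p while ($p = index($s, "\t", $p + 1)) >= 0;
    return @pos;
}

# Collect tab offsets with a global regex match; $-[0] is the match start
sub tabs_regex {
    my ($s) = @_;
    my @pos;
    push @pos, $-[0] while $s =~ /\t/g;
    return @pos;
}

# Sanity check before timing: both approaches must agree
die "mismatch\n" unless "@{[tabs_index($line)]}" eq "@{[tabs_regex($line)]}";

cmpthese(-1, {
    index => sub { my @p = tabs_index($line) },
    regex => sub { my @p = tabs_regex($line) },
});
```

Both subs return the same 31 offsets for this line; only the search mechanism differs, so the comparison isolates that one step from the rest of the column extraction.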