in reply to replace separator from array elements
RE-EDIT: grumble grumble markup grumble
Captures and the /g flag will help here. (Rewritten slightly to allow me to test the code easily.)

    #!/opt/local/bin/perl
    use strict;
    use warnings;

    open my $vlan_in, '<', shift @ARGV or die "Can't open input file: $!";
    my @vlans = <$vlan_in>;

    foreach my $line (@vlans) {
        chomp $line;
        my @items = ($line =~ /(    # capture:
                                \d  #   digit characters
                                +   #   one or more in a row
                              )/gx  # as many as you can find
                    );
        foreach my $item (@items) {
            next unless defined $item;
            $item =~ s/,//;
            print "vlan $item\n";
        }
    }

The regex is /x'ed for tutorial purposes; you'd almost certainly write it as /(\d+)/g in your program. Since we're using \d+, we'll match the longest possible string of digits in each case. Alternatively, we could use split():

    #!/opt/local/bin/perl
    use strict;
    use warnings;

    open my $vlan_in, '<', shift @ARGV or die "Can't open input file: $!";
    my @vlans = <$vlan_in>;

    foreach my $line (@vlans) {
        chomp $line;
        # Separate the 'vlan' from the list (and throw it away).
        my (undef, $vlans) = split /\s+/, $line;
        # Skip lines that have no list after 'vlan'.
        next unless defined $vlans;
        # Break up the list into items.
        my @items = split /,/, $vlans;
        # Print your new output.
        foreach my $item (@items) {
            next unless defined $item;
            $item =~ s/,//;
            print "vlan $item\n";
        }
    }

Timing this:

    use Benchmark qw(:all);

    my @lines = split /\n/, <<EOF;
    vlan 107
    vlan 121
    vlan 122,127,129,137
    vlan
    EOF
    pop @lines;

    cmpthese( 500_000, {
        'split-split' => sub {
            my @copy = @lines;
            foreach my $line (@copy) {
                my (undef, $vlans) = split /\s+/, $line;
                my @items = split /,/, $vlans;
            }
        },
        '/g' => sub {
            my @copy = @lines;
            foreach my $line (@copy) {
                my @items = ($line =~ /(\d+)/g);
            }
        },
        'sub-split' => sub {
            my @copy = @lines;
            foreach my $line (@copy) {
                s/vlan //;    # as benchmarked: a bare s/// binds to $_, not $line
                my @items = split /,/, $line;
            }
        },
    } );

shows that substitute then split is fastest. 100 thousand iterations:

                    Rate split-split          /g   sub-split
    split-split 104167/s          --         -8%        -22%
    /g          113636/s          9%          --        -15%
    sub-split   133333/s         28%         17%          --

500 thousand:

                    Rate split-split          /g   sub-split
    split-split 102459/s          --        -11%        -26%
    /g          115741/s         13%          --        -16%
    sub-split   138504/s         35%         20%          --

1 million:

                    Rate split-split          /g   sub-split
    split-split 100503/s          --        -12%        -28%
    /g          114286/s         14%          --        -18%
    sub-split   138889/s         38%         22%          --

5 million:

                    Rate split-split          /g   sub-split
    split-split 102480/s          --        -10%        -24%
    /g          114495/s         12%          --        -15%
    sub-split   134590/s         31%         18%          --

And that's all I feel like running. Obviously sub-split keeps getting better as the number of iterations increases; I think that's because the substitution allows the string to be "shrunk" in place without allocating any more memory. However, it starts falling off again at 5 million iterations; someone with more time than I have might want to investigate more iterations.
Basically, substitute out the stuff you don't need then split is fastest.
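For reference, a minimal sketch of that approach as a small function, assuming every input line looks like `vlan 122,127,129`. The name `vlan_ids` is made up for this illustration. The substitution is bound to `$line` explicitly with `=~`, since a bare `s/vlan //` operates on `$_`.

```perl
use strict;
use warnings;

# Hypothetical helper (name chosen for illustration only): strip the
# leading 'vlan' keyword, then split what remains on commas.
sub vlan_ids {
    my ($line) = @_;
    $line =~ s/^vlan\s*//;      # substitute out the part we don't need
    return split /,/, $line;    # what is left is just the ID list
}

# Sample lines in the format assumed above.
for my $line ('vlan 107', 'vlan 122,127,129,137') {
    print "vlan $_\n" for vlan_ids($line);
}
```

A line holding only `vlan` yields an empty list here, so nothing is printed for it and no undef check is needed.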
Re^2: replace separator from array elements
by nidhi (Acolyte) on Sep 12, 2007 at 23:50 UTC