EDIT: added the third possibility to the benchmark (null out the 'vlan ' and then split)
Captures and the /g flag will help here. (Rewritten slightly to allow me to test the code easily.)
#!/opt/local/bin/perl
use strict;
use warnings;

# Read the file named on the command line.
@ARGV or die "Usage: $0 FILE\n";
my $file = shift @ARGV;
open my $vlan_in, '<', $file or die "Can't open $file: $!";
my @vlans = <$vlan_in>;
close $vlan_in;

foreach my $line (@vlans) {
    chomp $line;
    my @items = ($line =~ /(    # capture:
                            \d  # digit characters
                            +   # one or more in a row
                           )/gx );  # as many as you can find
    # Each capture is a defined run of digits, so no further cleanup is needed.
    foreach my $item (@items) {
        print "vlan $item\n";
    }
}
The regex is /x'ed for tutorial purposes; you'd almost certainly write it as /(\d+)/g in your program. Since we're using \d+, we'll match the longest possible string of digits in each case.
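As a rough sketch of that compact form: in list context a /g match returns every capture, so the whole extraction fits in one expression. The helper name vlan_ids is mine, purely for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the original script): return every run of
# digits in a line, using a global match in list context.
sub vlan_ids {
    my ($line) = @_;
    return $line =~ /(\d+)/g;
}

# Prints one "vlan N" line per number found.
print "vlan $_\n" for vlan_ids('vlan 122,127,129,137');
```

Call it in list context; in scalar context the match would only report success or failure.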
Alternatively, we could use split():
#!/opt/local/bin/perl
use strict;
use warnings;

# Read the file named on the command line.
@ARGV or die "Usage: $0 FILE\n";
my $file = shift @ARGV;
open my $vlan_in, '<', $file or die "Can't open $file: $!";
my @vlans = <$vlan_in>;
close $vlan_in;

foreach my $line (@vlans) {
    chomp $line;
    # Separate the 'vlan' from the list (and throw it away).
    my (undef, $vlans) = split /\s+/, $line;
    # Skip lines that have nothing after 'vlan'.
    next unless defined $vlans;
    # Break up the list into items.
    my @items = split /,/, $vlans;
    # Print your new output.
    foreach my $item (@items) {
        print "vlan $item\n";
    }
}
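One caveat with the split() version, sketched here under the assumption that some input lines might be indented: split /\s+/ returns an empty leading field when the line starts with whitespace, whereas the special pattern ' ' skips leading whitespace. The helper name vlan_list_items is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: split on the special ' ' pattern so that indented
# lines still yield 'vlan' as the first field, then split the list on commas.
sub vlan_list_items {
    my ($line) = @_;
    my (undef, $list) = split ' ', $line;
    return defined $list ? split(/,/, $list) : ();
}

print "vlan $_\n" for vlan_list_items('   vlan 122,127');
```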
Timing this:
use Benchmark qw(:all);

my @lines = split /\n/, <<EOF;
vlan 107
vlan 121
vlan 122,127,129,137
vlan
EOF
pop @lines;    # drop the bare 'vlan' line, which has no list to parse

cmpthese(
    500_000,
    {   'split-split' => sub {
            my @copy = @lines;
            foreach my $line (@copy) {
                my (undef, $vlans) = split /\s+/, $line;
                my @items = split /,/, $vlans;
            }
        },
        '/g' => sub {
            my @copy = @lines;
            foreach my $line (@copy) {
                my @items = ($line =~ /(\d+)/g);
            }
        },
        'sub-split' => sub {
            my @copy = @lines;
            foreach my $line (@copy) {
                # Bind to $line; a bare s/vlan // would operate on $_ instead.
                $line =~ s/vlan //;
                my @items = split /,/, $line;
            }
        },
    }
);
shows that substitute then split is fastest.
100 thousand iterations:
                Rate split-split          /g   sub-split
split-split 104167/s          --         -8%        -22%
/g          113636/s          9%          --        -15%
sub-split   133333/s         28%         17%          --
500 thousand:
                Rate split-split          /g   sub-split
split-split 102459/s          --        -11%        -26%
/g          115741/s         13%          --        -16%
sub-split   138504/s         35%         20%          --
1 million:
                Rate split-split          /g   sub-split
split-split 100503/s          --        -12%        -28%
/g          114286/s         14%          --        -18%
sub-split   138889/s         38%         22%          --
5 million:
                Rate split-split          /g   sub-split
split-split 102480/s          --        -10%        -24%
/g          114495/s         12%          --        -15%
sub-split   134590/s         31%         18%          --
And that's all I feel like running. sub-split's lead over split-split grows with the iteration count, from 28% at 100 thousand to 38% at 1 million; I think that's because the substitution allows the string to be "shrunk" in place without allocating any more memory. The lead falls back to 31% at 5 million iterations, though, so someone with more time than I have might want to investigate larger counts.
Basically, substituting out the stuff you don't need and then splitting is fastest.
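A benchmark only measures speed, so it may be worth confirming that the three approaches return the same items before comparing their rates. The following is a minimal sketch; the %approach hash and approaches_agree are names I made up for illustration, and each wrapper takes a single line rather than looping as the benchmark does.

```perl
use strict;
use warnings;

# Hypothetical wrappers around the three approaches; each takes one line and
# returns the list of vlan numbers found in it.
my %approach = (
    'split-split' => sub {
        my ($line) = @_;
        my (undef, $vlans) = split /\s+/, $line;
        return split /,/, $vlans;
    },
    '/g' => sub {
        my ($line) = @_;
        return $line =~ /(\d+)/g;
    },
    'sub-split' => sub {
        my ($line) = @_;
        $line =~ s/vlan //;    # $line is a copy, so the caller's string is untouched
        return split /,/, $line;
    },
);

# Return 1 if every approach yields the same items for every line, else 0.
sub approaches_agree {
    my @lines = @_;
    for my $line (@lines) {
        my %seen;
        for my $name (keys %approach) {
            my $key = join ',', $approach{$name}->($line);
            $seen{$key}++;
        }
        return 0 if scalar(keys %seen) != 1;
    }
    return 1;
}

print approaches_agree('vlan 107', 'vlan 121', 'vlan 122,127,129,137')
    ? "all three agree\n"
    : "results differ\n";
```

The sample lines assume the same 'vlan N,N,...' shape used in the benchmark; a line with no list after 'vlan' would need the extra guard discussed for the split() version.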