EDIT: added a third possibility to the benchmark (null out the 'vlan ' and then split)


Captures and the /g flag will help here. (Rewritten slightly to allow me to test the code easily.)

```perl
#!/opt/local/bin/perl
use strict;
use warnings;

open my $vlan_in, '<', shift @ARGV
    or die "Can't open input file: $!";
my @vlans = <$vlan_in>;

foreach my $line (@vlans) {
    chomp $line;
    my @items = ($line =~ /(      # capture:
                            \d    #   digit characters
                            +     #   one or more in a row
                          )/gx    # as many as you can find
                );
    # \d+ never captures a comma, so each item is ready to print as-is.
    foreach my $item (@items) {
        print "vlan $item\n";
    }
}
```
The regex is /x'ed for tutorial purposes; you'd almost certainly write it as /(\d+)/g in your program. Since we're using \d+, we'll match the longest possible string of digits in each case. Alternatively, we could use split():
```perl
#!/opt/local/bin/perl
use strict;
use warnings;

open my $vlan_in, '<', shift @ARGV
    or die "Can't open input file: $!";
my @vlans = <$vlan_in>;

foreach my $line (@vlans) {
    chomp $line;

    # Separate the 'vlan' from the list (and throw it away).
    my (undef, $vlans) = split /\s+/, $line;
    next unless defined $vlans;    # a bare 'vlan' line has no list

    # Break up the list into items (split already removes the commas).
    my @items = split /,/, $vlans;

    # Print your new output.
    foreach my $item (@items) {
        print "vlan $item\n";
    }
}
```
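As a quick illustration that the two approaches agree on well-formed input, here is a minimal sketch that runs both on one line from the question's data (the variable names are just for illustration). Both print the same four IDs:

```perl
use strict;
use warnings;

my $line = "vlan 122,127,129,137";

# Capture approach: in list context, m//g returns every run of digits.
my @by_match = $line =~ /(\d+)/g;

# Split approach: drop the leading word, then break the rest on commas.
my (undef, $list) = split /\s+/, $line;
my @by_split = split /,/, $list;

print "match: @by_match\n";    # match: 122 127 129 137
print "split: @by_split\n";    # split: 122 127 129 137
```

They part ways on malformed input: for a line like 'vlan 12x', the match returns '12' while the split returns '12x', so the choice also decides how bad data shows up.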
Timing this:
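```perl
use strict;
use warnings;
use Benchmark qw(:all);

my @lines = split /\n/, <<EOF;
vlan 107
vlan 121
vlan 122,127,129,137
vlan
EOF
pop @lines;    # drop the trailing bare "vlan" line

cmpthese( 500_000, {
    'split-split' => sub {
        my @copy = @lines;
        foreach my $line (@copy) {
            my (undef, $vlans) = split /\s+/, $line;
            my @items = split /,/, $vlans;
        }
    },
    '/g' => sub {
        my @copy = @lines;
        foreach my $line (@copy) {
            my @items = ($line =~ /(\d+)/g);
        }
    },
    'sub-split' => sub {
        my @copy = @lines;    # copy, because s/// below edits in place
        foreach my $line (@copy) {
            # Bind the substitution to $line; a bare s/vlan //; edits $_ instead.
            # (The timings below were taken with the bare form.)
            $line =~ s/vlan //;
            my @items = split /,/, $line;
        }
    },
} );
```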
shows that substitute-then-split is fastest. With 100 thousand iterations:
```
                Rate split-split          /g   sub-split
split-split 104167/s          --         -8%        -22%
/g          113636/s          9%          --        -15%
sub-split   133333/s         28%         17%          --
```
500 thousand:
```
                Rate split-split          /g   sub-split
split-split 102459/s          --        -11%        -26%
/g          115741/s         13%          --        -16%
sub-split   138504/s         35%         20%          --
```
1 million:
```
                Rate split-split          /g   sub-split
split-split 100503/s          --        -12%        -28%
/g          114286/s         14%          --        -18%
sub-split   138889/s         38%         22%          --
```
5 million:
```
                Rate split-split          /g   sub-split
split-split 102480/s          --        -10%        -24%
/g          114495/s         12%          --        -15%
sub-split   134590/s         31%         18%          --
```
And that's all I feel like running. sub-split's lead over split-split grows from 28% at 100 thousand iterations to 38% at 1 million; I think that's because the substitution lets the string be "shrunk" in place without allocating any more memory. However, the lead falls back to 31% at 5 million iterations, and sub-split's own rate only moves between roughly 133,000/s and 139,000/s across all four runs, so some of this may be noise. Someone with more time than me might want to investigate more iterations.
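Before reading too much into differences like these, it may be worth confirming that the benchmarked bodies really do the same work. A minimal sketch, assuming each strategy is wrapped as a sub that returns the items for one line (the %strategy hash is just for illustration), runs them once per sample line and reports whether they agree:

```perl
use strict;
use warnings;

# Each strategy takes one line and returns its list of items.
my %strategy = (
    'split-split' => sub {
        my ($line) = @_;
        my (undef, $vlans) = split /\s+/, $line;
        return split /,/, $vlans;
    },
    '/g' => sub {
        my ($line) = @_;
        return $line =~ /(\d+)/g;
    },
    'sub-split' => sub {
        my ($line) = @_;       # a copy, so the caller's string is untouched
        $line =~ s/vlan //;    # bound to $line explicitly, not $_
        return split /,/, $line;
    },
);

my @lines = ('vlan 107', 'vlan 121', 'vlan 122,127,129,137');

for my $line (@lines) {
    my %distinct;
    for my $name (keys %strategy) {
        # Join each strategy's items into one string so they can be compared.
        my $items = join ',', $strategy{$name}->($line);
        $distinct{$items} = 1;
    }
    my $status = keys(%distinct) == 1 ? 'ok' : 'DIFF';
    printf "%-4s %s\n", $status, $line;
}
```

With the three strategies as written here, every line should report ok; a DIFF line means one body is doing different (often less) work, and its rate isn't comparable to the others.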

Basically, substituting out the stuff you don't need and then splitting was the fastest approach in these runs.
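For completeness, here is a sketch of the substitute-then-split approach as a full filter, in the same spirit as the two programs above. It rests on a few assumptions: expand_vlans is a hypothetical helper name, the prefix is stripped with an anchored s/^vlan\s*// rather than the benchmark's s/vlan // so that a bare 'vlan' line yields no items, and it falls back to the sample lines when no file is named on the command line.

```perl
use strict;
use warnings;

# Hypothetical helper: turn one "vlan 122,127,..." line into its list of IDs
# by deleting the leading "vlan" (plus any whitespace) and splitting on commas.
sub expand_vlans {
    my ($line) = @_;
    chomp $line;
    $line =~ s/^vlan\s*//;      # bare "vlan" becomes "", which splits to ()
    return split /,/, $line;
}

# Sample lines from the benchmark's here-doc; used when no file is given.
my @sample = ("vlan 107\n", "vlan 121\n", "vlan 122,127,129,137\n", "vlan\n");

# Read every line from the files named in @ARGV, or fall back to @sample.
my @input = @ARGV ? <> : @sample;

for my $line (@input) {
    print "vlan $_\n" for expand_vlans($line);
}
```

Run without arguments, it prints one "vlan N" line for each of 107, 121, 122, 127, 129 and 137, and nothing for the bare "vlan" line.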


In reply to Re: replace separator from array elements by pemungkah
in thread replace separator from array elements by nidhi
