Based on the variable name, I thought you were only printing the host without the domain.

And when you said "skip it", I thought you meant the line, not the domain.

while (<DATA>) {
    my ($host, $domain, $test) = /^([^,.]+)(,[^.]+|)\.(.+)$/;
    $domain =~ s/,/./g;
    $domain = '' if $domain =~ /\.net$/;   # test $domain; a bare /\.net$/ would test $_
    print("Host:$host$domain Test:$test\n");
}

Turns out

$domain =~ s/^.*\.net$//;
takes the same amount of time as
$domain = '' if substr($domain, -4) eq '.net';
but both are slightly slower than
$domain = '' if /\.net$/;
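
As a rough sketch (not the code that was timed), here are the three forms wrapped in hypothetical helper subs so their effect can be compared side by side. The sub names are made up for illustration, and the third form is bound to the domain with =~ because a bare match tests $_ rather than $domain.

```perl
use strict;
use warnings;

# Three ways to blank a domain that ends in ".net".
# Each helper (hypothetical names) works on a copy and returns the result.
sub clear_subst {
    my ($d) = @_;
    $d =~ s/^.*\.net$//;                    # substitute the whole string away
    return $d;
}

sub clear_substr {
    my ($d) = @_;
    $d = '' if substr($d, -4) eq '.net';    # compare the last four characters
    return $d;
}

sub clear_match {
    my ($d) = @_;
    $d = '' if $d =~ /\.net$/;              # plain match, bound to the domain
    return $d;
}

for my $domain ('.my-domain.net', '.my-domain.com') {
    print("in=[$domain]",
          " subst=[",  clear_subst($domain),  "]",
          " substr=[", clear_substr($domain), "]",
          " match=[",  clear_match($domain),  "]\n");
}
```

All three blank the .net domain and leave the .com one alone, so the choice between them comes down to speed and readability.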

Update: 17% faster:

while (<DATA>) {
    chomp;
    my ($host, $test) = split(/\./, $_, 2);
    $host =~ s/,/./g;
    $host =~ s/\..*\.net$//;
    print("Host:$host Test:$test\n");
}

Update: If I change
$host =~ s/\..*\.net$//;
to
$host =~ s/\.my-domain\.net$//;
my version is 6% faster than yours (with $host =~ s/,/./g; added).
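
For illustration, here is the split-based version with the literal domain, wrapped in a sub so it can be tried on a few sample lines. The name parse_line is made up for this sketch; the steps are the ones from the loop above.

```perl
use strict;
use warnings;

# Hypothetical helper: split "host.test" at the first period, turn the commas
# in the host part into periods, then drop a trailing ".my-domain.net".
sub parse_line {
    my ($line) = @_;
    chomp $line;
    my ($host, $test) = split(/\./, $line, 2);
    $host =~ s/,/./g;
    $host =~ s/\.my-domain\.net$//;
    return ($host, $test);
}

for my $line ("hosta-sel-kr-1,my-domain,net.testa\n",
              "hostc-sel-kr-1,my-domain,com.testa\n",
              "hostxyz.testabc\n") {
    my ($host, $test) = parse_line($line);
    print("Host:$host Test:$test\n");
}
```

The .net host comes out bare, the .com host keeps its domain with periods in place of commas, and a host with no domain passes through unchanged.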

Update: Benchmark code

use strict;
use warnings;
use Benchmark qw( cmpthese );

my @data = <DATA>;

sub test_m {
    my @rv;
    foreach (@data) {
        local $_ = $_;  # while (<DATA>)
        my ($host, $test) = /
            (                       # Start first capture
                [\w\-]+             # One or more alphanum or hyphens
                (?:                 # Non-capturing group
                    ,my-domain,com  # Literal string
                )?                  # Make it optional
            )                       # End of first capture
            (?:                     # Non-capturing group
                [\w\-,]+            # One or more alphanum, hyphens or commas
            )?                      # Make it optional
            \.                      # A literal period
            (                       # Start second capture
                [a-z]+              # One or more lowercase chars
            )                       # End second capture
        /x or next;
        $host =~ s/,/./g;
        push(@rv, "Host:$host Test:$test\n");
    }
    @rv;
}

sub test_i {
    my @rv;
    foreach (@data) {
        local $_ = $_;  # while (<DATA>)
        chomp;
        my ($host, $test) = split(/\./, $_, 2);
        $host =~ s/,/./g;
        $host =~ s/\.my-domain\.net$//;
        push(@rv, "Host:$host Test:$test\n");
    }
    @rv;
}

sub test_i2 {
    my @rv;
    foreach (@data) {
        local $_ = $_;  # while (<DATA>)
        my ($host, $test) = /([^,.]+(?:,my-domain,com)?)[^.]*\.(.+)/x
            or next;
        $host =~ s/,/./g;
        push(@rv, "Host:$host Test:$test\n");
    }
    @rv;
}

print("m:\n");
print test_m();
print("--\n");
print("i:\n");
print test_i();
print("--\n");
print("i2:\n");
print test_i2();

cmpthese(-2, {
    m  => \&test_m,
    i  => \&test_i,
    i2 => \&test_i2,
});

__DATA__
hosta-sel-kr-1,my-domain,net.testa
hostb-sel-kr-1,my-domain,net.testb
hostc-sel-kr-1,my-domain,com.testa
hostd-sel-kr-1,my-domain,com.testc
hoste-sel-kr-1,my-domain,net.testxyz
hosta-mel-au-1,my-domain,net.testabc
hosta-mel-au-1,my-domain,net.testdef
hostxyz.testabc
someotherhost.someothertest

Replies are listed 'Best First'.
Re^4: Regex: Capturing and optionally replacing
by McDarren (Abbot) on Dec 08, 2005 at 17:13 UTC
    Great, thanks for that :)

    I guess the lesson I've learned here is to never forget the KISS principle ;)

    Cheers,
    Darren :)

Re^4: Regex: Capturing and optionally replacing
by McDarren (Abbot) on Dec 09, 2005 at 00:34 UTC
    Okay... this one kept me awake last night :(

    Because I'd still like to know... how could I have re-written the original expression to get rid of the unwanted commas?

    Can somebody please put me out of my misery?
    (I promise I won't use it in production :D)

      A substitution works by editing the string being matched, but you want it to work on the string being returned (the captures). A single match can't do that.

      You could do it with a parser, but Perl 5 regexps alone are not powerful enough to build one. (Well, it's theoretically possible with experimental regexp features.)
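
      The practical way around this is two steps: capture first, then edit the captured copy. A minimal sketch, assuming the sample data from the benchmark above; the sub name host_and_test is made up for illustration, and the regex is the short one from test_i2.

```perl
use strict;
use warnings;

# Hypothetical helper: capture first, then fix up the capture in a second step.
sub host_and_test {
    my ($line) = @_;
    my ($host, $test) = $line =~ /([^,.]+(?:,my-domain,com)?)[^.]*\.(.+)/
        or return;          # no match: return an empty list
    $host =~ tr/,/./;       # commas to periods, on the captured copy only
    return ($host, $test);
}

for my $line ("hostc-sel-kr-1,my-domain,com.testa\n",
              "hosta-sel-kr-1,my-domain,net.testa\n") {
    my ($host, $test) = host_and_test($line) or next;
    print("Host:$host Test:$test\n");
}
```

      The match only decides what gets captured; the comma fix happens afterwards on $host, so the input line itself is never modified.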