blackadder has asked for the wisdom of the Perl Monks concerning the following question:

Hi Folks


Using pattern matching, what is the quickest (and best) way to obtain only the server name from a UNC path?

This is what I have done, but I think there is a better way:
$unc = '\\server_name\sys_share';
$unc =~ s/^\W*\w+//;
$server = $&;
$server =~ s/^\W+//;
Many Thanks

Re: Obtaining server name from UNC path
by Abigail-II (Bishop) on Aug 08, 2002 at 09:41 UTC
    ($server) = $unc =~ /^\\([^\\]+)/;
    Abigail
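Abigail's pattern matches the string exactly as blackadder wrote it. The single-quoted literal '\\server_name\sys_share' holds only one leading backslash, while a real UNC path starts with two. A minimal sketch that accepts either form, assuming the server name is everything up to the next backslash (the sub name server_from_unc is hypothetical):

```perl
use strict;
use warnings;

# Hypothetical helper: return the server part of a UNC path, or undef.
# Accepts one or two leading backslashes, so it handles real UNC paths
# (\\server\share) as well as the one-backslash string produced by the
# single-quoted literal '\\server_name\sys_share'.
sub server_from_unc {
    my ($unc) = @_;
    # Capture everything after the leading backslash(es) up to the next one.
    my ($server) = $unc =~ /^\\{1,2}([^\\]+)/;
    return $server;
}

# In single quotes, '\\\\' is two backslashes and '\\' is one.
for my $path ('\\\\server_name\\sys_share', '\\server_name\sys_share', 'no_unc_here') {
    my $server = server_from_unc($path);
    printf "%-24s => %s\n", $path, defined $server ? $server : '(no match)';
}
```

List assignment from the match returns an empty list on failure, so $server is simply undef when the string is not UNC-shaped.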
Re: Obtaining server name from UNC path
by theorbtwo (Prior) on Aug 08, 2002 at 09:52 UTC

    BIG FAT WARNING: Read the reply "Unfortunately benchmarking things with $& isn't easy..." to learn why this code is wrong; the numbers it gives shouldn't be trusted.

    m// and capturing parentheses are your friend. Also, your code won't match all server names: if nothing else, spaces and hyphens are valid in SMB machine names, and \w matches neither. And when the question is one of speed, Benchmark is the answer. (My answer is below, under "theorbtwo=>".)

    #!perl
    use warnings;
    use Benchmark;

    $unc = '\\server_name\sys_share';
    $unc =~ s/^\W*\w+//;
    $server = $&;
    $server =~ s/^\W+//;

    Benchmark::timethese(100000, {
        blackadder => sub {
            $lunc = $unc;
            $lunc =~ s/^\W*\w+//;
            $server = $&;
            $server =~ s/^\W+//;
        },
        theorbtwo => sub {
            $unc =~ m/^\\\\([^\\]+)\\/;
            $server = $1;
        }
    });
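theorbtwo's point that \w does not cover every legal machine name is easy to check. A small sketch using made-up server names (a hyphenated name is the typical case, since Windows computer names may contain hyphens and \w does not match them). The helper names by_word_chars and by_char_class are hypothetical, and blackadder's idea is rewritten with a capture so that $& does not appear:

```perl
use strict;
use warnings;

# Hypothetical helper: blackadder's idea (skip non-word characters, then
# take word characters), written with a capture instead of $&.
sub by_word_chars {
    my ($unc) = @_;
    my ($server) = $unc =~ /^\W*(\w+)/;
    return $server;
}

# Hypothetical helper: theorbtwo's idea, i.e. everything between the
# leading \\ and the next \.
sub by_char_class {
    my ($unc) = @_;
    my ($server) = $unc =~ /^\\\\([^\\]+)\\/;
    return $server;
}

# Made-up sample paths (single quotes: '\\\\' is two backslashes).
for my $unc ('\\\\fileserver\\share', '\\\\srv2003\\share', '\\\\file-srv01\\share') {
    my $w = by_word_chars($unc);
    my $c = by_char_class($unc);
    printf "%-22s word chars: %-12s char class: %s\n",
        $unc, defined $w ? $w : '(none)', defined $c ? $c : '(none)';
}
```

The word-character version stops at the first hyphen, while the negated character class takes the whole name up to the share separator.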


    Confession: It does an Immortal Body good.

      Update: One of the reasons I wrote this reply was to illustrate the issues in writing a good benchmark, specifically when benchmarking code with and without $& in it. But it seems that I made a number of mistakes myself. Make sure you read Abigail-II's comments below after you read this post; some of what I say turns out to be wrong.


      While I agree that benchmarking your solution against blackadder's is an interesting thought, I have to point out that, unfortunately, the way you have done it produces results that are both incorrect (because your benchmark doesn't do what you think it does) and misleading, because you aren't testing fairly (that dagnabbed $& has bitten you in debugger :-).

      So let's look at the problems with your benchmarking code:

      # THIS doesn't mean \\server_name\sys_share, it means \server_name\sys_share
      $unc = '\\server_name\sys_share';
      # and then you strip the leading "\server_name" off $unc, leaving just
      # "\sys_share", before the benchmark even starts!
      $unc =~ s/^\W*\w+//;
      $server = $&;
      $server =~ s/^\W+//;
      # None of the regexes in the benchmark will match anymore (in a meaningful way)
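The quoting problem is easy to see by printing the strings. A small sketch (variable names are illustrative only) showing how many backslashes each single-quoted literal really contains, and what the setup line leaves in the variable:

```perl
use strict;
use warnings;

# In a single-quoted Perl string, \\ is an escaped backslash (one character)
# and a backslash before any other character is kept as-is.
my $as_posted = '\\server_name\sys_share';      # one leading backslash
my $intended  = '\\\\server_name\\sys_share';   # two leading backslashes

print "as posted: $as_posted (", length($as_posted), " chars)\n";
print "intended:  $intended (", length($intended), " chars)\n";

# The benchmark's setup line then eats the leading name, leaving "\sys_share".
(my $after_setup = $as_posted) =~ s/^\W*\w+//;
print "after s/^\\W*\\w+//: $after_setup\n";

# theorbtwo's pattern needs two leading backslashes, so it no longer matches.
my $still_matches = $after_setup =~ m/^\\\\([^\\]+)\\/ ? 'yes' : 'no';
print "theorbtwo regex matches after setup: $still_matches\n";
```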
      So, to do the benchmark properly, I modified your code:
      #!perl
      use warnings;
      use Benchmark;

      $unc = '\\\\server_name\\sys_share';
      my $re = Benchmark::timethese(-5, {
          blackadder => sub {
              $lunc = $unc;
              $lunc =~ s/^\W*\w+//;
              $server = $&;
              $server =~ s/^\W+//;
          },
          theorbtwo => sub {
              $unc =~ m/^\\\\([^\\]+)\\/;
              $server = $1;
          }
      });
      Benchmark::cmpthese($re);
      __END__
      Benchmark: running blackadder, theorbtwo, each for at least 5 CPU seconds...
      blackadder:  6 wallclock secs ( 5.12 usr +  0.00 sys =  5.12 CPU) @ 131371.90/s (n=673281)
       theorbtwo:  5 wallclock secs ( 5.34 usr +  0.00 sys =  5.34 CPU) @ 190423.65/s (n=1017624)
                     Rate blackadder  theorbtwo
      blackadder 131372/s         --       -31%
      theorbtwo  190424/s        45%         --
      Which shows that your method is faster than blackadder's, but not by much: only 45%. Luckily for you, this number is _still_ totally wrong. The reason is that $& has an interesting effect on regexes _ANYWHERE_ in a program that uses it: it slows them down massively, because once perl has seen $& it copies the target string on every successful match so that $& can be filled in. (japhy has written a number of articles about this, and about some approaches to resolving the problem.) So in order to benchmark a solution that uses $& against one that doesn't, we need to benchmark them in different perl processes (not forked! totally different), like so:

      Program 1: bm_blackadder.pl

      # benchmark saw ampersand -- BlackAdder
      use strict;
      use warnings;
      use Benchmark qw(timethis);
      use Data::Dumper;

      my $count = $ARGV[0] || -1;
      my $unc   = $ARGV[1] || '\\\\server_name\\sys_share';
      print "Matching $unc for $count\n";
      print Dumper(timethis($count, sub {
          my $lunc = $unc;
          $lunc =~ s/^\W*\w+//;
          (my $server = $&) =~ s/^\W+//;
          $server
      }, 'blackadder'));
      Program 2: bm_theorbtwo.pl
      # benchmark saw ampersand -- theorbtwo
      use strict;
      use warnings;
      use Benchmark qw(timethis);
      use Data::Dumper;

      my $count = $ARGV[0] || -1;
      my $unc   = $ARGV[1] || '\\\\server_name\\sys_share';
      print "Matching $unc for $count\n";
      print Dumper(timethis($count, sub {
          $unc =~ m/^\\\\([^\\]+)\\/;
          $1;
      }, 'theorbtwo'));
      Program 3: run_bm.pl
      (Runs the other two programs and compares their results.)
      use Benchmark 'cmpthese';
      use Data::Dumper;

      sub run_bm($) {
          my $str = shift;
          my $h;
          # split the timethis() report off the front of the Dumper output
          $str =~ s/\A(.*)\$VAR1 =/$h=$1;''/se;
          print $h;
          # rebuild the Benchmark object from the dumped structure
          my $v = eval($str);
          die $@ if $@;
          $v
      }

      my $opts = '-5 \\\\foo\\bar\\baz.exe';
      my $hash = {
          blackadder => run_bm(`perl bm_blackadder.pl $opts`),
          theorbtwo  => run_bm(`perl bm_theorbtwo.pl $opts`),
      };
      cmpthese($hash);
      __END__
      When set up correctly, run_bm.pl outputs:
      Matching \\foo\bar\baz.exe for -5
      blackadder:  6 wallclock secs ( 5.22 usr +  0.00 sys =  5.22 CPU) @ 135204.60/s (n=705768)
      Matching \\foo\bar\baz.exe for -5
       theorbtwo:  6 wallclock secs ( 5.17 usr +  0.00 sys =  5.17 CPU) @ 367160.48/s (n=1898954)
                     Rate blackadder  theorbtwo
      blackadder 135205/s         --       -63%
      theorbtwo  367160/s       172%         --
      Showing that your solution is about 172% faster than blackadder's! Much more than the 45% you might have thought it was! (And also showing the cost that $& has on your code if you are foolish enough to use it, or if someone else has snuck it into their code and you don't know about it.)
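If you want the matched text without paying the program-wide $& penalty, there are alternatives. A sketch of three of them: capturing parentheses, the @- and @+ offset arrays with substr, and the /p modifier with ${^MATCH} (Perl 5.10 and later). Note that since Perl 5.20, copy-on-write strings removed the global slowdown caused by $&, so timings like the ones above reflect older perls.

```perl
use 5.010;    # needed for /p and ${^MATCH}
use strict;
use warnings;

my $unc = '\\\\server_name\\sys_share';

# 1. Capturing parentheses: only the captured text is kept.
my ($server_capture) = $unc =~ /^\\\\([^\\]+)/;

# 2. @- and @+ hold match offsets; substr() recovers the matched text
#    without touching $&.
my $server_offsets;
if ($unc =~ /^\\\\[^\\]+/) {
    # $-[0]/$+[0] bound the whole match; skip the two leading backslashes.
    $server_offsets = substr($unc, $-[0] + 2, $+[0] - $-[0] - 2);
}

# 3. /p makes ${^MATCH} available for this match only.
my $server_p;
if ($unc =~ /^\\\\[^\\]+/p) {
    ($server_p = ${^MATCH}) =~ s/^\\+//;
}

print "$server_capture $server_offsets $server_p\n";
```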

      BTW, you are correct that blackadder's solution is not correct; my point in this reply is that benchmarking regexes that use $& is not as simple as one might think (or like). Oh, also: in future you should avoid using fixed counts in your benchmarks. It is almost always better to pass a negative count, which tells Benchmark the minimum number of CPU seconds to run each test for. The more seconds the better (I find 5-10 seconds is usually good).
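To make the negative-count advice concrete, here is a sketch that benchmarks two $&-free ways of getting the server name: the capture regex from this thread, and a split on backslashes, which is my own alternative rather than something proposed above. It assumes the path starts with two backslashes; the test names capture_group and split_fields are arbitrary. Following the advice about $&, nothing in this script uses it.

```perl
use strict;
use warnings;
use Benchmark qw(timethese cmpthese);

my $unc = '\\\\server_name\\sys_share';

# A negative count tells Benchmark to run each entry for at least that
# many CPU seconds instead of a fixed number of iterations.
my $results = timethese(-5, {
    capture_group => sub {
        my ($server) = $unc =~ /^\\\\([^\\]+)\\/;
        return $server;
    },
    split_fields => sub {
        # split on runs of backslashes: ('', 'server_name', 'sys_share')
        return (split /\\+/, $unc)[1];
    },
});

# Print the relative-rate table from the collected Benchmark objects.
cmpthese($results);
```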

      HTH.

      Yves / DeMerphq
      ---
      Software Engineering is Programming when you can't. -- E. W. Dijkstra (RIP)

        I don't get such a large difference, but then, I don't muddy the waters by using subs or by calling external programs.
        use warnings 'all';
        use strict;

        use Benchmark 'cmpthese';

        $::unc = '\\\\server_name\\sys_share';

        cmpthese -5 => {
            blackadder => 'my $tmp = $::unc;
                           $tmp =~ s/^\W*\w+//;
                           $server = $&;
                           $server =~ s/^\W+//;',
            theorbtwo  => '$::unc =~ m/^\\\\\\\\([^\\\\]+)\\\\/;   # Urgle.
                           my $server = $1;',
        };
        __END__
        Name "main::unc" used only once: possible typo at bench line 6.
        Benchmark: running blackadder, theorbtwo for at least 5 CPU seconds...
        blackadder:  5 wallclock secs ( 5.08 usr +  0.00 sys =  5.08 CPU) @ 76238.39/s (n=387291)
         theorbtwo:  6 wallclock secs ( 5.07 usr +  0.00 sys =  5.07 CPU) @ 97024.65/s (n=491915)
                      Rate blackadder  theorbtwo
        blackadder 76238/s         --       -21%
        theorbtwo  97025/s        27%         --
        The reasonably small difference is what I expected. $& isn't as costly as it used to be, and since the blackadder code has only two regexes, you pay the price twice, compared to once for theorbtwo (due to the parens). The bigger difference is that blackadder has to _copy_ the string, which theorbtwo doesn't.

        Abigail

        Sigh. One of these days, I'll learn to proofread and test code before posting to PM, even when I think I shouldn't have to. In this particular case, I did test the code... in as much as I tested whether it ran, but not whether it gave the correct results. I never did a mental sanity check on the code from the original post, or printed the output of either method.

        As to the $& problems, I did know about them, but figured that they were too advanced for the poster, and there's a better way to do it anyway. I knew about the leakage of badness and figured it would affect my benchmarks, but that it wouldn't be a major effect. Seems I underestimated how bad it was.


        Confession: It does an Immortal Body good.

      Thanks Guys,...