in reply to efficiency & style

#!/usr/bin/perl -w
use strict;

my $date = `date +"%l:%M:%S"`;   # I'd use sprintf and localtime
chomp $date;                     # drop the newline that `date` prints

my %service = (
    www => qr/:0050\s/,
    ssh => qr/:0016\s/,
    irc => qr/:1A0B\s/,
    ftp => qr/:0015\s/,
);
my %count = qw( www 0 ssh 0 irc 0 ftp 0 );

foreach my $line ( grep /\b01\s/, `cat /proc/net/tcp` ) {
    foreach my $svc ( keys %service ) {
        $count{$svc}++ if $line =~ $service{$svc};
    }
}

open( FILE, ">/var/www/web/main.txt" )
    or die "Can't write to /var/www/web/main.txt: $!\n";
print FILE "There are $count{www} web, $count{ftp} ftp, ",
    "$count{ssh} ssh, and $count{irc} irc connections ",
    "to this server as of $date\n";
close FILE;
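Following up on the "sprintf and localtime" comment, here is a minimal sketch that builds the same `%l:%M:%S` string (12-hour clock, hour padded with a space, minutes and seconds zero-padded) without shelling out to `date`. The helper name `clock_time` is hypothetical:

```perl
use strict;
use warnings;

# Format the first three elements of a localtime() list like
# `date +"%l:%M:%S"`: 12-hour clock, space-padded hour.
sub clock_time {
    my ($sec, $min, $hour) = @_;
    my $h12 = $hour % 12 || 12;    # both 0 and 12 display as 12
    return sprintf "%2d:%02d:%02d", $h12, $min, $sec;
}

my $date = clock_time(localtime);
print "as of $date\n";
```

Since no external command runs, there is no trailing newline to chomp, and the script no longer depends on `date` being on the PATH.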
        - tye (but my friends call me "Tye")

RE (tilly) 2: efficiency & style
by tilly (Archbishop) on Sep 23, 2000 at 12:04 UTC
    When seeking to match a series of strings like this, there is no reason not to push the loop down into the RE engine. First you:
    my %service = qw( 0050 www 0016 ssh 1A0B irc 0015 ftp );
    my $match_lst = join "|", keys %service;
    my $match = qr/:($match_lst)\s/;
    and then you change the loop to:
    foreach my $line ( grep /\b01\s/, `cat /proc/net/tcp` ) {
        ++$count{ $service{$1} } while $line =~ /$match/g;
    }
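To see the single-regex version work without a live /proc/net/tcp, here is a sketch that runs tilly's pattern over a few made-up lines in that file's layout. The sample lines are hypothetical; in the real file, state `01` is ESTABLISHED and `0A` is LISTEN:

```perl
use strict;
use warnings;

# One alternation of all the port numbers, plus a hash that maps the
# captured port back to a service name.
my %service   = qw( 0050 www 0016 ssh 1A0B irc 0015 ftp );
my $match_lst = join "|", keys %service;
my $match     = qr/:($match_lst)\s/;

# Hypothetical lines standing in for `cat /proc/net/tcp`.
my @sample = (
    "   0: 00000000:0016 00000000:0000 0A 00000000:00000000 00:00000000 00000000     0        0 1 1\n",
    "   1: 0A000001:0050 0A000002:C350 01 00000000:00000000 00:00000000 00000000    33        0 2 1\n",
    "   2: 0A000001:0050 0A000003:C351 01 00000000:00000000 00:00000000 00000000    33        0 3 1\n",
    "   3: 0A000001:0016 0A000004:C352 01 00000000:00000000 00:00000000 00000000     0        0 4 1\n",
);

my %count;
foreach my $line ( grep /\b01\s/, @sample ) {    # keep ESTABLISHED lines only
    # //g in scalar context resumes after the previous match, so every
    # listed port on the line is counted, not just the first one.
    ++$count{ $service{$1} } while $line =~ /$match/g;
}
printf "%s: %d\n", $_, $count{$_} // 0 for qw(www ssh irc ftp);
```

The LISTEN line on port 0016 is dropped by the `grep`, so only the three ESTABLISHED lines are counted.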
    If you want, you can speed it up even more by optimizing the RE, as I did in RE (tilly) 4: SAS log scanner.
      Hmm. I disbelieve that using alternation is as efficient as looping over a list of patterns. I believe the following benchmark backs me up:

      tilly gives: 1600
      chetlin gives: 1600
      Benchmark: running chetlin, tilly, each for at least 5 CPU seconds...
         chetlin:  9 wallclock secs ( 5.52 usr +  0.00 sys =  5.52 CPU) @ 333.70/s (n=1842)
           tilly: 10 wallclock secs ( 5.09 usr +  0.00 sys =  5.09 CPU) @ 104.52/s (n=532)

      Here's the code for it; do feel free to slap me around if I made a thinko:

      use strict;
      use Benchmark;

      my @patterns = qw/foo bar baz blarch/;
      my $tilly    = qr/(@{[join "|", @patterns]})/;  # one alternation: foo|bar|baz|blarch
      my @chetlin  = map qr/$_/, @patterns;            # one compiled regex per pattern
      my $target   = "foo baz blarcy foo blarch" x 400;

      sub tilly {
          my $count;
          $count++ while ($target =~ /$tilly/g);
          # print only when not called from inside Benchmark's eval
          print STDERR "tilly gives: $count\n" if ((caller)[1] !~ /eval/);
      }

      sub chetlin {
          my $count;
          for (@chetlin) { $count++ while ($target =~ /$_/g) }
          print STDERR "chetlin gives: $count\n" if ((caller)[1] !~ /eval/);
      }

      tilly();
      chetlin();

      timethese(-5, {
          tilly   => \&tilly,
          chetlin => \&chetlin,
      });

      In general, my credo is to avoid alternation at all costs. I would be interested in seeing what a benchmark of your optimized alternation (ref. the pointer you gave above) would give.
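One way to answer that is to time a third variant next to the two above. In this sketch, `(foo|ba[rz]|blarch)` is a hand-factored pattern in the spirit of tye's regex; it and the sub names are my own, and no timings are claimed here since they depend on the perl version and the machine. Note that a one-regex-per-pattern loop and a single alternation only give the same count when matches cannot overlap, so the script checks that before timing:

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

my @patterns = qw(foo bar baz blarch);
my $target   = "foo baz blarcy foo blarch" x 400;

my $plain    = qr/(@{[ join '|', @patterns ]})/;   # foo|bar|baz|blarch
my $factored = qr/(foo|ba[rz]|blarch)/;            # hand-factored alternation
my @separate = map { qr/$_/ } @patterns;           # one regex per pattern

sub count_plain    { my $n = 0; $n++ while $target =~ /$plain/g;    return $n }
sub count_factored { my $n = 0; $n++ while $target =~ /$factored/g; return $n }
sub count_separate {
    my $n = 0;
    for my $re (@separate) { $n++ while $target =~ /$re/g }
    return $n;
}

# Sanity check before timing: all three variants must agree.
my @counts = (count_plain(), count_factored(), count_separate());
die "counts disagree: @counts\n" if grep { $_ != $counts[0] } @counts;
print "each variant counts $counts[0] matches\n";

cmpthese(-1, {
    alternation => \&count_plain,
    factored    => \&count_factored,
    separate    => \&count_separate,
});
```

`cmpthese` runs the same timings as `timethese` and then prints a rate table with percentage comparisons between every pair, which is easier to read for three or more contenders.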

      -dlc

        /:(00(?:1[56]|50)|1A0B)\s/

        This one is hand-optimized. The automated approach would produce (5|6) instead of [56].
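For the curious, here is a minimal sketch of what such an automated approach can look like: build a trie (nested hashes, one level per character) from the literal strings, then turn each branching node into a `(?:...)` group. This is my own illustration, not the code from the SAS log scanner node; `trie_re` and `_node_re` are hypothetical names:

```perl
use strict;
use warnings;

# Build a trie from literal strings, then emit a regex source with
# common prefixes factored out.
sub trie_re {
    my %trie;
    for my $word (@_) {
        my $node = \%trie;
        $node = $node->{$_} ||= {} for split //, $word;
        $node->{''} = 1;    # '' marks "a word ends here"
    }
    return _node_re(\%trie);
}

sub _node_re {
    my ($node) = @_;
    my $ends_here = 0;
    my @alts;
    for my $ch (sort keys %$node) {
        if ($ch eq '') { $ends_here = 1; next }
        push @alts, quotemeta($ch) . _node_re($node->{$ch});
    }
    return '' unless @alts;    # leaf: nothing more to match
    my $re = @alts == 1 ? $alts[0] : '(?:' . join('|', @alts) . ')';
    $re = "(?:$re)?" if $ends_here;    # a shorter word may stop here
    return $re;
}

my $ports = trie_re(qw(0050 0016 1A0B 0015));
print "$ports\n";
my $match = qr/:($ports)\s/;
print "matched $1\n" if "0A000001:0016 0A000002:C350 01" =~ $match;
```

For the four ports this yields `(?:00(?:1(?:5|6)|50)|1A0B)`, which is the hand-optimized pattern above with `(?:5|6)` where a human would write `[56]`.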

                - tye (but my friends call me "Tye")
        Better, but not as good as yours.

        The last time we did this, we found that the trie approach was significantly faster on data sets where there were matches to be found. See RE (tilly) 4: SAS log scanner for details.

        Live and learn.

RE: Re: efficiency & style
by djw (Vicar) on Sep 24, 2000 at 19:04 UTC
    Now see, I knew there was a better way. I have seen the => thing before, and I can see how it's being used here, but I don't really know what it is :)
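For the record, `=>` is the "fat comma": it separates list elements exactly like `,`, but it also quotes a bareword immediately to its left, which is why `www => qr/:0050\s/` needs no quotes around `www`. A small sketch comparing the two spellings (the hash names are made up):

```perl
use strict;
use warnings;

# '=>' separates list items like ',' and also quotes the bareword on its left.
my %fat   = ( www => qr/:0050\s/, ssh => qr/:0016\s/ );
my %plain = ( 'www', qr/:0050\s/, 'ssh', qr/:0016\s/ );

# Both hashes end up with the same keys.
print join(',', sort keys %fat), "\n";
print join(',', sort keys %plain), "\n";
```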
    I really like how you used the hash to create the service keys with a value of 0... slick. I will have to figure out that foreach loop on my own; I'll try it on another example and see if I can understand how it's being used.
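Both pieces can be watched on a tiny input. In this sketch, `qw()` builds the alternating key/value list that starts every counter at 0, and the two nested loops are the ones from tye's script, run over two hypothetical /proc/net/tcp-style lines instead of the real file:

```perl
use strict;
use warnings;

# qw() splits on whitespace, giving ('www', 0, 'ssh', 0, ...):
# key/value pairs, so every counter starts at 0.
my %count = qw( www 0 ssh 0 irc 0 ftp 0 );

my %service = (
    www => qr/:0050\s/,
    ssh => qr/:0016\s/,
    irc => qr/:1A0B\s/,
    ftp => qr/:0015\s/,
);

# Two made-up lines in /proc/net/tcp layout (hypothetical sample data).
my @sample = (
    "   1: 0A000001:0050 0A000002:C350 01 00000000:00000000 00:00000000 00000000    33        0 2 1\n",
    "   0: 00000000:0016 00000000:0000 0A 00000000:00000000 00:00000000 00000000     0        0 1 1\n",
);

# Outer loop: grep keeps only lines whose state field is 01 (ESTABLISHED).
foreach my $line ( grep /\b01\s/, @sample ) {
    # Inner loop: try every service pattern against this one line.
    foreach my $svc ( keys %service ) {
        $count{$svc}++ if $line =~ $service{$svc};
    }
}

print "$_: $count{$_}\n" for sort keys %count;
```

The second line is in state `0A` (LISTEN), so the outer `grep` drops it before the inner loop ever sees its `:0016`.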
    I should have been using the die statement in the first place. I know better than that.
    I really appreciate the input.

    Thanks,
    djw