Re: Timestamp as a Filename Collection

Well here's an interesting thing. With yet another example of building filenames out of localtime kicking around, I thought I'd build a benchmark once and for all to show how much faster using an anonymous sub is than the other techniques. This is the code I wrote:

#! /usr/bin/perl -w

use strict;
use POSIX;
use Benchmark;

sub via_anonsub {
  sub { sprintf '%04d%02d%02d%02d%02d%02d%05d',
    $_[5]+1900, $_[4]+1, $_[3], $_[2], $_[1], $_[0], $$
  }->(localtime())
}

sub via_sprintf {
  my @time = localtime();
  sprintf '%04d%02d%02d%02d%02d%02d%05d',
    $time[5]+1900, $time[4]+1, $time[3], $time[2], $time[1], $time[0],
    $$
}

sub via_concat {
  my @time = localtime();
  $time[4]++;
  $time[5]+=1900;
  $time[$_] = $time[$_]<10? "0".$time[$_]:$time[$_] for (0..5);
  my $filename = $time[5].$time[4].$time[3].$time[2].$time[1].$time[0]
    .$$
}

sub via_posix {
  strftime( "%Y%m%d%H%M%S$$", localtime() )
}

print <<"PROOF";
via_concat:  ${\via_concat()}
via_anonsub: ${\via_anonsub()}
via_posix:   ${\via_posix()}
via_sprintf: ${\via_sprintf()}
PROOF

timethese( shift || 10000, {
  'via_anonsub' => \&via_anonsub,
  'via_concat'  => \&via_concat,
  'via_posix'   => \&via_posix,
  'via_sprintf' => \&via_sprintf,
});

When run on an older Perl (v5.005_03) this produces the following output:

$ perl filename 200000
via_concat:  2002091114542256876
via_anonsub: 2002091114542256876
via_posix:   2002091114542256876
via_sprintf: 2002091114542256876
Benchmark: timing 200000 iterations of via_anonsub, via_concat, via_posix, via_sprintf...
via_anonsub:  7 wallclock secs ( 5.60 usr +  0.28 sys =  5.88 CPU)
 via_concat: 21 wallclock secs (18.71 usr +  0.55 sys = 19.27 CPU)
  via_posix: 13 wallclock secs ( 9.59 usr +  0.62 sys = 10.20 CPU)
via_sprintf: 10 wallclock secs ( 6.67 usr +  0.39 sys =  7.06 CPU)

But just for kicks, I thought I'd take it for a spin on a new machine running 5.8.0 and see what change, if any, appeared. I changed the code (the benchmark code, not the underlying snippets) a bit to use cmpthese instead, and this gives:

via_concat:  2002091114275977636
via_anonsub: 2002091114275977636
via_posix:   2002091114275977636
via_sprintf: 2002091114275977636
Benchmark: running via_anonsub, via_concat, via_posix, via_sprintf for at least 10 CPU seconds...
via_anonsub: 12 wallclock secs (10.12 usr +  0.51 sys = 10.62 CPU) @ 53156.42/s (n=564787)
 via_concat: 10 wallclock secs (10.20 usr +  0.24 sys = 10.44 CPU) @ 30435.83/s (n=317674)
  via_posix: 10 wallclock secs ( 9.57 usr +  0.93 sys = 10.50 CPU) @ 60954.29/s (n=640020)
via_sprintf: 10 wallclock secs (10.20 usr +  0.33 sys = 10.53 CPU) @ 44919.45/s (n=473058)
               Rate  via_concat via_sprintf via_anonsub   via_posix
via_concat  30436/s          --        -32%        -43%        -50%
via_sprintf 44919/s         48%          --        -15%        -26%
via_anonsub 53156/s         75%         18%          --        -13%
via_posix   60954/s        100%         36%         15%          --
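
The post does not show the modified benchmark call. A minimal sketch of what a cmpthese version could look like follows, with two of the four contenders repeated so it runs on its own. The count of -1 is an assumption chosen to keep the run short; the "at least 10 CPU seconds" line in the output above suggests the original run used -10.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use POSIX qw(strftime);
use Benchmark qw(cmpthese);

# Two of the original contenders, repeated so the sketch is self-contained.
sub via_sprintf {
    my @time = localtime();
    return sprintf '%04d%02d%02d%02d%02d%02d%05d',
        $time[5] + 1900, $time[4] + 1, @time[3, 2, 1, 0], $$;
}

sub via_posix {
    return strftime('%Y%m%d%H%M%S', localtime()) . $$;
}

# A negative count asks Benchmark to run each sub for at least that many
# CPU seconds instead of a fixed number of iterations.
my $count = shift || -1;

# cmpthese times every entry, prints the comparison chart, and returns
# the chart as a reference to an array of rows.
my $rows = cmpthese($count, {
    via_posix   => \&via_posix,
    via_sprintf => \&via_sprintf,
});
```

Passing a positive number on the command line switches back to a fixed iteration count, as in the timethese version.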

I'm bummed. Somewhere along the line of perl's evolution, the performance of POSIX got better and/or the handling of subroutine calls got worse. Oh well, you can't always use POSIX, and this anonymous subroutine approach works well for other system calls that return multiple values, such as stat and getpwnam. And I can live with 15% inefficiency, I guess. Such is life.
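
As an illustration of the same idiom applied to stat, here is a small sketch. The helper name describe_file and the output format are made up for the example; only the positions of the stat fields (2 = mode, 7 = size, 9 = mtime) come from Perl itself.

```perl
use strict;
use warnings;

# Hypothetical helper: summarize a file by picking fields out of the
# list returned by stat(), using an immediately-called anonymous sub
# instead of a named temporary array.
sub describe_file {
    my ($path) = @_;
    return sub {
        return unless @_;    # stat returns an empty list on failure
        sprintf '%d bytes, mode %04o, mtime %d',
            $_[7], $_[2] & 07777, $_[9];
    }->(stat $path);
}

my $summary = describe_file($0);
print defined $summary ? "$summary\n" : "could not stat $0\n";
```

The positional indexes are no more readable than with a named array, so a comment naming the fields is probably worth the space either way.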

update: It's faster because it doesn't have to deal with building up and tearing down lexicals. And the sub is compiled at, ah, compile time. That's why it should be faster. Look more closely at the code; I'm not building closures.

I used to assume that using POSIX was slower because it was XS code and paying the XS/Perl boundary-crossing cost. Either calling XS is now cheaper, or creating lexicals is now more expensive, relatively speaking.

print@_{sort keys %_},$/if%_=split//,'= & *a?b:e\f/h^h!j+n,o@o;r$s-t%t#u'


Re: Re: Timestamp as a Filename Collection by blssu (Pilgrim) on Sep 11, 2002 at 14:22 UTC

Why do you think the anon sub should be faster? It does the same thing as your simple sprintf -- but it has the overhead of creating a closure too. The anon sub should be slower.

Generally you shouldn't use anon subs unless you can factor out some (expensive) common code, or you want to pass functions as arguments to modify how other subs behave. Closures are fairly large objects and can be difficult for people to understand.

Here's an example of using a closure to factor out the relatively static year-month-day-hour formatting.

sub setup_file_name_generator {
  # Generate file names of the form
  # "year-month-day-hour-pid-count".
  #
  # The year-month-day-hour-pid portion
  # is cached and will expire on the hour.

  my $time = time;
  my @time = localtime($time);
  my $base = sprintf('%4d-%02d-%02d-%02d-%d-',
                     $time[5]+1900, $time[4]+1, $time[3], $time[2], $$);
  my $count = 0;
  my $expires = $time + 60 * (60 - $time[1]);

  *generate_file_name = sub {
    $time = time;
    if ($time >= $expires) {
      setup_file_name_generator();
      return generate_file_name();
    }
    else {
      my $name;
      do {
        ++$count;
        $name = $base . $count;
      } while (-e $name);
      return $name;
    }
  }
}

setup_file_name_generator();
my $name = generate_file_name();

I've tested the code a bit, but it definitely needs more testing to ensure that the cache expires properly on the hour. It's rather defensive code though -- the `-e $name` check will prevent collisions even if the cache doesn't work perfectly.

Re: Re: Timestamp as a Filename Collection by blssu (Pilgrim) on Sep 11, 2002 at 20:08 UTC
I am looking closely. You're using an anonymous sub, therefore you're taking a closure. Use `perl -D8` if you don't believe me. (Look for the `anoncode` op.) It's true that your code is compiled at compile time. I said you were taking a closure, not eval'ing a string. Anyways, the `{my @time = ...}` sub is also compiled at compile time. That code should be faster because it does less -- allocating a frame (pad) vs. allocating an empty frame, taking a closure and calling it. You should be happy the simple stuff is getting faster, instead of sad that the obfu code is slower!
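
One way to poke at the disagreement above is to compare what evaluating an anonymous sub expression actually hands back. The sketch below is not from the thread; it only contrasts an anon sub that captures nothing (like the one in via_anonsub) with one that captures a lexical. Whether the non-capturing case reuses a single code ref is an implementation detail of perl, so treat what it prints as an observation about the perl you run it on, not a guarantee.

```perl
use strict;
use warnings;

my (@plain, @closing);
for my $n (1 .. 2) {
    push @plain,   sub { 42 };    # captures no lexicals
    push @closing, sub { $n };    # captures $n, so it is a true closure
}

# Code refs compare numerically by address. Both refs in each pair are
# still alive here, so equal addresses mean the very same sub.
printf "non-capturing: %s\n",
    $plain[0] == $plain[1] ? 'same code ref' : 'distinct code refs';
printf "capturing:     %s\n",
    $closing[0] == $closing[1] ? 'same code ref' : 'distinct code refs';
```

Either way the anoncode op still runs each time the expression is evaluated, so this does not settle the speed question by itself; it only shows how much work that op has to do in each case.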