in reply to Timestamp as a Filename Collection

Well, here's an interesting thing. With yet another example of building filenames out of localtime kicking around, I thought I'd write a benchmark once and for all to show how much faster using an anonymous sub is than the other techniques. This is the code I wrote:

#! /usr/bin/perl -w

use strict;
use POSIX;
use Benchmark;

sub via_anonsub {
    sub {
        sprintf '%04d%02d%02d%02d%02d%02d%05d',
            $_[5]+1900, $_[4]+1, $_[3], $_[2], $_[1], $_[0], $$
    }->(localtime())
}

sub via_sprintf {
    my @time = localtime();
    sprintf '%04d%02d%02d%02d%02d%02d%05d',
        $time[5]+1900, $time[4]+1, $time[3], $time[2], $time[1], $time[0], $$
}

sub via_concat {
    my @time = localtime();
    $time[4]++;
    $time[5] += 1900;
    $time[$_] = $time[$_] < 10 ? "0".$time[$_] : $time[$_] for (0..5);
    my $filename = $time[5].$time[4].$time[3].$time[2].$time[1].$time[0].$$
}

sub via_posix {
    strftime( "%Y%m%d%H%M%S$$", localtime() )
}

print <<"PROOF";
via_concat: ${\via_concat()}
via_anonsub: ${\via_anonsub()}
via_posix: ${\via_posix()}
via_sprintf: ${\via_sprintf()}
PROOF

timethese( shift || 10000, {
    'via_anonsub' => \&via_anonsub,
    'via_concat'  => \&via_concat,
    'via_posix'   => \&via_posix,
    'via_sprintf' => \&via_sprintf,
});

When run on an older Perl (v5.005_03) this produces the following output:

$ perl filename 200000
via_concat: 2002091114542256876
via_anonsub: 2002091114542256876
via_posix: 2002091114542256876
via_sprintf: 2002091114542256876
Benchmark: timing 200000 iterations of via_anonsub, via_concat, via_posix, via_sprintf...
via_anonsub:  7 wallclock secs ( 5.60 usr +  0.28 sys =  5.88 CPU)
via_concat: 21 wallclock secs (18.71 usr +  0.55 sys = 19.27 CPU)
via_posix: 13 wallclock secs ( 9.59 usr +  0.62 sys = 10.20 CPU)
via_sprintf: 10 wallclock secs ( 6.67 usr +  0.39 sys =  7.06 CPU)

But just for kicks, I thought I'd take it for a spin on a new machine running 5.8.0 and see what changed, if anything. I changed the code a bit (the benchmark driver, not the underlying snippets) to use cmpthese instead, and this gives:

via_concat: 2002091114275977636
via_anonsub: 2002091114275977636
via_posix: 2002091114275977636
via_sprintf: 2002091114275977636
Benchmark: running via_anonsub, via_concat, via_posix, via_sprintf for at least 10 CPU seconds...
via_anonsub: 12 wallclock secs (10.12 usr +  0.51 sys = 10.62 CPU) @ 53156.42/s (n=564787)
via_concat: 10 wallclock secs (10.20 usr +  0.24 sys = 10.44 CPU) @ 30435.83/s (n=317674)
via_posix: 10 wallclock secs ( 9.57 usr +  0.93 sys = 10.50 CPU) @ 60954.29/s (n=640020)
via_sprintf: 10 wallclock secs (10.20 usr +  0.33 sys = 10.53 CPU) @ 44919.45/s (n=473058)

                 Rate  via_concat via_sprintf via_anonsub   via_posix
via_concat  30436/s          --        -32%        -43%        -50%
via_sprintf 44919/s         48%          --        -15%        -26%
via_anonsub 53156/s         75%         18%          --        -13%
via_posix   60954/s        100%         36%         15%          --

I'm bummed. Somewhere along the line of Perl's evolution, POSIX got faster and/or subroutine calls got slower. Oh well: you can't always use POSIX, and this anonymous subroutine approach works well for other system calls that return multiple values, such as stat and getpwnam. And I can live with a 15% inefficiency, I guess. Such is life.
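As an illustration of that last point, here is a minimal sketch applying the same `sub {...}->(LIST)` trick to stat and getpwnam. The helper names size_and_mtime and uid_gid are hypothetical (not from the post), and the getpwnam part assumes a Unix-like system.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper. stat returns a 13-element list: index 7 is the
# size in bytes, index 9 is the last-modification time (epoch seconds).
sub size_and_mtime {
    my ($file) = @_;
    return sub { "$_[7] bytes, modified " . scalar localtime($_[9]) }->(stat $file);
}

# Hypothetical helper. getpwnam returns (name, passwd, uid, gid, ...),
# or an empty list if the user does not exist.
sub uid_gid {
    my ($user) = @_;
    return sub { @_ ? "uid=$_[2] gid=$_[3]" : 'no such user' }->(getpwnam $user);
}

# $^X is the path of the running perl binary, so it should exist.
print size_and_mtime($^X), "\n";
print uid_gid('root'), "\n";
```

As in via_anonsub, the list returned by stat or getpwnam is indexed through @_ and never copied into a named array.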


Update: It should be faster because it doesn't have to build up and tear down lexicals, and the sub is compiled at, ah, compile time. Look more closely at the code: I'm not building closures.

I used to assume that POSIX was slower because it was XS code and paid the cost of crossing the XS/Perl boundary. Either calling XS is now cheaper, or creating lexicals is now more expensive, relatively speaking.
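Whether an anonymous sub is a closure can be checked directly: perl only makes a fresh copy of an anonymous sub on evaluation when the sub refers to lexicals from an enclosing scope. A minimal sketch (variable names are illustrative) evaluates two kinds of anonymous sub twice each and compares the resulting code refs:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my (@plain, @closure);
for my $i (1 .. 2) {
    # Refers to no outer lexicals, so it is not a closure.
    push @plain, sub { 42 };
    # Captures the loop variable $i, so each evaluation builds a new closure.
    push @closure, sub { $i };
}

# Comparing references numerically compares the addresses of the subs.
printf "plain:   %s\n", $plain[0]   == $plain[1]   ? 'same sub' : 'different subs';
printf "closure: %s\n", $closure[0] == $closure[1] ? 'same sub' : 'different subs';
```

This prints "same sub" for the plain case and "different subs" for the closure. The inner sub in via_anonsub touches only @_ and $$, neither of which is an outer lexical, so it falls in the first category; the `sub {...}` expression itself is still evaluated on every call.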


print@_{sort keys %_},$/if%_=split//,'= & *a?b:e\f/h^h!j+n,o@o;r$s-t%t#u'

Re: Re: Timestamp as a Filename Collection
by blssu (Pilgrim) on Sep 11, 2002 at 14:22 UTC

    Why do you think the anon sub should be faster? It does the same thing as your simple sprintf -- but it has the overhead of creating a closure too. The anon sub should be slower.

    Generally you shouldn't use anon subs unless you can factor out some (expensive) common code, or you want to pass functions as arguments to modify how other subs behave. Closures are fairly large objects and can be difficult for people to understand.

    Here's an example of using a closure to factor out the relatively static year-month-day-hour formatting.

    sub setup_file_name_generator {
        # Generate file names of the form
        # "year-month-day-hour-pid-count".
        #
        # The year-month-day-hour-pid portion
        # is cached and will expire on the hour.
        my $time = time;
        my @time = localtime($time);
        my $base = sprintf('%4d-%02d-%02d-%02d-%d-',
                           $time[5]+1900, $time[4]+1, $time[3], $time[2], $$);
        my $count = 0;
        # Expire at the top of the next hour: subtract both the minutes
        # and the seconds already elapsed in the current hour.
        my $expires = $time + 3600 - (60 * $time[1] + $time[0]);

        # Re-assigning the glob every hour would otherwise warn
        # "Subroutine main::generate_file_name redefined".
        no warnings 'redefine';
        *generate_file_name = sub {
            $time = time;
            if ($time >= $expires) {
                setup_file_name_generator();
                return generate_file_name();
            }
            else {
                my $name;
                do {
                    ++$count;
                    $name = $base . $count;
                } while (-e $name);
                return $name;
            }
        }
    }

    setup_file_name_generator();
    my $name = generate_file_name();

    I've tested the code a bit, but it definitely needs more testing to ensure that the cache expires properly on the hour. It's rather defensive code though -- the -e $name check will prevent collisions even if the cache doesn't work perfectly.
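The hour-boundary arithmetic is the part most worth testing, and it is easier to test as a pure function. A minimal sketch, where seconds_to_next_hour is a hypothetical helper: an expiry computed as 60 * (60 - minute) alone ignores the seconds field and can overshoot the hour by up to 59 seconds, so this version subtracts both minutes and seconds, then checks that every computed expiry lands on minute 0, second 0 in local time.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: seconds from $epoch until the next local hour
# boundary, counting both the minutes and seconds already elapsed.
sub seconds_to_next_hour {
    my ($epoch) = @_;
    my @t = localtime($epoch);
    return 3600 - ($t[1] * 60 + $t[0]);
}

# Check a spread of timestamps (about 55 hours from September 2002):
# each expiry must land exactly on minute 0, second 0, and be at most
# one hour away.
for my $epoch (map { 1_031_700_000 + $_ * 997 } 0 .. 200) {
    my $wait = seconds_to_next_hour($epoch);
    my @exp  = localtime($epoch + $wait);
    die "bad expiry for $epoch\n"
        unless $wait >= 1 && $wait <= 3600 && $exp[0] == 0 && $exp[1] == 0;
}
print "all expiries land on the hour\n";
```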

Re: Re: Timestamp as a Filename Collection
by blssu (Pilgrim) on Sep 11, 2002 at 20:08 UTC

    I am looking closely. You're using an anonymous sub, therefore you're taking a closure. Use perl -D8 (execution trace, which needs a perl built with -DDEBUGGING) if you don't believe me, and look for the anoncode op.
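The -D switches only work on a perl built with -DDEBUGGING. A sketch of the same check on a stock perl uses the core B::Concise module to dump via_anonsub's ops, in execution order, into a string and then searches it for anoncode:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use B::Concise ();

# via_anonsub as in the original post.
sub via_anonsub {
    sub { sprintf '%04d%02d%02d%02d%02d%02d%05d',
          $_[5]+1900, $_[4]+1, $_[3], $_[2], $_[1], $_[0], $$ }->(localtime())
}

# Send B::Concise output into a string instead of STDOUT.
my $ops = '';
B::Concise::walk_output(\$ops);

# -exec lists ops in execution order; passing a code ref limits the
# dump to that one sub's optree.
B::Concise::compile('-exec', \&via_anonsub)->();

print $ops =~ /\banoncode\b/ ? "anoncode op found\n" : "no anoncode op\n";
```

From the command line, perl -MO=Concise,-exec,via_anonsub filename should show the same listing.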

    It's true that your code is compiled at compile time. I said you were taking a closure, not eval'ing a string. Anyway, the {my @time = ...} sub is also compiled at compile time. That code should be faster because it does less: it allocates a frame (pad), whereas yours allocates an empty frame, takes a closure, and calls it.
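One way to separate these costs is a hypothetical extra variant, via_named, that passes localtime()'s list straight to a named sub, so each call still goes through @_ but evaluates no anoncode op. A sketch, assuming the same format string as the original subs (_format_stamp and via_named are illustrative names):

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# The two variants from the original post.
sub via_anonsub {
    sub { sprintf '%04d%02d%02d%02d%02d%02d%05d',
          $_[5]+1900, $_[4]+1, $_[3], $_[2], $_[1], $_[0], $$ }->(localtime())
}

sub via_sprintf {
    my @time = localtime();
    sprintf '%04d%02d%02d%02d%02d%02d%05d',
        $time[5]+1900, $time[4]+1, $time[3], $time[2], $time[1], $time[0], $$
}

# Hypothetical variant: same formatting in a named sub called directly,
# so there is no anoncode op on each call.
sub _format_stamp {
    sprintf '%04d%02d%02d%02d%02d%02d%05d',
        $_[5]+1900, $_[4]+1, $_[3], $_[2], $_[1], $_[0], $$
}
sub via_named { _format_stamp(localtime()) }

# -1 means "run each for at least 1 CPU second"; a larger negative
# count (the 5.8.0 run used 10 seconds) gives steadier figures.
cmpthese(-1, {
    via_anonsub => \&via_anonsub,
    via_sprintf => \&via_sprintf,
    via_named   => \&via_named,
});
```

No result is claimed here; as the two runs in the original post show, the relative rates depend on the perl version and machine.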

    You should be happy the simple stuff is getting faster, instead of sad that the obfu code is slower!