If you need to generate a unique filename, a timestamp is usually the simplest choice. Here are two cut-and-paste snippets you can use.

And for those concerned about two processes generating the same filename within the same second, appending the PID keeps the names unique.

--
hiseldl
"Act better than you feel"

# Here's the plain-jane "localtime" version that
# prepends "0" to all numbers that have only one digit
my @time = localtime();
$time[4]++;      # convert month from 0-based index to 1-based index
$time[5] -= 100; # 2 digits = subtract 100, or 4 digits = add 1900
$time[$_] = $time[$_] < 10 ? "0".$time[$_] : $time[$_] for (0..5);
my $filename = $time[5].$time[4].$time[3].$time[2].$time[1].$time[0];

# If you have POSIX, this is a good alternative
use POSIX;
# %Y = 4-digit year and %y = 2-digit year
my $filename = strftime("%Y%m%d%H%M%S", localtime);

# To make sure that the filename is unique even if the name
# is generated in the same second, add the PID
$filename .= "." . $$;
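For reference, the per-element padding above can be collapsed into a single sprintf call. This is a minimal sketch (my restatement, not from the original post) that builds the same YYMMDDHHMMSS.pid name:

```perl
# Minimal sketch: the same YYMMDDHHMMSS.pid name built with
# one sprintf, which zero-pads every field in a single step.
my @t = localtime();
my $filename = sprintf('%02d%02d%02d%02d%02d%02d.%d',
    $t[5] % 100,  # 2-digit year ($t[5] is years since 1900)
    $t[4] + 1,    # month is 0-based
    $t[3], $t[2], $t[1], $t[0],
    $$);          # PID guards against same-second collisions
print "$filename\n";
```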

Replies are listed 'Best First'.
Re: Timestamp as a Filename Collection
by grinder (Bishop) on Sep 11, 2002 at 13:11 UTC

    Well, here's an interesting thing. With yet another example of building filenames out of localtime kicking around, I thought I'd build a benchmark once and for all to show how much faster using an anonymous sub is than the other techniques. This is the code I wrote:

    #! /usr/bin/perl -w
    use strict;
    use POSIX;
    use Benchmark;

    sub via_anonsub {
        sub {
            sprintf '%04d%02d%02d%02d%02d%02d%05d',
                $_[5]+1900, $_[4]+1, $_[3], $_[2], $_[1], $_[0], $$
        }->(localtime())
    }

    sub via_sprintf {
        my @time = localtime();
        sprintf '%04d%02d%02d%02d%02d%02d%05d',
            $time[5]+1900, $time[4]+1, $time[3], $time[2], $time[1], $time[0], $$
    }

    sub via_concat {
        my @time = localtime();
        $time[4]++;
        $time[5] += 1900;
        $time[$_] = $time[$_] < 10 ? "0".$time[$_] : $time[$_] for (0..5);
        my $filename = $time[5].$time[4].$time[3].$time[2].$time[1].$time[0].$$
    }

    sub via_posix {
        strftime( "%Y%m%d%H%M%S$$", localtime() )
    }

    print <<"PROOF";
    via_concat:  ${\via_concat()}
    via_anonsub: ${\via_anonsub()}
    via_posix:   ${\via_posix()}
    via_sprintf: ${\via_sprintf()}
    PROOF

    timethese( shift || 10000, {
        'via_anonsub' => \&via_anonsub,
        'via_concat'  => \&via_concat,
        'via_posix'   => \&via_posix,
        'via_sprintf' => \&via_sprintf,
    });

    When run on an older Perl (v5.005_03) this produces the following output:

    $ perl filename 200000
    via_concat:  2002091114542256876
    via_anonsub: 2002091114542256876
    via_posix:   2002091114542256876
    via_sprintf: 2002091114542256876
    Benchmark: timing 200000 iterations of via_anonsub, via_concat, via_posix, via_sprintf...
    via_anonsub:  7 wallclock secs ( 5.60 usr + 0.28 sys =  5.88 CPU)
    via_concat:  21 wallclock secs (18.71 usr + 0.55 sys = 19.27 CPU)
    via_posix:   13 wallclock secs ( 9.59 usr + 0.62 sys = 10.20 CPU)
    via_sprintf: 10 wallclock secs ( 6.67 usr + 0.39 sys =  7.06 CPU)

    But just for kicks, I thought I'd take it for a spin on a new machine running 5.8.0 and see what change, if any, appeared. I changed the code (the benchmark code, not the underlying snippets) a bit to use cmpthese instead, and this gives:

    via_concat:  2002091114275977636
    via_anonsub: 2002091114275977636
    via_posix:   2002091114275977636
    via_sprintf: 2002091114275977636
    Benchmark: running via_anonsub, via_concat, via_posix, via_sprintf for at least 10 CPU seconds...
    via_anonsub: 12 wallclock secs (10.12 usr + 0.51 sys = 10.62 CPU) @ 53156.42/s (n=564787)
    via_concat:  10 wallclock secs (10.20 usr + 0.24 sys = 10.44 CPU) @ 30435.83/s (n=317674)
    via_posix:   10 wallclock secs ( 9.57 usr + 0.93 sys = 10.50 CPU) @ 60954.29/s (n=640020)
    via_sprintf: 10 wallclock secs (10.20 usr + 0.33 sys = 10.53 CPU) @ 44919.45/s (n=473058)
                    Rate  via_concat via_sprintf via_anonsub  via_posix
    via_concat   30436/s          --        -32%        -43%       -50%
    via_sprintf  44919/s         48%          --        -15%       -26%
    via_anonsub  53156/s         75%         18%          --       -13%
    via_posix    60954/s        100%         36%         15%         --

    I'm bummed. Somewhere along the line of perl's evolution, the performance of POSIX got better and/or the handling of subroutine calls got worse. Oh well, you can't always use POSIX, and this anonymous subroutine approach works well for other calls that return multiple values, such as stat and getpwnam. And I can live with 15% inefficiency, I guess. Such is life.
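    To illustrate that last point, here is a sketch (mine, not from the post) of the same anonymous-sub trick applied to stat: the sub receives stat's 13-element return list in @_ and slices out the fields it wants, with no named temporary array.

```perl
# Sketch: the anonymous-sub trick applied to stat().
# @_[7, 9] slices the size and mtime fields straight
# out of stat's returned list.
my ($size, $mtime) = sub { @_[7, 9] }->(stat $0);
printf "%s: %d bytes, modified %s\n", $0, $size, scalar localtime $mtime;
```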


    update: It's faster because it doesn't have to deal with building up and tearing down lexicals. And the sub is compiled at, ah, compile time. That's why it should be faster. Look more closely at the code; I'm not building closures.

    I used to assume that using POSIX was slower because it was XS code and paying the XS/Perl boundary-crossing cost. Either calling XS is now cheaper, or creating lexicals is now more expensive, relatively speaking.


    print@_{sort keys %_},$/if%_=split//,'= & *a?b:e\f/h^h!j+n,o@o;r$s-t%t#u'

      Why do you think the anon sub should be faster? It does the same thing as your simple sprintf -- but it has the overhead of creating a closure too. The anon sub should be slower.

      Generally you shouldn't use anon subs unless you can factor out some (expensive) common code, or you want to pass functions as arguments to modify how other subs behave. Closures are fairly large objects and can be difficult for people to understand.

      Here's an example of using a closure to factor out the relatively static year-month-day-hour formatting.

      sub setup_file_name_generator {
          # Generate file names of the form
          # "year-month-day-hour-pid-count".
          #
          # The year-month-day-hour-pid portion
          # is cached and will expire on the hour.

          my $time = time;
          my @time = localtime($time);
          my $base = sprintf('%4d-%02d-%02d-%02d-%d-',
                             $time[5]+1900, $time[4]+1,
                             $time[3], $time[2], $$);
          my $count = 0;
          my $expires = $time + 60 * (60 - $time[1]);

          *generate_file_name = sub {
              $time = time;
              if ($time >= $expires) {
                  setup_file_name_generator();
                  return generate_file_name();
              }
              else {
                  my $name;
                  do {
                      ++$count;
                      $name = $base . $count;
                  } while (-e $name);
                  return $name;
              }
          }
      }

      setup_file_name_generator();
      my $name = generate_file_name();

      I've tested the code a bit, but it definitely needs more testing to ensure that the cache expires properly on the hour. It's rather defensive code though -- the -e $name check will prevent collisions even if the cache doesn't work perfectly.
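      One thing worth checking in that testing: the expiry calculation 60 * (60 - $time[1]) only looks at the minutes, so the cached base name can survive up to 59 seconds past the hour. A sketch of an exact calculation (my adjustment, not from the post) subtracts both minutes and seconds from 3600:

```perl
# Sketch: expire exactly at the next top of the hour by
# subtracting the current minutes AND seconds from 3600.
my $time = time;
my ($sec, $min) = (localtime($time))[0, 1];
my $expires = $time + 3600 - ($min * 60 + $sec);
# $expires is now the epoch second of the next local hh:00:00.
```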

      I am looking closely. You're using an anonymous sub, therefore you're taking a closure. Use perl -D8 if you don't believe me. (Look for the anoncode op.)

      It's true that your code is compiled at compile time. I said you were taking a closure, not eval'ing a string. Anyways, the {my @time = ...} sub is also compiled at compile time. That code should be faster because it does less -- allocating a frame (pad) vs. allocating an empty frame, taking a closure and calling it.

      You should be happy the simple stuff is getting faster, instead of sad that the obfu code is slower!

Time changes, localtime and file names
by blssu (Pilgrim) on Sep 11, 2002 at 14:29 UTC

    It's nice for users to name files using local time, but you have to watch out for time changes! Many locales have some sort of daylight saving time, which seasonally adjusts localtime() (but not time() -- that keeps marching forward).

    For example, in the U.S. this fall, when the clocks go back, an hour of the early morning will occur twice in the same day. If you don't check your file names for collisions, you may destroy existing files.
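    One defensive measure (my sketch, not from the thread) is to let the filesystem arbitrate: open the candidate name with O_CREAT|O_EXCL, which fails rather than clobbering an existing file, so a repeated timestamp just triggers a retry with a suffix.

```perl
use Fcntl qw(O_WRONLY O_CREAT O_EXCL);
use POSIX qw(strftime);

# Sketch: create a timestamp-named file without ever clobbering
# an existing one. O_EXCL makes sysopen fail if the name is
# already taken (e.g. after a DST fall-back), and we retry
# with an increasing numeric suffix.
sub open_unique {
    my $base = strftime('%Y%m%d%H%M%S', localtime) . ".$$";
    for my $n (0 .. 99) {
        my $name = $n ? "$base-$n" : $base;
        if (sysopen(my $fh, $name, O_WRONLY | O_CREAT | O_EXCL)) {
            return ($fh, $name);
        }
    }
    die "could not create a unique file from $base";
}

my ($fh, $name) = open_unique();
print "created $name\n";
```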

Re: Timestamp as a Filename Collection
by atcroft (Abbot) on Sep 11, 2002 at 03:44 UTC

    Why not just do something like $filename = time() . $$;, or something along those lines? Just curious, since time() returns the number of seconds since the epoch, and unless you were on a system that couldn't handle 17+ character filenames...
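    A sketch of that suggestion, with one small tweak: a separator between the two numbers keeps the seconds and the PID unambiguous (plain concatenation can produce the same string from different second/PID pairs).

```perl
# Sketch: epoch-seconds-plus-PID filename, as suggested above.
# The dot keeps time() and $$ from running together ambiguously.
my $filename = time() . '.' . $$;
print "$filename\n";
```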

      The trick is that I wanted something immediately understandable with all my filenames the same length, excluding the pid. :)

      --
      hiseldl
      "Act better than you feel"

        True, but the current result of time() is on the order of 1e9 seconds and, being a 32-bit integer, won't grow much larger. But I do understand the part about being understandable. If the contents of the directory are understood anyway, though...