Recently a coworker was running some performance tests on a Java daemon he's building, and he discovered that he was spending a huge chunk of time formatting date/times for his log files. Now, ordinarily there wouldn't be anything wrong with that, except that his log timestamps were only formatted down to the second.

Which means that in the course of a given second, he was spending a lot of time formatting several Java Date objects that all produced the same string. So to reduce the amount of wasted time, he created a little wrapper class for his formatter, which cached the last Date and its formatted String; if no more than a second had elapsed the next time it was called, he just reused the cached String.

Major speed improvement.

Which led me to wonder about the same performance issue in Perl. Perl times are only precise down to the second, but what if you are only formatting dates down to the minute? Here's what I benchmarked. As you can see, I used a really high number of iterations to make sure the test lasted enough wallclock seconds to be worthwhile. It should also be clear that for applications that do a lot of time formatting at high frequency, it's worth it to cache the string.

(I'll leave it as an exercise for the reader to determine whether there's any significant performance downside in applications that don't format times at a high frequency.)

use Benchmark qw(cmpthese countit);
use POSIX qw(strftime);

use constant TIME   => 10000000;
use constant FORMAT => '%Y-%m-%d %H:%M';

my $trash;

sub plain_strftime {
    return strftime(FORMAT, localtime(shift));
}

# Seed the cache with epoch time 0 and its formatted string.
my ($timecache, $strcache) = (0, plain_strftime(0));

sub cache_strftime {
    my $t = shift;
    # Reformat only when more than 60 seconds have elapsed since the
    # cached time. Note: the cached string can lag across a minute
    # boundary for up to 60 seconds.
    if ($t - $timecache > 60) {
        $timecache = $t;
        $strcache  = plain_strftime($timecache);
    }
    return $strcache;
}

cmpthese(TIME, {
    'plain' => sub { $trash = plain_strftime(time) },
    'cache' => sub { $trash = cache_strftime(time) },
});

__DATA__
laptop:~> monk.pl
Benchmark: timing 10000000 iterations of cache, plain...
     cache:  79 wallclock secs ( 67.66 usr +  3.69 sys =  71.35 CPU) @ 140154.17/s (n=10000000)
     plain: 282 wallclock secs (248.43 usr +  7.03 sys = 255.46 CPU) @  39145.07/s (n=10000000)
           Rate plain cache
plain   39145/s    --  -72%
cache  140154/s  258%    --
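
One caveat about cache_strftime above: since it refreshes only after more than 60 seconds have elapsed, the cached string can be stale for up to a minute when a minute boundary is crossed. A variant that keys the cache on the minute number avoids that; a minimal sketch (the name cache_strftime_by_minute is mine):

    use POSIX qw(strftime);
    use constant FORMAT => '%Y-%m-%d %H:%M';

    my ($mincache, $minstr) = (-1, '');

    sub cache_strftime_by_minute {
        my $t   = shift;
        my $min = int($t / 60);       # which minute this timestamp falls in
        if ($min != $mincache) {      # refresh only when the minute changes
            $mincache = $min;
            $minstr   = strftime(FORMAT, localtime($t));
        }
        return $minstr;
    }

This trades one integer division per call for a string that is never stale, and should benchmark in the same ballpark as the original cache.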

Re: Wasting time with times
by rob_au (Abbot) on Sep 15, 2002 at 23:30 UTC
    Another side to the argument would be to ask how readable the application logs really need to be. Depending upon the application, the specifics of logging are insignificant - it is more important simply to have the information available if it's needed, in any format, than to be able to browse it in a human-readable format.

    To this end, I would say that there is little wrong with simply logging date and time as seconds since the epoch, especially given that no high precision is required in the time stamps provided. This side-steps the argument altogether, allowing logging without the formatting performance hit; if there is a later need for debugging or log analysis, it is relatively simple to write a log file parser that reformats the timestamps into a human-readable format.
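
    For example, a minimal sketch of such a reformatting filter, assuming the epoch timestamp is the first whitespace-separated field of each line:

        use POSIX qw(strftime);

        # Reprint log lines whose first field is an epoch timestamp
        # with a human-readable timestamp in its place.
        while (my $line = <STDIN>) {
            my ($epoch, $rest) = split ' ', $line, 2;
            print strftime('%Y-%m-%d %H:%M:%S', localtime($epoch)), ' ', $rest;
        }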


      No, no, no, no, no! If no one ever needs to look at it, then why are you logging it at all? If someone is going to look at it, it will probably be because something has gone wrong, and then everything you do to make things easier for the person doing the fault finding is appreciated and reflects well on you as a developer. Say what you detected and be explicit, say why it means you can't continue, and say when it happened. Remember: no matter how obscure the log, when something is wrong, someone will look at it and try to make sense of it.

      My top three gripes with developers:

      • programs that detect an error condition and die without reporting the error condition anywhere
      • log files without date-time stamps (I kid you not), or with date-time stamps that are inconsistent or unreadable
      • meaningless or unhelpful error messages like "an error has occurred", "software error", "terminating due to an abnormal condition", "error-20756a", etc.

      By avoiding these you make it easier to locate the source of the problem, easier to give quality bug reports, and easier to represent you well to the client, because you have made me feel in control of the problem: I have an idea of what is going on and can convey that confidently to the client, even if I have no idea how to fix it.

      </end rant> :)

      --
      Until you've lost your reputation, you never realize what a burden it was or what freedom really is. -Margaret Mitchell

        You miss the point of my argument entirely - I am most definitely not saying "Do not perform any logging", nor am I saying "Do not timestamp those logs that you do generate". My point was that logging, depending upon the application, does not necessarily have to be carried out in a human-readable format, or with human-readable time stamps, particularly when this comes at the cost of application performance. Indeed, I am of the opinion that a great deal of information should be logged; many of my applications log the entry and exit points of major subroutines as a matter of standard practice.

        An example of this practice is the Windows Eventlog API. This logging API does not store message information in a readily parseable, human-readable text format, but rather in a binary format that allows for rapid logging and compact storage. These logs can then be accessed programmatically through APIs or via the Eventlog viewer. The end result is that all information is available to the user or developer working with the Eventlog, despite the fact that the storage format is not natively human-readable.
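
        This is not the Eventlog API itself, but a minimal sketch of the underlying idea in Perl - fixed-size binary records written with pack are quicker to produce and more compact than formatted text (the file name and event ID here are made up):

            use constant RECORD => 'N N n';   # epoch, pid, event id: 10 bytes

            # Append one compact binary record; a reader can unpack()
            # the same layout back into fields later.
            open my $log, '>>:raw', 'app.binlog' or die "open: $!";
            syswrite $log, pack(RECORD, time, $$, 42);
            close $log;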

        Logging should be a symbiotic adjunct to an application, not a mandatory deadweight.


        Shrug. You have a good point... but you're missing big pieces of it. Unreadable is a relative thing. If the logfile is documented as having the first field be a timestamp in seconds since Jan 1 1970 00:00:00 UTC, then it isn't a problem. If it lists things like 1032155104 16011503 "Creating asdfoij object" -2147221005, that's fine, so long as the documentation tells you that the fields are seconds-since-epoch, PID, filename, freeform error text, and COM error code, and comes with docs for how to look up those fields. (Preferably a program to parse the logfile and give you useful errors.)
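
        A minimal sketch of that kind of lookup program, assuming the four fields visible in the sample line above (epoch, PID, quoted message, COM error code):

            use POSIX qw(strftime);

            # Parse the hypothetical four-field line and reprint it
            # with a readable timestamp.
            my $line = '1032155104 16011503 "Creating asdfoij object" -2147221005';
            if (my ($epoch, $pid, $msg, $com) =
                    $line =~ /^(\d+) (\d+) "([^"]*)" (-?\d+)$/) {
                printf "%s pid=%s \"%s\" (COM error %s)\n",
                    strftime('%Y-%m-%d %H:%M:%S', localtime($epoch)),
                    $pid, $msg, $com;
            }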


        Warning: Unless otherwise stated, code is untested. Do not use without understanding. Code is posted in the hopes it is useful, but without warranty. All copyrights are relinquished into the public domain unless otherwise stated. I am not an angel. I am capable of error, and err on a fairly regular basis. If I made a mistake, please let me know (such as by replying to this node).

Re: Wasting time with times
by Zaxo (Archbishop) on Sep 16, 2002 at 03:01 UTC

    Cached is 7.135E-6 seconds per call, uncached is 2.555E-5 seconds. At 100 requests per second that is 0.7 ms vs. 2.6 ms. That's probably not too significant.

    In a socket server with filesystem logging, I/O is apt to be the real bottleneck, so threading would be useful. For thread safety, you need the :locked attribute set on &cache_strftime to prevent a race condition.
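
    (For what it's worth, the :locked attribute comes from the old 5.005 threads model; under the newer ithreads model the equivalent is an explicit lock() on shared state. A minimal sketch, with the cache variables marked :shared:)

        use threads;
        use threads::shared;
        use POSIX qw(strftime);

        my $timecache :shared = 0;
        my $strcache  :shared = '';

        sub cache_strftime {
            my $t = shift;
            lock($timecache);    # serialize check-and-update across threads
            if ($t - $timecache > 60) {
                $timecache = $t;
                $strcache  = strftime('%Y-%m-%d %H:%M', localtime($t));
            }
            return $strcache;
        }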

    After Compline,
    Zaxo

Re: Wasting time with times
by Aristotle (Chancellor) on Sep 16, 2002 at 12:06 UTC

    I concur with Zaxo++. Just look at your absolute rates: even the "slow" method produces nearly 40,000 date strings per second. I can't imagine that being a real bottleneck in anything but the highest-load situations. Remember that what's true for Java is not directly applicable in Perl. String processing in general tends to be, well, not quite blazing fast in Java, while it's what Perl does best.

    On another note, you may be interested in Dominus' article on memoization. See also his Memoize module.
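
    For example, a minimal sketch of letting Memoize stand in for a hand-rolled cache, keyed on the minute number (the name format_minute is mine):

        use Memoize;
        use POSIX qw(strftime);

        # One cache entry per distinct minute; repeat calls within the
        # same minute never touch strftime.
        memoize('format_minute');
        sub format_minute {
            my $minute = shift;
            return strftime('%Y-%m-%d %H:%M', localtime($minute * 60));
        }

        my $stamp = format_minute(int(time / 60));

    Note that the cache grows by one entry per minute, which may matter for a long-running daemon.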

    Makeshifts last the longest.

      Remember that what's true for Java is not directly applicable in Perl.

      No argument there, I just considered it an interesting exercise. I guess I would say that the moral of the story is:

      Time formatting in Perl may be fast, but with 4 lines of code it can be faster.
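
      For reference, the 4 lines in question boil down to something like this (a condensed form of cache_strftime from the root node; the name cached_stamp is mine):

          use POSIX qw(strftime);

          my ($tc, $sc) = (0, '');
          sub cached_stamp {
              my $t = shift;
              # Refresh the cached string once more than a minute has passed.
              ($tc, $sc) = ($t, strftime('%Y-%m-%d %H:%M', localtime $t))
                  if $t - $tc > 60;
              return $sc;
          }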