high speed/efficiency http date to unix time, leap years handling?

bulk88 has asked for the wisdom of the Perl Monks concerning the following question:

I'm using a lightweight Perl HTTP library to poll a web page on a web server in a high-speed infinite loop (under 0.5 s per iteration). The HTTP::Response from the HTTP library only gives the HTTP date, so I need to convert it to Unix time for logic processing (Perl's numerical operators, etc.). I see HTTP::Date, or writing the format template for some Perl date module, as the normal solution, but I looked at the code and it's dozens of statements, multiple regexes, multiple subroutines and multiple modules. There has got to be a more efficient way, so I decided to try to write a highly efficient HTTP-date-to-Unix-time subroutine.

#!/usr/bin/perl -w
use strict;

use HTTP::Date;
use Time::HiRes;
$\ ="\n";

my %month = (Jan => 0,
Feb => 31,
Mar => 59,
Apr => 90,
May => 120,
Jun => 151,
Jul => 181,
Aug => 212,
Sep => 243,
Oct => 273,
Nov => 304,
Dec => 334);

my $httpdate = 'Wed, 25 Aug 2010 09:56:38 GMT';

sub mytime {
my $year = substr($_[0],12,4);
my $days =  ($year-1970)*365+ # days in full years passed
        sprintf('%d', ($year-1969)/4)+ # add leap days in years passed (1972, 1976, ...)
        $month{substr($_[0],8,3)}+ # add days in full months passed
        (substr($_[0],5,2)-1)+ #is doing a -1 to day of month needed?
        ((($year%4) == 0 && $month{substr($_[0],8,3)} >= 59)?1:0); #are we past February in a leap year?
return $days*86400+(substr($_[0],17,2)*60*60)+(substr($_[0],20,2)*60)+substr($_[0],23,2); #hrs/mins/secs
}

sub mytime2 {
my $year = substr($_[0],12,4);
my $days =  ($year-1970)*365+ # days in full years passed
        ((($year-1969) - (($year-1969) % 4))/4)+ # add leap days in years passed (1972, 1976, ...)
        $month{substr($_[0],8,3)}+ # add days in full months passed
        (substr($_[0],5,2)-1)+ #is doing a -1 to day of month needed?
        ((($year%4) == 0 && $month{substr($_[0],8,3)} >= 59)?1:0); #are we past February in a leap year?
return  $days*86400+(substr($_[0],17,2)*60*60)+(substr($_[0],20,2)*60)+substr($_[0],23,2); #hrs/mins/secs
}
my $t_a = Time::HiRes::time();
for (0 .. 100000) {mytime($httpdate);}
print 'with sprintf int '.(Time::HiRes::time()-$t_a)."\n";;

$t_a = Time::HiRes::time();
for (0 .. 100000) {mytime2($httpdate);}
print 'with algebra int '.(Time::HiRes::time()-$t_a)."\n";;

$t_a = Time::HiRes::time();
for (0 .. 100000) {str2time($httpdate);}
print 'with HTTP date '.(Time::HiRes::time()-$t_a)."\n";;

print "algorithm outputs";
print mytime($httpdate);
print mytime2($httpdate);
print str2time($httpdate);

Is this the right algorithm to convert HTTP time to Unix time? Am I handling leap years, leap seconds and time zones correctly? Can it be made any faster in pure Perl (I don't know C)? Am I supposed to subtract 1 from the day of month, so that the 1st of a month at 00:00:01 comes out as 1 second into that day instead of 86400+1 seconds?

The subroutine only needs to work for valid 32-bit Unix times. The server is only capable of putting out 32-bit Unix times as HTTP dates, so I don't need to worry about getting HTTP dates for 0000 AD or 9999 AD (the HTTP date standard only allows 4-digit, zero-padded years), or about leap years since the beginning of the universe.

Here is an output example of the code.

C:\Documents and Settings\Owner\Desktop>perl myhttpdate2.pl
with sprintf int 0.413187026977539

with algebra int 0.321068048477173

with HTTP date 1.31241106987

algorithm outputs
1282730198
1282730198
1282730198

C:\Documents and Settings\Owner\Desktop>

My attempts at optimization: I noticed that replacing sprintf with an algebraic integer-division equivalent was slightly faster (shown here). I won't be trying the POSIX way of getting integers from floating-point values; I think the algebra method must be the fastest. Also, caching the substr for the year was faster than multiple year substrs (not shown in the code). There was no time difference at 100000 iterations between $year = substr and $year = substr - 1970, so I left the "- 1970"s everywhere for code clarity.

This is written to work with one particular server/site, and there will never be invalid input to the sub, so error checking the input isn't needed. The polling script is not "production quality", so if the server changes its software I will just rewrite the poll script.

Update: Thanks to the poster who suggested using more lookup tables. Full-on caching of Unix day and second of day seems like too much RAM for too little gain for me, but caching the days in years since the epoch isn't that many hash slices, and it's faster! Caching the substrs is inefficient if a substr is used fewer than 3 times; if I use a substr only twice, calling it twice seems faster than making another scalar. This is shown in my revised test script. I'm settling on sub mytime5() as my final converter; it's the fastest. The only next logical optimizations would be huge full-date and H:M:S hash tables, or C, neither of which works for me.

The lightweight HTTP library I'm using doesn't implement HTTP caching, so I have to implement that by hand (one reason I need HTTP-date-to-Unix-time conversion). Also, my poll script is multithreaded, so the date stamp on the response is needed to synchronize the threads in case of a late/slow HTTP response to a request (I realize that HTTP time is limited to 1-second resolution). The data usually changes a few times a second, so polling more often than once a second is good. The data records are spread among a dozen pages, and they sometimes move rapidly among those pages. My script records the position of each data record for a few hours, then is manually commanded to exit and dump the positions to CSV for later research. Gzip and keepalive are turned on for politeness. Caching logic is manually turned on when the data records are predicted to move only every couple of seconds instead of a couple of times a second. Regarding the politeness of polling web servers: 10 threads polling their respective pages every 0.5 seconds on an Alexa top-100 website for a few hours is a grain of sand on a beach.
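The day-of-month question can be settled with the first second after the epoch. Days of the month are numbered from 1, but on the 1st no full day of that month has elapsed yet, so the `-1` is what makes `01 Jan 1970 00:00:01` come out as 1 second. A minimal sketch; `secs` is a hypothetical, stripped-down helper that only knows January to March 1970, so it needs no leap-year term:

```perl
use strict;
use warnings;

# Hypothetical helper for this check only: handles dates in Jan-Mar 1970,
# which is enough to see what the -1 on the day of month does.
my %month = (Jan => 0, Feb => 31, Mar => 59);   # days before each month in 1970

sub secs {
    my ($date, $minus_one) = @_;
    my $days = $month{ substr($date, 8, 3) } + substr($date, 5, 2)
             - ($minus_one ? 1 : 0);
    return $days * 86400 + substr($date, 17, 2) * 3600
         + substr($date, 20, 2) * 60 + substr($date, 23, 2);
}

my $first = 'Thu, 01 Jan 1970 00:00:01 GMT';   # one second after the epoch
print "with -1:    ", secs($first, 1), "\n";   # -> 1
print "without -1: ", secs($first, 0), "\n";   # -> 86401
```

Dropping the `-1` shifts every result by exactly one day (86400 s).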

#!/usr/bin/perl -w
use strict;

use HTTP::Date;
use Time::HiRes;
$\ ="\n";

my %month = (Jan => 0,
Feb => 31,
Mar => 59,
Apr => 90,
May => 120,
Jun => 151,
Jul => 181,
Aug => 212,
Sep => 243,
Oct => 273,
Nov => 304,
Dec => 334);
my %year = (1901 => -25202, 1902 => -24837, 1903 => -24472, 1904 => -24107, 1905 => -23741, 1906 => -23376,
            1907 => -23011, 1908 => -22646, 1909 => -22280, 1910 => -21915, 1911 => -21550, 1912 => -21185,
            1913 => -20819, 1914 => -20454, 1915 => -20089, 1916 => -19724, 1917 => -19358, 1918 => -18993,
            1919 => -18628, 1920 => -18263, 1921 => -17897, 1922 => -17532, 1923 => -17167, 1924 => -16802,
            1925 => -16436, 1926 => -16071, 1927 => -15706, 1928 => -15341, 1929 => -14975, 1930 => -14610,
            1931 => -14245, 1932 => -13880, 1933 => -13514, 1934 => -13149, 1935 => -12784, 1936 => -12419,
            1937 => -12053, 1938 => -11688, 1939 => -11323, 1940 => -10958, 1941 => -10592, 1942 => -10227,
            1943 => -9862, 1944 => -9497, 1945 => -9131, 1946 => -8766, 1947 => -8401, 1948 => -8036, 1949 => -7670,
            1950 => -7305, 1951 => -6940, 1952 => -6575, 1953 => -6209, 1954 => -5844, 1955 => -5479, 1956 => -5114,
            1957 => -4748, 1958 => -4383, 1959 => -4018, 1960 => -3653, 1961 => -3287, 1962 => -2922, 1963 => -2557,
            1964 => -2192, 1965 => -1826, 1966 => -1461, 1967 => -1096, 1968 => -731, 1969 => -365, 1970 => 0,
            1971 => 365, 1972 => 730, 1973 => 1096, 1974 => 1461, 1975 => 1826, 1976 => 2191, 1977 => 2557,
            1978 => 2922, 1979 => 3287, 1980 => 3652, 1981 => 4018, 1982 => 4383, 1983 => 4748, 1984 => 5113,
            1985 => 5479, 1986 => 5844, 1987 => 6209, 1988 => 6574, 1989 => 6940, 1990 => 7305, 1991 => 7670,
            1992 => 8035, 1993 => 8401, 1994 => 8766, 1995 => 9131, 1996 => 9496, 1997 => 9862, 1998 => 10227,
            1999 => 10592, 2000 => 10957, 2001 => 11323, 2002 => 11688, 2003 => 12053, 2004 => 12418, 2005 => 12784,
            2006 => 13149, 2007 => 13514, 2008 => 13879, 2009 => 14245, 2010 => 14610, 2011 => 14975, 2012 => 15340,
            2013 => 15706, 2014 => 16071, 2015 => 16436, 2016 => 16801, 2017 => 17167, 2018 => 17532, 2019 => 17897,
            2020 => 18262, 2021 => 18628, 2022 => 18993, 2023 => 19358, 2024 => 19723, 2025 => 20089, 2026 => 20454,
            2027 => 20819, 2028 => 21184, 2029 => 21550, 2030 => 21915, 2031 => 22280, 2032 => 22645, 2033 => 23011,
            2034 => 23376, 2035 => 23741, 2036 => 24106, 2037 => 24472, 2038 => 24837);

my $httpdate = 'Wed, 25 Aug 2010 09:56:38 GMT';

sub mytime {
my $year = substr($_[0],12,4);
my $days =  ($year-1970)*365+ # days in full years passed
        sprintf('%d', ($year-1969)/4)+ # add leap days in years passed (1972, 1976, ...)
        $month{substr($_[0],8,3)}+ # add days in full months passed
        (substr($_[0],5,2)-1)+ #is doing a -1 to day of month needed?
        ((($year%4) == 0 && $month{substr($_[0],8,3)} >= 59)?1:0); #are we past February in a leap year?
return $days*86400+(substr($_[0],17,2)*60*60)+(substr($_[0],20,2)*60)+substr($_[0],23,2); #hrs/mins/secs
}

sub mytime2 {
my $year = substr($_[0],12,4);
my $days =  ($year-1970)*365+ # days in full years passed
        ((($year-1969) - (($year-1969) % 4))/4)+ # add leap days in years passed (1972, 1976, ...)
        $month{substr($_[0],8,3)}+ # add days in full months passed
        (substr($_[0],5,2)-1)+ #is doing a -1 to day of month needed?
        ((($year%4) == 0 && $month{substr($_[0],8,3)} >= 59)?1:0); #are we past February in a leap year?
return  $days*86400+(substr($_[0],17,2)*60*60)+(substr($_[0],20,2)*60)+substr($_[0],23,2); #hrs/mins/secs
}

sub mytime3 { #yr table
my $year = substr($_[0],12,4);
my $days =  $year{$year}+ #use year day hash
        $month{substr($_[0],8,3)}+ # add days in full months passed
        (substr($_[0],5,2)-1)+ #is doing a -1 to day of month needed?
        ((($year%4) == 0 && $month{substr($_[0],8,3)} >= 59)?1:0); #are we past February in a leap year?
return  $days*86400+(substr($_[0],17,2)*60*60)+(substr($_[0],20,2)*60)+substr($_[0],23,2); #hrs/mins/secs
}
sub mytime4 { #yr table and full substr caching
my $year = substr($_[0],12,4);
my $month = substr($_[0],8,3);
my $day = substr($_[0],5,2);
my $days =  $year{$year}+ #use year day hash
        $month{$month}+ # add days in full months passed
        ($day-1)+ #is doing a -1 to day of month needed?
        ((($year%4) == 0 && $month{$month} >= 59)?1:0); #are we past February in a leap year?
return  $days*86400+(substr($_[0],17,2)*60*60)+(substr($_[0],20,2)*60)+substr($_[0],23,2); #hrs/mins/secs
}
sub mytime5 { #year table and removed redundant year caching
my $days =  $year{substr($_[0],12,4)}+ #use year day hash
        $month{substr($_[0],8,3)}+ # add days in full months passed
        (substr($_[0],5,2)-1)+ #is doing a -1 to day of month needed?
        (((substr($_[0],12,4)%4) == 0 && $month{substr($_[0],8,3)} >= 59)?1:0); #are we past February in a leap year?
return  $days*86400+(substr($_[0],17,2)*60*60)+(substr($_[0],20,2)*60)+substr($_[0],23,2); #hrs/mins/secs
}
my $t_a = Time::HiRes::time();
for (0 .. 100000) {mytime($httpdate);}
print 'with sprintf int '.(Time::HiRes::time()-$t_a)."\n";;

$t_a = Time::HiRes::time();
for (0 .. 100000) {mytime2($httpdate);}
print 'with algebra int '.(Time::HiRes::time()-$t_a)."\n";;

$t_a = Time::HiRes::time();
for (0 .. 100000) {mytime3($httpdate);}
print 'with yr table '.(Time::HiRes::time()-$t_a)."\n";;

$t_a = Time::HiRes::time();
for (0 .. 100000) {mytime4($httpdate);}
print 'with yr table and full substr caching '.(Time::HiRes::time()-$t_a)."\n";;

$t_a = Time::HiRes::time();
for (0 .. 100000) {mytime5($httpdate);}
print 'with yr table no cache '.(Time::HiRes::time()-$t_a)."\n";;

$t_a = Time::HiRes::time();
for (0 .. 100000) {str2time($httpdate);}
print 'with HTTP date '.(Time::HiRes::time()-$t_a)."\n";;

print "algorithm outputs";
print mytime($httpdate);
print mytime2($httpdate);
print mytime3($httpdate);
print mytime4($httpdate);
print mytime5($httpdate);
print str2time($httpdate);

C:\Documents and Settings\Owner\Desktop>perl myhttpdate2.pl
with sprintf int 0.343380928039551

with algebra int 0.312686204910278

with yr table 0.241981983184814

with yr table and full substr caching 0.273297071456909

with yr table no cache 0.235360145568848

with HTTP date 1.24872398376465

algorithm outputs
1282730198
1282730198
1282730198
1282730198
1282730198
1282730198

C:\Documents and Settings\Owner\Desktop>
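The `%year` table in the revised script does not have to be typed in by hand. A sketch that builds the same hash once at startup by stepping 365 or 366 days per year away from 1970; a plain `% 4` leap test is enough for 1901..2038 because the only century year in that window, 2000, is itself a leap year. The years printed at the end can be compared with the hard-coded entries.

```perl
use strict;
use warnings;

# Days from 1970-01-01 to Jan 1 of each year, 1901..2038 (the signed 32-bit range).
# The %4 rule is exact here: 2000 is a leap year, 1900 and 2100 are out of range.
sub is_leap4 { $_[0] % 4 == 0 }

my %year = (1970 => 0);
for my $y (1971 .. 2038) {                  # walk forward from the epoch
    $year{$y} = $year{$y - 1} + (is_leap4($y - 1) ? 366 : 365);
}
for my $y (reverse 1901 .. 1969) {          # walk backward from the epoch
    $year{$y} = $year{$y + 1} - (is_leap4($y) ? 366 : 365);
}

print "$_ => $year{$_}\n" for 1901, 1904, 1970, 2010, 2038;
```

Since the hash is filled once, the per-call cost of mytime3/mytime5 is unchanged.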

Re: high speed/efficiency http date to unix time, leap years handling? by ikegami (Patriarch) on Aug 27, 2010 at 00:31 UTC
I spot `(($year)%4)`. That's incomplete. Every year divisible by 4 is a leap year, except those divisible by 100; but those divisible by 400 are leap years after all.

2095 n
2096 y
2097 n
2098 n
2099 n
2100 n   <-- %100==0
2101 n
2102 n
2103 n
2104 y
2105 n

1995 n
1996 y
1997 n
1998 n
1999 n
2000 y   <-- %400==0
2001 n

But since you're restricting yourself to the years 1970 to 2038, you're OK. You've basically reimplemented (core module) Time::Local's `timegm_nocheck`, so you might want to verify against it.
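The rule described above, written out as a hypothetical `is_leap` helper. Looping over the same years reproduces the table, and the last check shows that within 1901..2038 (the range of signed 32-bit Unix time) it never disagrees with the plain `% 4` test, which is why the shortcut is safe there.

```perl
use strict;
use warnings;

# Gregorian rule: divisible by 4, except centuries, except centuries divisible by 400.
sub is_leap {
    my $y = shift;
    return (($y % 4 == 0 && $y % 100 != 0) || $y % 400 == 0) ? 1 : 0;
}

for my $y (2095 .. 2105, 1995 .. 2001) {
    printf "%d %s\n", $y, is_leap($y) ? 'y' : 'n';
}

# Years in the 32-bit range where the plain % 4 shortcut gives a different answer.
my @differ = grep { is_leap($_) != ($_ % 4 == 0 ? 1 : 0) } 1901 .. 2038;
print scalar(@differ), " disagreements in 1901..2038\n";
```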
Re: high speed/efficiency http date to unix time, leap years handling? by Marshall (Canon) on Aug 27, 2010 at 02:40 UTC
UPDATE: I ran some benchmark code and was surprised!! The OP's two more complex routines are way faster than any shorter version using modules that I could find, including the code below... That was true even when I just called timegm(constant_argument_set)! I would have thought that timegm() was more efficient than that! But, apparently not! So it appears that we have a simplicity vs. performance thing going on. Using a regex to get the values is expensive compared with even complex substr code. If the speed is really that important, then I think we are into using some kind of C code, perhaps inline C. I haven't done that before (it was never necessary before with Perl), but I'm sure that I could write some really fast C code.

Albeit easier to understand, this code is slower than the more complex versions... most of the time difference is spent in the regex, which is actually not that surprising.

sub another_converter {
    # uses %month2num and Time::Local's timegm() from the script below
    my $httpdate = shift;
    my ($mday,$mon,$year,$hours,$min,$sec) =
        $httpdate =~ m/(\d+)\s+(\w+)\s+(\d+)\s+(\d+):(\d+):(\d+)/;
    my $month_num = $month2num{$mon};
    my $thistime = timegm($sec,$min,$hours,$mday,$month_num,$year);
    return ($thistime);
}

Original post continues....

Giving timegm() a 4-digit year is completely fine; there is no need anymore to fiddle with the 1900 adjustment to get a 2-digit year. A date before 1970 will result in a negative epoch time. The one "gotcha" is that the month is zero-based instead of one-based (0..11), but the hash table takes care of that. timegm() will handle the leap years for you.

#!/usr/bin/perl -w
use strict;
use Time::Local;

# the prototype for timegm()....
# my $time = timegm($sec,$min,$hours,$mday,$mon,$year);
# mday is 1...31 max -> day in month one based
# mon is 0..11 -> month in year zero based !!!!!Wow!!!

my %month2num = ( 'Jan' => 0, 'Feb' => 1, 'Mar' => 2,  'Apr' => 3,
                  'May' => 4, 'Jun' => 5, 'Jul' => 6,  'Aug' => 7,
                  'Sep' => 8, 'Oct' => 9, 'Nov' => 10, 'Dec' => 11, );

my $httpdate = 'Wed, 25 Aug 2010 09:56:38 GMT';
print "httpdate = $httpdate\n";

my ($mday,$mon,$year,$hours,$min,$sec) =
    $httpdate =~ m/(\d+)\s+(\w+)\s+(\d+)\s+(\d+):(\d+):(\d+)/;
$mon = $month2num{$mon};
print "mday=$mday mon=$mon year=$year hours=$hours min=$min sec=$sec\n";
#prints: mday=25 mon=7 year=2010 hours=09 min=56 sec=38

my $thistime = timegm($sec,$min,$hours,$mday,$mon,$year);
print "unix time = $thistime\n";

print "\nusing gmtime() to make sure that we can convert back!\n";
print scalar (gmtime ($thistime)),"\n";
print "another way to use gmtime()\n";
print "".gmtime ($thistime),"\n"; #another way to force scalar context

__END__
PRINTS:
httpdate = Wed, 25 Aug 2010 09:56:38 GMT
mday=25 mon=7 year=2010 hours=09 min=56 sec=38
unix time = 1282730198

using gmtime() to make sure that we can convert back!
Wed Aug 25 09:56:38 2010
another way to use gmtime()
Wed Aug 25 09:56:38 2010

Update: Found this text from a previous post lounging around in my temp directory. I think it is on topic, with the 5 main time functions. There are many modules and many functions, but understanding these will take one a long way.

There are three time functions included within Perl itself:
1. localtime()
2. gmtime() - this returns the value of the time function in 9 parts, from an "epoch GMT based" time
3. time() - makes an epoch second value for "now", which can be used by either gmtime() or localtime()

The module Time::Local contains the inverse of those functions:
1. timelocal()
2. timegm() - this takes those parts (the first 6 of them) and generates the epoch time
3. (there is no "inverse" of time() per se)

The operating system keeps track of "time" based upon a continually incrementing integer number of seconds since a specific start date/time. This is called the "epoch time" (a returned time() value of zero). For Unix and Windows this is 00:00:00 Jan 1, 1970. I seem to remember that some versions of Apple's OSes use a different value for this "epoch date/time". The point is that this "epoch time" value is not transportable in a general sense between platforms. Converting an "epoch time" to a text string is a good way to ensure portability.

The time() function produces the current value of this continuously incrementing number of "epoch seconds" since the start of the "epoch" and corresponds to "now". This number never decreases and is independent of daylight saving time. When we "set our clocks back one hour", this "seconds since epoch" number just keeps growing. Just because we set our clocks back one hour, that doesn't change the fact that more seconds are continuing to accrue since the "epoch time". The general way to do "time math" is to convert to epoch seconds, do the math in seconds and then convert back to a string. The DateTime module does it that way too, and it can apply some fancy "correction values" for leap seconds, etc.
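timegm() may be slow, but it makes a trustworthy reference, and verifying against Time::Local is what the previous reply suggests. A sketch of a brute-force check: walk one timestamp per day from 1970 to January 2038, format each as an HTTP date with gmtime(), convert it back with a substr-based converter, and compare with timegm() (given a 4-digit year, as recommended above). `http2unix` is a hypothetical stand-in written in the style of mytime2; any of the thread's subs can be pasted in and called instead. This version counts the leap days of past years as floor((year - 1969) / 4) and adds the current year's Feb 29 only from March on, so Jan 1 after a leap year and Feb 29 itself are both exercised.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Time::Local qw(timegm);

# Cumulative days before each month in a non-leap year (as in the thread).
my %month = (Jan => 0,   Feb => 31,  Mar => 59,  Apr => 90,  May => 120, Jun => 151,
             Jul => 181, Aug => 212, Sep => 243, Oct => 273, Nov => 304, Dec => 334);

# Hypothetical converter under test, same substr positions as the OP's subs;
# assumes the fixed-width form 'Wed, 25 Aug 2010 09:56:38 GMT'.
sub http2unix {
    my $year = substr($_[0], 12, 4);
    my $mon  = $month{ substr($_[0], 8, 3) };
    my $n    = $year - 1969;
    my $days = ($year - 1970) * 365
             + ($n - $n % 4) / 4                         # leap days in full years passed
             + $mon + substr($_[0], 5, 2) - 1            # full months and days passed
             + (($year % 4 == 0 && $mon >= 59) ? 1 : 0); # this year's Feb 29, if behind us
    return $days * 86400 + substr($_[0], 17, 2) * 3600
         + substr($_[0], 20, 2) * 60 + substr($_[0], 23, 2);
}

my @wday  = qw(Sun Mon Tue Wed Thu Fri Sat);
my @mname = qw(Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec);

my ($checked, $bad) = (0, 0);
for my $d (0 .. 24855) {                        # one timestamp per day, 1970 to Jan 2038
    my $t = $d * 86400 + ($d * 7919) % 86400;   # vary the time of day as well
    my @g = gmtime($t);
    my $http = sprintf('%s, %02d %s %04d %02d:%02d:%02d GMT',
        $wday[$g[6]], $g[3], $mname[$g[4]], $g[5] + 1900, $g[2], $g[1], $g[0]);
    # reference value; a 4-digit year keeps timegm from guessing the century
    my $ref = timegm($g[0], $g[1], $g[2], $g[3], $g[4], $g[5] + 1900);
    my $got = http2unix($http);
    $checked++;
    if ($got != $ref) {
        $bad++;
        print "mismatch: $http => $got, timegm says $ref\n" if $bad <= 5;
    }
}
print "checked $checked dates, $bad mismatches\n";
```

Running the same loop over a candidate sub is a cheap way to catch off-by-one-day errors that a single sample date such as 25 Aug 2010 cannot reveal.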
Re: high speed/efficiency http date to unix time, leap years handling? by mr_mischief (Monsignor) on Aug 27, 2010 at 04:19 UTC
I don't have anything to add to the excellent advice you've already received for how to do what you asked. I just want to point out that hitting a web server twice a second to figure out the time and doing things efficiently aren't exactly compatible. Perhaps you want ntp? On Unix-based systems there are ntpd and ntpdate. On Windows there's the W32time service, which can be configured to use NTP time sources rather than the default (which IIRC is the domain controller's time if you're in a domain). The most efficient code is that which is already written and running. Wrapping heavy resources in light layers of interface does not make a lightweight whole. Reimplementing what exists on the system is great for learning a concept, but it's lousy for making the system efficient.
Re: high speed/efficiency http date to unix time, leap years handling? by JavaFan (Canon) on Aug 27, 2010 at 07:43 UTC
If you're doing it this many times/second, I'd use two lookup tables. One mapping the date to unix time (at midnight), and one to map "HH:MM:SS" to a number of seconds. There are about 25000 dates in the range, and only 86400 different "HH:MM:SS" values (86401 if you want to have an entry for leap seconds). Then, once your tables are done, you only need to extract two parts of the strings, do two lookups, and one addition. This ignores timezones, but I didn't spot any timezone processing in your code either.
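The two-table approach, as a minimal sketch. `%midnight` and `%sec_of_day` are hypothetical names: the first maps the `DD Mon YYYY` part of the header to the epoch time at 00:00:00, the second maps `HH:MM:SS` to seconds since midnight, so each conversion is two `substr` calls, two hash lookups and one addition. Both tables are built once from gmtime(); the key layout assumes the fixed-width, zero-padded form used throughout this thread, and, like the reply, the sketch ignores time zones and leap seconds.

```perl
use strict;
use warnings;

my @mname = qw(Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec);

my %midnight;    # e.g. '25 Aug 2010' => 1282694400
for my $d (0 .. 24855) {                      # 1970-01-01 .. 2038-01-19
    my @g = gmtime($d * 86400);
    $midnight{ sprintf('%02d %s %04d', $g[3], $mname[$g[4]], $g[5] + 1900) } = $d * 86400;
}

my %sec_of_day;  # e.g. '09:56:38' => 35798
for my $s (0 .. 86399) {
    $sec_of_day{ sprintf('%02d:%02d:%02d', int($s / 3600), int($s / 60) % 60, $s % 60) } = $s;
}

sub table_time {
    # positions assume the fixed-width 'Wed, 25 Aug 2010 09:56:38 GMT' form
    return $midnight{ substr($_[0], 5, 11) } + $sec_of_day{ substr($_[0], 17, 8) };
}

print table_time('Wed, 25 Aug 2010 09:56:38 GMT'), "\n";   # -> 1282730198
```

The date table ends up with 24,856 entries and the time table with 86,400, which is the RAM cost weighed in the update to the question.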
Re: high speed/efficiency http date to unix time, leap years handling? by Marshall (Canon) on Aug 27, 2010 at 04:40 UTC
Actually, this whole idea of hitting the same page on the web server 2x per second, 24x7 sounds like pretty impolite behavior. If that was happening to me and I was able to figure it out, I would stop talking to your IP address. Update: ~~and I would do my best to blacklist you with other servers!~~ No, I wouldn't be that nasty, but you would be on my server's "bad boy" list. All of the algorithms for doing the date stuff are fast enough for what you "need". And actually I don't even see the need for comparisons: if the date changed since the last access, then it most likely got bigger (ruling out some time goof at the sending end).