bulk88 has asked for the wisdom of the Perl Monks concerning the following question:
Is this the right algorithm to convert HTTP time to Unix time? Am I handling leap years, leap seconds, and timezones correctly? Can it be made any faster in pure Perl (I don't know C)? Am I supposed to subtract 1 from the day of month, so that the 1st of the month at 00:00:01 comes out as 1 second instead of 86400+1 seconds? The subroutine only needs to work for valid 32-bit Unix times. The server can only emit 32-bit Unix times as HTTP dates, so I don't need to worry about HTTP dates for 0000 AD or 9999 AD (the HTTP date standard only allows 4-digit zero-padded years) or about leap years from the beginning of the universe. Here is the code; example output is shown further down.

```perl
#!/usr/bin/perl -w
use strict;
use HTTP::Date;
use Time::HiRes;

$\ = "\n";

# days elapsed in the year before the first of each month (non-leap year)
my %month = (Jan => 0,   Feb => 31,  Mar => 59,  Apr => 90,  May => 120, Jun => 151,
             Jul => 181, Aug => 212, Sep => 243, Oct => 273, Nov => 304, Dec => 334);
my $httpdate = 'Wed, 25 Aug 2010 09:56:38 GMT';

sub mytime {
    my $year = substr($_[0],12,4);
    my $days = ($year-1970)*365+             # days in full years passed
        sprintf('%d', ($year-1969)/4)+       # add leap days in full years passed (1972, 1976, ...)
        $month{substr($_[0],8,3)}+           # add days in full months passed
        (substr($_[0],5,2)-1)+               # is doing a -1 to day of month needed?
        ((($year%4) == 0 && $month{substr($_[0],8,3)} >= 59) ? 1 : 0); # have we passed the leap day in the current year (March or later)?
    return $days*86400+(substr($_[0],17,2)*60*60)+(substr($_[0],20,2)*60)+substr($_[0],23,2); # hrs/mins/secs
}

sub mytime2 {
    my $year = substr($_[0],12,4);
    my $days = ($year-1970)*365+             # days in full years passed
        ((($year-1969) - (($year-1969) % 4))/4)+ # add leap days in full years passed (1972, 1976, ...)
        $month{substr($_[0],8,3)}+           # add days in full months passed
        (substr($_[0],5,2)-1)+               # is doing a -1 to day of month needed?
        ((($year%4) == 0 && $month{substr($_[0],8,3)} >= 59) ? 1 : 0); # have we passed the leap day in the current year (March or later)?
    return $days*86400+(substr($_[0],17,2)*60*60)+(substr($_[0],20,2)*60)+substr($_[0],23,2); # hrs/mins/secs
}

my $t_a = Time::HiRes::time();
for (0 .. 100000) { mytime($httpdate); }
print 'with sprintf int '.(Time::HiRes::time()-$t_a)."\n";

$t_a = Time::HiRes::time();
for (0 .. 100000) { mytime2($httpdate); }
print 'with algebra int '.(Time::HiRes::time()-$t_a)."\n";

$t_a = Time::HiRes::time();
for (0 .. 100000) { str2time($httpdate); }
print 'with HTTP date '.(Time::HiRes::time()-$t_a)."\n";

print "algorithm outputs";
print mytime($httpdate);
print mytime2($httpdate);
print str2time($httpdate);
```
My attempts at optimization: replacing sprintf with an algebraic integer truncation was slightly faster (shown here). I won't be trying the POSIX way of getting ints from floats; I think the algebraic method must be the fastest. Caching the substr for the year was also faster than multiple year substrs (not shown in the code). There was no time difference at 100000 iterations between $year = substr and $year = substr - 1970, so I left the "- 1970"s everywhere for code clarity.

This is written to work with one particular server/site, and there will never be invalid input to the sub, so error checking the input isn't needed. The polling script is not "production quality", so if the server changes its software I will just rewrite the poll script.

Update: Thanks to the poster who suggested using more lookup tables. Full Unix-day and second-of-day caching seems like too much RAM for too little gain for me, but caching days in years since the epoch isn't that many hash entries, and it is faster! Caching a substr that is used fewer than 3 times is inefficient: if I use the substr only twice, repeating it seems faster than making another scalar. This is shown in my revised test script. I'm settling on sub mytime5() as my final converter; it's the fastest. The only next logical optimization would be huge full-date and H:M:S hash tables, or C, neither of which works for me.

The lightweight HTTP library I'm using doesn't implement HTTP caching, so I have to implement that by hand (one reason I need the HTTP date to Unix time conversion). My poll script is also multithreaded, so the date stamp on the response is needed to synchronize the threads in case of a late or slow HTTP response to a request (I realize that HTTP time is limited to 1 second resolution). The data usually changes a few times a second, so polling more often than once a second is good. The data records are spread among a dozen pages, and they sometimes move rapidly among those pages. My script records the position of each data record for a few hours, then is manually commanded to exit and dump the positions to CSV for later research.

Gzip and keepalive are turned on for politeness. Caching logic is manually turned on when the data records are predicted to move only every couple of seconds instead of a couple of times a second. Regarding the politeness of polling webservers: 10 threads each polling their respective page every 0.5 seconds on an Alexa 100 website for a few hours is a grain of sand on a beach.

Output of the first test script:

```
C:\Documents and Settings\Owner\Desktop>perl myhttpdate2.pl
with sprintf int 0.413187026977539
with algebra int 0.321068048477173
with HTTP date 1.31241106987
algorithm outputs
1282730198
1282730198
1282730198
C:\Documents and Settings\Owner\Desktop>
```
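The days-per-year lookup table mentioned in the update does not have to be typed in by hand. The following is a minimal sketch of generating it, assuming the signed 32-bit range of 1901 to 2038; `build_year_table` and `is_leap` are hypothetical helper names made up for this illustration, not part of the original script.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Full Gregorian leap-year rule; within 1901..2038 this agrees with "year % 4 == 0".
sub is_leap {
    my ($y) = @_;
    return ($y % 4 == 0 && $y % 100 != 0) || $y % 400 == 0;
}

# Hypothetical helper: map each year to the number of days between
# 1970-01-01 and January 1st of that year (negative before 1970).
sub build_year_table {
    my ($first, $last) = @_;
    my %days_before = (1970 => 0);

    # walk forward from the epoch
    my $days = 0;
    for my $y (1970 .. $last - 1) {
        $days += is_leap($y) ? 366 : 365;
        $days_before{$y + 1} = $days;
    }

    # walk backward from the epoch
    $days = 0;
    for my $y (reverse $first .. 1969) {
        $days -= is_leap($y) ? 366 : 365;
        $days_before{$y} = $days;
    }
    return %days_before;
}

my %year = build_year_table(1901, 2038);
print "$_ => $year{$_}\n" for (1901, 1970, 1973, 2010, 2038);
```

Building the table once at startup costs a few hundred additions, and the per-call lookup stays a single hash access, which is the part the timings care about.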
Revised test script:

```perl
#!/usr/bin/perl -w
use strict;
use HTTP::Date;
use Time::HiRes;

$\ = "\n";

# days elapsed in the year before the first of each month (non-leap year)
my %month = (Jan => 0,   Feb => 31,  Mar => 59,  Apr => 90,  May => 120, Jun => 151,
             Jul => 181, Aug => 212, Sep => 243, Oct => 273, Nov => 304, Dec => 334);

# days between 1970-01-01 and January 1st of each year
my %year = (
    1901 => -25202, 1902 => -24837, 1903 => -24472, 1904 => -24107, 1905 => -23741, 1906 => -23376,
    1907 => -23011, 1908 => -22646, 1909 => -22280, 1910 => -21915, 1911 => -21550, 1912 => -21185,
    1913 => -20819, 1914 => -20454, 1915 => -20089, 1916 => -19724, 1917 => -19358, 1918 => -18993,
    1919 => -18628, 1920 => -18263, 1921 => -17897, 1922 => -17532, 1923 => -17167, 1924 => -16802,
    1925 => -16436, 1926 => -16071, 1927 => -15706, 1928 => -15341, 1929 => -14975, 1930 => -14610,
    1931 => -14245, 1932 => -13880, 1933 => -13514, 1934 => -13149, 1935 => -12784, 1936 => -12419,
    1937 => -12053, 1938 => -11688, 1939 => -11323, 1940 => -10958, 1941 => -10592, 1942 => -10227,
    1943 => -9862,  1944 => -9497,  1945 => -9131,  1946 => -8766,  1947 => -8401,  1948 => -8036,
    1949 => -7670,  1950 => -7305,  1951 => -6940,  1952 => -6575,  1953 => -6209,  1954 => -5844,
    1955 => -5479,  1956 => -5114,  1957 => -4748,  1958 => -4383,  1959 => -4018,  1960 => -3653,
    1961 => -3287,  1962 => -2922,  1963 => -2557,  1964 => -2192,  1965 => -1826,  1966 => -1461,
    1967 => -1096,  1968 => -731,   1969 => -365,   1970 => 0,      1971 => 365,    1972 => 730,
    1973 => 1096,   1974 => 1461,   1975 => 1826,   1976 => 2191,   1977 => 2557,   1978 => 2922,
    1979 => 3287,   1980 => 3652,   1981 => 4018,   1982 => 4383,   1983 => 4748,   1984 => 5113,
    1985 => 5479,   1986 => 5844,   1987 => 6209,   1988 => 6574,   1989 => 6940,   1990 => 7305,
    1991 => 7670,   1992 => 8035,   1993 => 8401,   1994 => 8766,   1995 => 9131,   1996 => 9496,
    1997 => 9862,   1998 => 10227,  1999 => 10592,  2000 => 10957,  2001 => 11323,  2002 => 11688,
    2003 => 12053,  2004 => 12418,  2005 => 12784,  2006 => 13149,  2007 => 13514,  2008 => 13879,
    2009 => 14245,  2010 => 14610,  2011 => 14975,  2012 => 15340,  2013 => 15706,  2014 => 16071,
    2015 => 16436,  2016 => 16801,  2017 => 17167,  2018 => 17532,  2019 => 17897,  2020 => 18262,
    2021 => 18628,  2022 => 18993,  2023 => 19358,  2024 => 19723,  2025 => 20089,  2026 => 20454,
    2027 => 20819,  2028 => 21184,  2029 => 21550,  2030 => 21915,  2031 => 22280,  2032 => 22645,
    2033 => 23011,  2034 => 23376,  2035 => 23741,  2036 => 24106,  2037 => 24472,  2038 => 24837,
);
my $httpdate = 'Wed, 25 Aug 2010 09:56:38 GMT';

sub mytime {
    my $year = substr($_[0],12,4);
    my $days = ($year-1970)*365+             # days in full years passed
        sprintf('%d', ($year-1969)/4)+       # add leap days in full years passed (1972, 1976, ...)
        $month{substr($_[0],8,3)}+           # add days in full months passed
        (substr($_[0],5,2)-1)+               # is doing a -1 to day of month needed?
        ((($year%4) == 0 && $month{substr($_[0],8,3)} >= 59) ? 1 : 0); # have we passed the leap day in the current year (March or later)?
    return $days*86400+(substr($_[0],17,2)*60*60)+(substr($_[0],20,2)*60)+substr($_[0],23,2); # hrs/mins/secs
}

sub mytime2 {
    my $year = substr($_[0],12,4);
    my $days = ($year-1970)*365+             # days in full years passed
        ((($year-1969) - (($year-1969) % 4))/4)+ # add leap days in full years passed (1972, 1976, ...)
        $month{substr($_[0],8,3)}+           # add days in full months passed
        (substr($_[0],5,2)-1)+               # is doing a -1 to day of month needed?
        ((($year%4) == 0 && $month{substr($_[0],8,3)} >= 59) ? 1 : 0); # have we passed the leap day in the current year (March or later)?
    return $days*86400+(substr($_[0],17,2)*60*60)+(substr($_[0],20,2)*60)+substr($_[0],23,2); # hrs/mins/secs
}

sub mytime3 { # yr table
    my $year = substr($_[0],12,4);
    my $days = $year{$year}+                 # use year day hash
        $month{substr($_[0],8,3)}+           # add days in full months passed
        (substr($_[0],5,2)-1)+               # is doing a -1 to day of month needed?
        ((($year%4) == 0 && $month{substr($_[0],8,3)} >= 59) ? 1 : 0); # have we passed the leap day in the current year (March or later)?
    return $days*86400+(substr($_[0],17,2)*60*60)+(substr($_[0],20,2)*60)+substr($_[0],23,2); # hrs/mins/secs
}

sub mytime4 { # yr table and full substr caching
    my $year  = substr($_[0],12,4);
    my $month = substr($_[0],8,3);
    my $day   = substr($_[0],5,2);
    my $days = $year{$year}+                 # use year day hash
        $month{$month}+                      # add days in full months passed
        ($day-1)+                            # is doing a -1 to day of month needed?
        ((($year%4) == 0 && $month{$month} >= 59) ? 1 : 0); # have we passed the leap day in the current year (March or later)?
    return $days*86400+(substr($_[0],17,2)*60*60)+(substr($_[0],20,2)*60)+substr($_[0],23,2); # hrs/mins/secs
}

sub mytime5 { # year table and removed redundant year caching
    my $days = $year{substr($_[0],12,4)}+    # use year day hash
        $month{substr($_[0],8,3)}+           # add days in full months passed
        (substr($_[0],5,2)-1)+               # is doing a -1 to day of month needed?
        (((substr($_[0],12,4)%4) == 0 && $month{substr($_[0],8,3)} >= 59) ? 1 : 0); # have we passed the leap day in the current year (March or later)?
    return $days*86400+(substr($_[0],17,2)*60*60)+(substr($_[0],20,2)*60)+substr($_[0],23,2); # hrs/mins/secs
}

my $t_a = Time::HiRes::time();
for (0 .. 100000) { mytime($httpdate); }
print 'with sprintf int '.(Time::HiRes::time()-$t_a)."\n";

$t_a = Time::HiRes::time();
for (0 .. 100000) { mytime2($httpdate); }
print 'with algebra int '.(Time::HiRes::time()-$t_a)."\n";

$t_a = Time::HiRes::time();
for (0 .. 100000) { mytime3($httpdate); }
print 'with yr table '.(Time::HiRes::time()-$t_a)."\n";

$t_a = Time::HiRes::time();
for (0 .. 100000) { mytime4($httpdate); }
print 'with yr table and full substr caching '.(Time::HiRes::time()-$t_a)."\n";

$t_a = Time::HiRes::time();
for (0 .. 100000) { mytime5($httpdate); }
print 'with yr table no cache '.(Time::HiRes::time()-$t_a)."\n";

$t_a = Time::HiRes::time();
for (0 .. 100000) { str2time($httpdate); }
print 'with HTTP date '.(Time::HiRes::time()-$t_a)."\n";

print "algorithm outputs";
print mytime($httpdate);
print mytime2($httpdate);
print mytime3($httpdate);
print mytime4($httpdate);
print mytime5($httpdate);
print str2time($httpdate);
```
Output of the revised test script:

```
C:\Documents and Settings\Owner\Desktop>perl myhttpdate2.pl
with sprintf int 0.343380928039551
with algebra int 0.312686204910278
with yr table 0.241981983184814
with yr table and full substr caching 0.273297071456909
with yr table no cache 0.235360145568848
with HTTP date 1.24872398376465
algorithm outputs
1282730198
1282730198
1282730198
1282730198
1282730198
1282730198
C:\Documents and Settings\Owner\Desktop>
```
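A single test date cannot answer the leap-year and "-1 on the day of month" questions. One way to check them is to compare a converter against the core module Time::Local on boundary dates. The sketch below is an illustration only: `parse_http_date`, `http2epoch` and `reference_epoch` are hypothetical names, `http2epoch` is a table-free converter written for this example rather than one of the mytime subs, and it is assumed valid only for 1970 to 2038. Any converter under test can be swapped in. Two facts it relies on: Unix time does not count leap seconds, and HTTP dates are always GMT, so no timezone adjustment is involved.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Time::Local qw(timegm);

my %mon_index = (Jan => 0, Feb => 1, Mar => 2, Apr => 3, May => 4,  Jun => 5,
                 Jul => 6, Aug => 7, Sep => 8, Oct => 9, Nov => 10, Dec => 11);
my @days_before_month = (0, 31, 59, 90, 120, 151, 181, 212, 243, 273, 304, 334);

# Split 'Wed, 25 Aug 2010 09:56:38 GMT' into fields in timegm() argument order.
sub parse_http_date {
    my ($date) = @_;
    my ($mday, $mon, $year, $h, $m, $s) =
        $date =~ /^\w{3}, (\d\d) (\w{3}) (\d{4}) (\d\d):(\d\d):(\d\d) GMT$/
        or die "unexpected date format: $date";
    return ($s, $m, $h, $mday, $mon_index{$mon}, $year);
}

# Hypothetical converter under test (assumed range: 1970..2038).
sub http2epoch {
    my ($s, $m, $h, $mday, $mi, $year) = parse_http_date($_[0]);
    my $days = ($year - 1970) * 365
             + int(($year - 1969) / 4)                  # leap days in full years since 1970
             + $days_before_month[$mi]                  # full months passed this year
             + ($mday - 1)                              # day of month is 1-based
             + (($year % 4 == 0 && $mi >= 2) ? 1 : 0);  # this year's Feb 29 is behind us
    return $days * 86400 + $h * 3600 + $m * 60 + $s;
}

# Reference value from the core Time::Local module.
sub reference_epoch {
    return timegm(parse_http_date($_[0]));
}

my @edge_cases = (
    'Thu, 01 Jan 1970 00:00:01 GMT',   # first second after the epoch
    'Thu, 01 Jan 2009 00:00:00 GMT',   # year right after a leap year
    'Tue, 29 Feb 2000 12:00:00 GMT',   # on the leap day itself
    'Mon, 01 Mar 2004 00:00:00 GMT',   # day after a leap day
    'Wed, 25 Aug 2010 09:56:38 GMT',   # the date used in the post
    'Tue, 19 Jan 2038 03:14:07 GMT',   # last signed 32-bit second
);

for my $date (@edge_cases) {
    my ($got, $want) = (http2epoch($date), reference_epoch($date));
    printf "%-32s %-11d %s\n", $date, $got, $got == $want ? 'ok' : "MISMATCH (want $want)";
}
```

The first edge case shows why the day of month needs the -1: the 1st at 00:00:01 should come out as 1, which only happens if the 1st contributes zero full days.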
Replies are listed 'Best First'.

Re: high speed/efficiency http date to unix time, leap years handling?
by ikegami (Patriarch) on Aug 27, 2010 at 00:31 UTC

Re: high speed/efficiency http date to unix time, leap years handling?
by Marshall (Canon) on Aug 27, 2010 at 02:40 UTC

Re: high speed/efficiency http date to unix time, leap years handling?
by mr_mischief (Monsignor) on Aug 27, 2010 at 04:19 UTC

Re: high speed/efficiency http date to unix time, leap years handling?
by JavaFan (Canon) on Aug 27, 2010 at 07:43 UTC

Re: high speed/efficiency http date to unix time, leap years handling?
by Marshall (Canon) on Aug 27, 2010 at 04:40 UTC