ler224 has asked for the wisdom of the Perl Monks concerning the following question:

I am using DateTime in a while loop over more than 1 million (up to 300 million) lines to calculate a date & time field. Originally I had all of the code inside the while loop; I have since moved the $formatter code outside of it, but it is still very slow. The primary reason I am using the DateTime module is to convert the data, which is in UTC, to New York time. The date can change, because the time zone shift can move a timestamp back across midnight. Any suggestions on how I can speed this up? Should I avoid DateTime given the number of rows?
    use strict;
    use warnings;
    use DateTime;
    use DateTime::Format::Strptime;

    # The formatter is created once, outside the loop.
    my $formatter = DateTime::Format::Strptime->new(
        pattern => '%Y-%m-%d %H:%M:%S.%3N',
    );

    while (<>) {
        chomp;
        my $line = $_;
        my @line = split /;/, $line;

        # Field 4 holds month and day separated by ':'; field 5 is the year.
        $line[4] =~ s/:/\//;
        my ($month) = $line[4] =~ m/(\d+)\/\d+/;
        my ($day)   = $line[4] =~ m/\d+\/(\d+)/;

        # Split field 7 (milliseconds) into hours, minutes, seconds and ms.
        my $hour   = int($line[7] / 3600000) % 24;
        my $minute = sprintf '%02d', int($line[7] / 60000) % 60;
        my $second = sprintf '%02d', int($line[7] / 1000) % 60;
        my $ms     = sprintf '%03d', $line[7] % 1000;

        my $dhms = DateTime->new(
            year       => $line[5],
            month      => $month,
            day        => $day,
            hour       => $hour,
            minute     => $minute,
            second     => $second,
            nanosecond => $ms * 1000000,
            time_zone  => 'Etc/UTC',
            formatter  => $formatter,
        );
        $dhms = $dhms->clone->set_time_zone('America/New_York');
    }
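On the last question (whether to avoid DateTime), here is a sketch of one DateTime-free alternative. It is not something proposed in the replies; it uses only the core POSIX and Time::Local modules and lets the C library's localtime() apply the New York rules. It assumes field 7 is milliseconds since UTC midnight and that all timestamps are from 2007 or later, when the current US daylight saving rules took effect. The helper name utc_ms_to_ny is hypothetical. Skipping per-row DateTime object construction is likely to be cheaper, but that should be measured on the real data.

```perl
use strict;
use warnings;
use POSIX qw(strftime tzset);
use Time::Local qw(timegm);

# Assumption: timestamps are from 2007 onwards. This POSIX TZ rule string
# encodes the current US rules (EST = UTC-5, EDT = UTC-4, DST from the 2nd
# Sunday in March to the 1st Sunday in November) and needs no installed tz
# database. Where one exists, 'America/New_York' also covers older rules.
$ENV{TZ} = 'EST5EDT,M3.2.0,M11.1.0';
tzset();

# Hypothetical helper: UTC date plus milliseconds since UTC midnight
# -> New York "YYYY-MM-DD HH:MM:SS.mmm".
sub utc_ms_to_ny {
    my ($year, $month, $day, $ms_of_day) = @_;

    # Whole seconds since the epoch for the UTC instant (month is 0-based).
    my $epoch = timegm(0, 0, 0, $day, $month - 1, $year)
              + int($ms_of_day / 1000);

    # localtime() applies the TZ rules, including rolling the date back
    # when the UTC time is shortly after midnight.
    return strftime('%Y-%m-%d %H:%M:%S', localtime $epoch)
         . sprintf('.%03d', $ms_of_day % 1000);
}

print utc_ms_to_ny(2014, 4, 9, 9_015_123), "\n";   # prints 2014-04-08 22:30:15.123
```

Because TZ is set once for the whole process, every localtime() call in the script will use New York time, which is fine for a single-purpose conversion script but worth knowing.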

Re: DateTime speed improvement - suggestion
by tobyink (Canon) on Apr 09, 2014 at 13:44 UTC

    You don't need the ->clone in there, because set_time_zone changes the object in place. The following should be sufficient:

        $dhms = DateTime->new(
            ...,
            time_zone => 'Etc/UTC',
            formatter => $formatter,
        );
        $dhms->set_time_zone('America/New_York');

    Another possible speed-up (though I don't think it's likely to make much difference) is to move the time zone lookup outside the loop:

        my $formatter = ...;
        my $utc = DateTime::TimeZone->new(name => "Etc/UTC");
        my $nyc = DateTime::TimeZone->new(name => "America/New_York");
        while (<>) {
            ...;
            $dhms = DateTime->new(
                ...,
                time_zone => $utc,
                formatter => $formatter,
            );
            $dhms->set_time_zone($nyc);
        }
Re: DateTime speed improvement - suggestion
by Laurent_R (Canon) on Apr 09, 2014 at 17:34 UTC
    Not sure you will gain very much, but you could reduce these three regular expressions:
        $line[4] =~ s/:/\//;
        ($month) = $line[4] =~ m/(\d+)\/\d+/;
        ($day) = $line[4] =~ m/\d+\/(\d+)/;
    to only one:
        my ($month, $day) = $line[4] =~ m/(\d+):(\d+)/;
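A small check that the single match is a drop-in replacement: it runs on the raw field (with the ':' still in place, so the s/:/\// step goes away too) and should give the same month and day as the three-step version. The sub names and the sample values are hypothetical; the "month:day" layout of field 4 is inferred from the original regexes.

```perl
use strict;
use warnings;

# Original approach: rewrite ':' to '/', then match twice.
sub month_day_three_regexes {
    my ($field) = @_;            # a copy, so the caller's value is untouched
    $field =~ s/:/\//;
    my ($month) = $field =~ m/(\d+)\/\d+/;
    my ($day)   = $field =~ m/\d+\/(\d+)/;
    return ($month, $day);
}

# Suggested approach: one match with two captures on the raw field.
sub month_day_one_regex {
    my ($field) = @_;
    my ($month, $day) = $field =~ m/(\d+):(\d+)/;
    return ($month, $day);
}

# Hypothetical sample values for field 4.
for my $field ('04:09', '12:31', '1:5') {
    my @old = month_day_three_regexes($field);
    my @new = month_day_one_regex($field);
    print "$field: old=@old new=@new\n";
}
```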
    But the thing you should really do first is profile your code, for example with Devel::NYTProf (http://search.cpan.org/~timb/Devel-NYTProf-5.06/lib/Devel/NYTProf.pm). Only then will you know where your program is spending its time.
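Once profiling has pointed at a hot spot, the core Benchmark module can compare candidate rewrites of just that piece. A sketch timing the three-regex parse against the single-regex version on a hypothetical sample field; the absolute rates depend on the machine and Perl version, so only the relative comparison is meaningful.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

my $field = '04:09';    # hypothetical sample value of field 4 ("month:day")

# A negative count means "run each sub for at least that many CPU seconds".
# cmpthese prints a comparison chart and returns it as an array of rows.
my $rows = cmpthese(-1, {
    three_regexes => sub {
        (my $f = $field) =~ s/:/\//;    # work on a copy of the field
        my ($month) = $f =~ m/(\d+)\/\d+/;
        my ($day)   = $f =~ m/\d+\/(\d+)/;
        return ($month, $day);
    },
    one_regex => sub {
        my ($month, $day) = $field =~ m/(\d+):(\d+)/;
        return ($month, $day);
    },
});
```

A gain measured here only matters if the profiler shows that parsing is a meaningful share of the total run time; for this script the DateTime calls are the more likely cost.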