chud83 has asked for the wisdom of the Perl Monks concerning the following question:

Hi,

I have a CSV file with data in the following format: date,rate1,rate2
e.g.

2009031905,3558,1.68 2009031906,6385,2.42 2009031907,8860,3.68 2009031908,39224,14.16
This data goes all the way back for every hour of the last 40 days however I am attempting to write a perl script to extract the previous business week's (mon-fri) data only.
I have written the following code to decrement the day using the same format however this will not account for previous days overlapping months and years etc.

Can anyone suggest a better way of doing this? I am new to perl and not familiar with how to convert the YYYYMMDDHH string into a date format so I can select a specific range.
@months = qw(01 02 03 04 05 06 07 08 09 10 11 12); @weekDays = qw(01 02 03 04 05 06 07); ($second, $minute, $hour, $dayOfMonth, $month, $yearOffset, $dayOfWeek +, $dayOfYear, $daylightSavings) = localtime(); $year = 1900 + $yearOffset; $theTime = "$year$months[$month]$dayOfMonth$hour";<br> <br> $theOldTime = $year. $months[$month]. ($dayOfMonth -2). $hour;<br> <br> print $theTime . "\n";<br> print $theOldTime . "\n";<br>
Any help would be much appreciated!
Many thanks.

Replies are listed 'Best First'.
Re: Extract data between certain dates using YYYYMMDDHH timestamp
by Corion (Patriarch) on Mar 20, 2009 at 13:33 UTC

    time or gmtime gives you a current timestamp. If you don't care too much about exactness, you can just decrement it by 24*40*3600 to get a date roughly 40 days before.Time::Local converts from a unix epoch timestamp to the usual time tuple. POSIX::strftime is quite convenient for formatting time tuples.

    If you want to be exact, you can either use DateTime, or if you want to avoid DateTime, subtract 22 hours worth of seconds from your timestamp in a loop, and exit that loop once the date part of your day has changed 40 times.

Re: Extract data between certain dates using YYYYMMDDHH timestamp
by Bloodnok (Vicar) on Mar 20, 2009 at 13:39 UTC
    In addition to Corions splendid suggestions, you might also consider Date::Calc to do the conversion/difference calculations...

    A user level that continues to overstate my experience :-))
Re: Extract data between certain dates using YYYYMMDDHH timestamp
by apl (Monsignor) on Mar 20, 2009 at 14:13 UTC
    Another good CPAN module, especially if you're concerned with business days (Monday thru Friday, excluding holidays) is Date::Business
Re: Extract data between certain dates using YYYYMMDDHH timestamp
by ikegami (Patriarch) on Mar 20, 2009 at 15:26 UTC

    How do you define "previous" in "previous business week"? I'm going to assume you want the last M-F in which the current day is not a member.

    use strict; use warnings; use DateTime qw( ); use Text::CSV_XS qw( ); my $today = DateTime->today( time_zone => 'local' ); my $dow = $today->dow(); my $last_fri = $today->subtract( days => ( $dow <= 5 ? $dow-5+7 : $dow-5 ) ); my $last_mon = $last_fri->subtract( days => 4, ); my $start = $last_fri->strftime("%Y%m%d00"); my $end = $last_mon->strftime("%Y%m%d99"); my $csv = Text::CSV_XS->new(); while (<$fh_in>) { $csv->parse($_) or die("csv: " . $csv->error_diag() . "\n"); my ($ts) = $csv->fields(); next if $ts lt $start || $ts gt $end; print $fh_out $_; }
Re: Extract data between certain dates using YYYYMMDDHH timestamp
by johngg (Canon) on Mar 20, 2009 at 16:37 UTC

    Since your dates are in YYYYMMDDHH format you can use string comparisons to decide whether the datum is in the correct range. This script decrements the current epoch time value by the relevant number of seconds needed to go back to the Friday of the previous week and then another four days to get to the Monday. It then constructs start and end timestamp strings for hour 0 of the Monday and hour 23 of the Friday.

    use strict; use warnings; my %daysOffsetTV = ( 0 => 86400 * 2, 1 => 86400 * 3, 2 => 86400 * 4, 3 => 86400 * 5, 4 => 86400 * 6, 5 => 86400 * 7, 6 => 86400, ); my $nowTV = time(); my $dayOfWk = ( localtime( $nowTV ) )[ 6 ]; my $prevWkFriTV = $nowTV - $daysOffsetTV{ $dayOfWk }; my( $friDay, $friMth, $friYr ) = ( localtime( $prevWkFriTV ) )[ 3 .. 5 ]; $friMth += 1; $friYr += 1900; my $endDate = sprintf q{%04d%02d%02d%02d}, $friYr, $friMth, $friDay, 23; my $prevWkMonTV = $prevWkFriTV - 86400 * 4; my( $monDay, $monMth, $monYr ) = ( localtime( $prevWkMonTV ) )[ 3 .. 5 ]; $monMth += 1; $monYr += 1900; my $startDate = sprintf q{%04d%02d%02d%02d}, $monYr, $monMth, $monDay, 0; print qq{Starting date: $startDate\n}, qq{ Ending date: $endDate\n\n}; while( <DATA> ) { my $timestamp = ( split m{,} )[ 0 ]; print unless $timestamp lt $startDate or $timestamp gt $endDate; } __END__ 2009030822,3558,1.68 2009030823,6385,5.47 2009030900,7485,1.82 2009030901,8563,4.35 2009031322,8860,3.68 2009031323,39224,14.16 2009031400,34553,7.34 2009031401,7353,5.74

    The output.

    Starting date: 2009030900 Ending date: 2009031323 2009030900,7485,1.82 2009030901,8563,4.35 2009031322,8860,3.68 2009031323,39224,14.16

    I hope this is of interest.

    Cheers,

    JohnGG

      Not all days have 86400 seconds. Looks like it can fail near a daylight savings time change (by getting Su-Th or Tu-Sa instead of M-F, I think).
Re: Extract data between certain dates using YYYYMMDDHH timestamp
by ig (Vicar) on Mar 20, 2009 at 21:00 UTC

    Date::Manip also has nice functions for manipulating dates.

    #!/usr/bin/perl use strict; use warnings; use Date::Manip; use Text::CSV_XS qw( ); my $end = UnixDate("last Friday", "%Y%m%d23"); my $start = UnixDate(DateCalc("last Friday", "- 4 days"), "%Y%m%d00"); my $csv = Text::CSV_XS->new(); while (<$fh_in>) { $csv->parse($_) or die("csv: " . $csv->error_diag() . "\n"); my ($ts) = $csv->fields(); next if $ts lt $start || $ts gt $end; print $fh_out $_; }
      Thanks for all your help with this guys. Will have a play with the various methods and see how I get on :)
Re: Extract data between certain dates using YYYYMMDDHH timestamp
by bichonfrise74 (Vicar) on Mar 20, 2009 at 21:06 UTC
    I used Date::Calc as someone suggested to get the business days.
    #!/usr/bin/perl use strict; use Date::Calc qw(:all); while( <DATA> ) { my $old_data = $_; my ($old_date) = $_ =~ /^(\d{8})\d+,.*/; my ($year, $month, $day) = $old_date =~ /(\d{4})(\d{2})(\d{2})/; my $dow = Day_of_Week( $year, $month, $day ); print "$old_data" if ( $dow >= 1 && $dow <= 5 ); } __DATA__ 2009030822,3558,1.68 2009030823,6385,5.47 2009030900,7485,1.82 2009030901,8563,4.35 2009031322,8860,3.68 2009031323,39224,14.16 2009031400,34553,7.34 2009031401,7353,5.74 2009032107,8860,3.68

      That will get business days only but it will get more than just the last week of the 40 days of history in the data file.