hmbscully has asked for the wisdom of the Perl Monks concerning the following question:

Oh the joys of old code and data-formats I have no control over.

I get data that lists locations, and for each location there is an event that takes place on multiple dates. Each event gets its own line with the date. Example data:

|Puerto Rico||Santiago Apostol Catholic School|Las Croabas Road, Urb S +anta Isidra|Fajardo|PR||00738|Modesto|Garcia|Counselor|787/863-0524|7 +87/863-6655|Sat 09-DEC-06| |Puerto Rico||Santiago Apostol Catholic School|Las Croabas Road, Urb S +anta Isidra|Fajardo|PR||00738|Modesto|Garcia|Counselor|787/863-0524|7 +87/863-6655|Sat 09-JUN-07| <readmore> |Puerto Rico||Santiago Apostol Catholic School|Las Croabas Road, Urb S +anta Isidra|Fajardo|PR||00738|Modesto|Garcia|Counselor|787/863-0524|7 +87/863-6655|Sat 10-FEB-07| |Puerto Rico||Santiago Apostol Catholic School|Las Croabas Road, Urb S +anta Isidra|Fajardo|PR||00738|Modesto|Garcia|Counselor|787/863-0524|7 +87/863-6655|Sat 14-APR-07| |Puerto Rico||Santiago Apostol Catholic School|Las Croabas Road, Urb S +anta Isidra|Fajardo|PR||00738|Modesto|Garcia|Counselor|787/863-0524|7 +87/863-6655|Sat 28-OCT-06| |Puerto Rico||Santiago Apostol Catholic School|Las Croabas Road, Urb S +anta Isidra|Fajardo|PR||00738|Modesto|Garcia|Counselor|787/863-0524|7 +87/863-6655|Sun 10-DEC-06| |Puerto Rico||Santiago Apostol Catholic School|Las Croabas Road, Urb S +anta Isidra|Fajardo|PR||00738|Modesto|Garcia|Counselor|787/863-0524|7 +87/863-6655|Sun 10-JUN-07| |Puerto Rico||Santiago Apostol Catholic School|Las Croabas Road, Urb S +anta Isidra|Fajardo|PR||00738|Modesto|Garcia|Counselor|787/863-0524|7 +87/863-6655|Sun 15-APR-07| |Puerto Rico||Santiago Apostol Catholic School|Las Croabas Road, Urb S +anta Isidra|Fajardo|PR||00738|Modesto|Garcia|Counselor|787/863-0524|7 +87/863-6655|Sun 29-OCT-06|

The problem is in that last field with the date. Something changed in the database that this information is pulled from and I used to get the data so that the dates were in chronological order with no intervention on my part. But now they're sorted alphabetically/numerically and that isn't how we want them to display. The DB's can't seem to figure out how to give me the data the way it always has been, so I'm trying to come up with a perl solution to handle it on my end and I'm stumped.

This is the code that parses/returns the data, to a webpage. Please note, I know this code sucks, its nearly 7 years old and up until this date, its just worked as is without me having to touch it.

#!/usr/bin/perl use CGI qw/:standard/; use CGI::Carp 'fatalsToBrowser'; require '../code_paths.conf'; $query = new CGI; $COUNTRY = $query->param('COUNTRY'); $LOOKUP = $query->param('LOOKUP'); print $query->header(); #declare hashes needed for script %testcenters; %centeraddress; %centercity; %centerstateprov; %centermailcountry; %centerpostalcode; %centeradmin; %centerfax; %centerphone; %centerdates; <readmore> if($LOOKUP eq "canada"){ open(DATA,"$osus_data/canada/$COUNTRY") || &no_results; } if($LOOKUP eq "usa"){ open(DATA,"$osus_data/usa/$COUNTRY") || &no_results; } if($LOOKUP eq "others"){ open(DATA,"$osus_data/others/$COUNTRY") || &no_results; #open(DATA,"$osus_data/others/$COUNTRY")|| die "Can't open $osus_d +ata/others/$COUNTRY"; } while($line = <DATA>){ chomp $line; ($junk,$o_country,$o_province,$o_tcenter,$o_addr +ess,$o_city,$o_stprov,$o_mailcountry,$o_postalcode,$o_firstname,$o_la +stname,$o_title,$o_phone,$o_fax,$o_date) = split(/\|/,$line); #the hashes are filled using the test center as the key $testcenters{$o_city} = $o_tcenter; #testcenters keyed from c +ity $centeraddress{$o_tcenter} = $o_address; $centercity{$o_tcenter} = $o_city; $centerstateprov{$o_tcenter} = $o_stprov; $centermailcountry{$o_tcenter} = $o_mailcountry; $centerpostalcode{$o_tcenter} = $o_postalcode; #the next line builds the center administrator name pieces int +o one string if ($o_title ne ""){ $centeradmin{$o_tcenter} = "$o_firstname $o_lastname, $o_t +itle"; } else { $centeradmin{$o_tcenter} = "$o_firstname $o_lastname $o +_title"; } #deal with that Daekyo center to remove the comma $centerfax{$o_tcenter} = $o_fax; $centerphone{$o_tcenter} = $o_phone; $centerdates{$o_tcenter} .= "$o_date|"; #combine the dates fo +r same test center } #end of while($line = <DATA>) &print_header; print <<TOP_OF_PAGE; <table width="565" border="0" cellspacing="0" cellpadding="5"> <TR><TD><h1>ACT Assessment Test Dates for $COUNTRY</h1></TD></TR> <TR> <TD>Students must register directly with the supervisor of the test ce +nter where they wish to test, not with ACT. The deadline date for con +tacting the test center is the Friday two weeks before the scheduled +test date. There is no standby testing provision.</TD></TR></TABLE> TOP_OF_PAGE @mycenters = sort byString keys %testcenters; #returns list of centerc +ity in alpha order foreach $city(@mycenters){ #this foreach loop will go through each city foreach $center($testcenters{$city}){ #this foreach loop builds the code snippets for each test center #set up table for individual test center entries print "<TABLE WIDTH=565 CELLPADDING=5 CELLSPACING=5>\n"; print "<HR align=center width=555>"; print "<TR>\n\t<TD WIDTH=400>\n"; print "$testcenters{$city}<BR>"; #print testcenter name if(($centeraddress{$center}) eq ""){} #address field not filled, s +kip else { print "$centeraddress{$center}<BR>"; } #print testcenter a +ddress if(($centercity{$center}) eq ""){} #city field not filled, skip else { print "$centercity{$center}, "; } #print testcenter city if(($centerstateprov{$center}) eq ""){} #state/province field not +filled, skip else { print "$centerstateprov{$center}, "; } #print testcenter s +tate/province if(($centerpostalcode{$center}) eq ""){} #postal code field not fi +lled, skip else { print "$centerpostalcode{$center}"; } #print testcenter po +stal code if(($centermailcountry{$center}) eq ""){} #mailcountry field not f +illed, skip else { print "<BR>$centermailcountry{$center}<BR>"; } #print test +center mailing country print "\n<P><i>For information on registering at this test center +contact:</i><br>\n"; print "<b>$centeradmin{$center}</b><br>"; #print center administr +ator name, title if(($centerphone{$center}) eq ""){} #phone field not filled, skip else { print "Phone: $centerphone{$center}<BR>"; } #print testcen +ter phone if(($centerfax{$center}) eq ""){} #fax field not filled, skip else { print "Fax: $centerfax{$center}<BR>"; } #print testcenter +fax print "\n\t</td>\n"; #end center information table column print "\t<TD WIDTH=165 VALIGN=TOP>\n<UL>\n\t"; #start date bullet +ed list column my @testdates = split(/\|/, $centerdates{$center}); #split up the + center dates #my @sorted_dates = sort byNum @testdates; #put the dates in ord +er foreach $item(@testdates) #print each date switched from @sorted +_dates { print "<LI>$item\n"; } print "</UL></TD></TR>\n"; #close list, ect print "<TR><TD COLSPAN=2></TD></TR></TABLE>\n"; } #end of foreach $center($testcenters{$city}) } #end of foreach $city(@mycenters) &print_footer; sub byNum{ $a <=> $b; } sub byString { lc($a) cmp lc($b); } sub print_header{ if($server_url eq 'www.actstudent.org'){ $header = $header_osus; } elsif($server_url eq 'testdev.act.org'){ $header = $review_header_osus; } print <<END_HTMLHEADER; <html> <head> <TITLE>ACT Test Dates for $COUNTRY</TITLE> END_HTMLHEADER open(HEADER,"$header"); print <HEADER>; close HEADER; } sub print_footer{ print <<END_HTMLFOOTER; <!-- end content --> </td> </tr> <tr> <td width="205" align="left" valign="bottom" id="l-col"><img src="http +://$server_url/images/side_bottom.gif" width="205" height="30"></td> </tr> <tr align="left" valign="top"> <td colspan="3"> END_HTMLFOOTER open(FOOTER,"$footer"); print <FOOTER>; close FOOTER; print <<END_HTMLFOOTER2; <!-- end footer --> </td></tr></table> <!-- end outer table --> <p>&nbsp;</p> </div> </body> </html> END_HTMLFOOTER2 exit 0; } sub no_results{ $STATE = "."; print <<END_HTMLHEAD; <html> <head> <TITLE>ACT Test Dates for $COUNTRY</TITLE> END_HTMLHEAD &print_header; print <<TOP_OF_PAGE2; <table width="565" border="0" cellspacing="0" cellpadding="5"> <TR><TD><h1>ACT Assessment Test Dates for $COUNTRY</h1></TD></TR> <TR> <TD>Currently, there are no available test dates or test locations ava +ilable for $COUNTRY.<br><br> <a href="http://$server_url/regist/outside.html">Search for another lo +cation.</a></TD> </TR></TABLE> TOP_OF_PAGE2 &print_footer; } }

How does one force a sort on a date formatted such as this is? There used to be a numerical sort on the data, from another data change some years ago, but that won't solve this problem now.

TIA

Replies are listed 'Best First'.
Re: How to re-order badly formed dates within a delimited string?
by derby (Abbot) on Jun 15, 2006 at 14:54 UTC

    Well ... getting the DBA to fix it should be the first step ... is there a SQL statement that you control for retrieving the data? If so, double checking for an ORDER BY clause would be helpful.

    How big are the data files and how constant is the date field format? If small and always the same, you could go with the real cheap and easy approach by first reading in the data and then sorting it yourself. But first you need to get the date into a string easy to sort:

    my %months = ( JAN => '01', FEB => '02', MAR => '03', APR => '04', MAY => '05', JUN => '06', JUL => '07', AUG => '08', SEP => '09', OCT => '10', NOV => '11', DEC => '12' ); my %records; while( $line = <DATA> ) { chomp( $line ); my @line = split( /\|/, $line ); my @date = split( /-/, $line[-1] ); # bad practice here .. hopefully no dates from # last century (or next) my $year = "20" . $date[2]; my $mon = $months{$date[1]}; my $day = (split( /\s+/, $date[0]))[1]; # push onto array in case multiple events for date push( @{$records{$year.$mon.$day}}, $line ); } foreach my $date ( sort keys %records ) { foreach my $record ( @{$records{$date}} ) { # do your thing } }
    that's pretty brittle code but may be good enough for your needs

    -derby

      I have no control over the data in the DB. I get the data pushed to me by a DBA, I can't touch the SQL. The files are relatively small and the data field format is consistent across them all. I'll give it a whirl. Might just have to be a stop-gap until the DBA figures out the real issue.

      I learn more and more about less and less until eventually I know everything about nothing.

Re: How to re-order badly formed dates within a delimited string?
by Anonymous Monk on Jun 16, 2006 at 00:34 UTC

    Hi,
    Has some helpful DBA managed to change a DATETIME column
    to VARCHAR?

    I think it would have required quite some effort, but it's
    probably worth checking the table spec.

Re: How to re-order badly formed dates within a delimited string?
by CountZero (Bishop) on Jun 15, 2006 at 15:56 UTC
    Obviously the problem arises because the previous format and the new format are different.

    So one has to find a "translator" to change the new format into the old format. Can you give us an example of the "old" format so we can think about changing the "new" format into the old?

    CountZero

    "If you have four groups working on a compiler, you'll get a 4-pass compiler." - Conway's Law

      The date format remains the same: Day DD-MON-YY

      The problem is that always before the data was provided to me in such away that the dates were already in the right chronological order, so when my lookup script looked for locations, and printed out the results, they were already in the right order. Now they're not in the right order and I can't get the data provider to fix whatever changed.

      I must admit, I don't understand all the code in the example provided, but I'm working through it, attempting to learn.

        OK, in that case you must re-sort the file with the list of dates.

        As this involves parsing every record (a potentially "expensive" operation, to be repeated for every comparison in the sort) the "Schwartzian Transform" will come to your rescue:

        use strict; use DateTime::Format::Strptime; my $dtf = new DateTime::Format::Strptime( pattern => '%a %d-%b-%y', lo +cale => 'en_US'); my @events = <DATA>; my @sorted_events = map { $_->[0] } sort { $a->[1] cmp $b->[1] } map { [ $_, $dtf->parse_datetime( (split /\|/)[14])->ymd ] } @events; print @sorted_events; # now you save the array of sorted events and "feed" that to your scri +pt __DATA__ |Puerto Rico||Santiago Apostol Catholic School|Las Croabas Road, Urb S +anta Isidra|Fajardo|PR||00738|Modesto|Garcia|Counselor|787/863-0524|7 +87/863-6655|Sat 09-DEC-06| |Puerto Rico||Santiago Apostol Catholic School|Las Croabas Road, Urb S +anta Isidra|Fajardo|PR||00738|Modesto|Garcia|Counselor|787/863-0524|7 +87/863-6655|Sat 09-JUN-07| |Puerto Rico||Santiago Apostol Catholic School|Las Croabas Road, Urb S +anta Isidra|Fajardo|PR||00738|Modesto|Garcia|Counselor|787/863-0524|7 +87/863-6655|Sat 10-FEB-07| |Puerto Rico||Santiago Apostol Catholic School|Las Croabas Road, Urb S +anta Isidra|Fajardo|PR||00738|Modesto|Garcia|Counselor|787/863-0524|7 +87/863-6655|Sat 14-APR-07| |Puerto Rico||Santiago Apostol Catholic School|Las Croabas Road, Urb S +anta Isidra|Fajardo|PR||00738|Modesto|Garcia|Counselor|787/863-0524|7 +87/863-6655|Sat 28-OCT-06| |Puerto Rico||Santiago Apostol Catholic School|Las Croabas Road, Urb S +anta Isidra|Fajardo|PR||00738|Modesto|Garcia|Counselor|787/863-0524|7 +87/863-6655|Sun 10-DEC-06| |Puerto Rico||Santiago Apostol Catholic School|Las Croabas Road, Urb S +anta Isidra|Fajardo|PR||00738|Modesto|Garcia|Counselor|787/863-0524|7 +87/863-6655|Sun 10-JUN-07| |Puerto Rico||Santiago Apostol Catholic School|Las Croabas Road, Urb S +anta Isidra|Fajardo|PR||00738|Modesto|Garcia|Counselor|787/863-0524|7 +87/863-6655|Sun 15-APR-07| |Puerto Rico||Santiago Apostol Catholic School|Las Croabas Road, Urb S +anta Isidra|Fajardo|PR||00738|Modesto|Garcia|Counselor|787/863-0524|7 +87/863-6655|Sun 29-OCT-06|

        CountZero

        "If you have four groups working on a compiler, you'll get a 4-pass compiler." - Conway's Law

Re: How to re-order badly formed dates within a delimited string?
by monsterzero (Monk) on Jun 15, 2006 at 18:54 UTC
    Hello, In order to sort by date you will need to convert that format to one that can be sorted. This is where you may want to use the DateTime suite of modules. Hope this helps
    use strict; use warnings; use Datetime; use DateTime::Format::Strptime; my $Strp = new DateTime::Format::Strptime( pattern => '%a %d-%b-%y', ) +; my @dates; while (<DATA>) { chomp; my @date_text = split(/\|/); push( @dates, $Strp->parse_datetime( $date_text[-1] ) ); } my @sorted_dates = sort @dates; print map { "$_\n" } @sorted_dates; __DATA__ |Puerto Rico||Santiago Apostol Catholic School|Las Croabas Road, Urb S +anta Isidra|Fajardo|PR||00738|Modesto|Garcia|Counselor|787/863-0524|7 +87/863-6655|Sat 09-DEC-06| |Puerto Rico||Santiago Apostol Catholic School|Las Croabas Road, Urb S +anta Isidra|Fajardo|PR||00738|Modesto|Garcia|Counselor|787/863-0524|7 +87/863-6655|Sat 09-JUN-07| |Puerto Rico||Santiago Apostol Catholic School|Las Croabas Road, Urb S +anta Isidra|Fajardo|PR||00738|Modesto|Garcia|Counselor|787/863-0524|7 +87/863-6655|Sat 10-FEB-07| |Puerto Rico||Santiago Apostol Catholic School|Las Croabas Road, Urb S +anta Isidra|Fajardo|PR||00738|Modesto|Garcia|Counselor|787/863-0524|7 +87/863-6655|Sat 14-APR-07| |Puerto Rico||Santiago Apostol Catholic School|Las Croabas Road, Urb S +anta Isidra|Fajardo|PR||00738|Modesto|Garcia|Counselor|787/863-0524|7 +87/863-6655|Sat 28-OCT-06| |Puerto Rico||Santiago Apostol Catholic School|Las Croabas Road, Urb S +anta Isidra|Fajardo|PR||00738|Modesto|Garcia|Counselor|787/863-0524|7 +87/863-6655|Sun 10-DEC-06| |Puerto Rico||Santiago Apostol Catholic School|Las Croabas Road, Urb S +anta Isidra|Fajardo|PR||00738|Modesto|Garcia|Counselor|787/863-0524|7 +87/863-6655|Sun 10-JUN-07| |Puerto Rico||Santiago Apostol Catholic School|Las Croabas Road, Urb S +anta Isidra|Fajardo|PR||00738|Modesto|Garcia|Counselor|787/863-0524|7 +87/863-6655|Sun 15-APR-07| |Puerto Rico||Santiago Apostol Catholic School|Las Croabas Road, Urb S +anta Isidra|Fajardo|PR||00738|Modesto|Garcia|Counselor|787/863-0524|7 +87/863-6655|Sun 29-OCT-06|