justin423 has asked for the wisdom of the Perl Monks concerning the following question:

I want to build a script that downloads and loads the XML data published at URLs of the form http://www.treasurydirect.gov/xml/CPI_YYYYMMDD.xml. The day of the month is not consistent and can fall anywhere from the 12th to the 23rd, so I tried LWP inside a for loop that started on the 30th, subtracted 1 each time, and exited as soon as a download succeeded. For example, the most recent links are:

http://www.treasurydirect.gov/xml/CPI_20150916.xml

http://www.treasurydirect.gov/xml/CPI_20151015.xml

http://www.treasurydirect.gov/xml/CPI_20151117.xml

http://www.treasurydirect.gov/xml/CPI_20151216.xml

As you can see, the date in the URL is not the same each month, and I need to download almost 3 years' worth, so I am thinking there is a quick and dirty solution in Perl to do this. Thanks for any help.
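
For reference, the countdown loop described above might look roughly like the sketch below. The helper names `candidate_urls` and `first_available` are made up for illustration, and the fetcher is passed in as a code reference so the loop itself does not depend on LWP. This is an assumption about the shape of the approach, not the code that was actually tried.

```perl
use strict;
use warnings;

# Hypothetical helper: build the candidate URLs for one month,
# starting at the 30th and counting down to the 1st.
sub candidate_urls {
    my ($year, $month) = @_;
    my $base = 'http://www.treasurydirect.gov/xml';
    my @urls;
    for my $day (reverse 1 .. 30) {
        push @urls, sprintf('%s/CPI_%04d%02d%02d.xml', $base, $year, $month, $day);
    }
    return @urls;
}

# Hypothetical helper: return the first URL for which the supplied
# fetcher reports success, or nothing if every attempt fails.
sub first_available {
    my ($fetch, @urls) = @_;
    for my $url (@urls) {
        return $url if $fetch->($url);
    }
    return;
}
```

With LWP::Simple, the fetcher could be something like `sub { is_success(getstore($_[0], 'cpi.xml')) }`. Note that this makes up to 30 requests per month, most of which will fail.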

Re: Automated downloads from a date based URL
by poj (Abbot) on Dec 31, 2015 at 22:43 UTC

    If you try http://www.treasurydirect.gov/xml you get a directory listing.


    poj
Re: Automated downloads from a date based URL
by choroba (Cardinal) on Dec 31, 2015 at 22:44 UTC
    It seems the parent directory http://www.treasurydirect.gov/xml/ lists all the files it contains. Just extract the correct file names from there.
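
A minimal sketch of that extraction, assuming each file shows up in the listing's HTML as an `href` ending in `CPI_YYYYMMDD.xml` (the real markup may differ). The helper name `cpi_files` and its optional date bounds are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: pull CPI file names out of the listing's HTML.
# $from and $to are optional YYYYMMDD strings that bound the result.
sub cpi_files {
    my ($html, $from, $to) = @_;
    my (%seen, @names);
    while ($html =~ /href="[^"]*?(CPI_(\d{8})\.xml)"/g) {
        my ($name, $date) = ($1, $2);
        next if defined $from && $date lt $from;
        next if defined $to   && $date gt $to;
        push @names, $name unless $seen{$name}++;
    }
    # YYYYMMDD in the name means a plain string sort is chronological
    my @sorted = sort @names;
    return @sorted;
}

# Possible usage (needs network access, so left commented out):
# use LWP::Simple qw(get getstore);
# my $base = 'http://www.treasurydirect.gov/xml/';
# my $html = get($base) // die "Cannot fetch listing\n";
# getstore($base . $_, $_) for cpi_files($html, '20130101', '20151231');
```

Filtering on the date embedded in the name avoids guessing the day of the month at all: only files that actually exist are requested.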
    ($q=q:Sq=~/;[c](.)(.)/;chr(-||-|5+lengthSq)`"S|oS2"`map{chr |+ord }map{substrSq`S_+|`|}3E|-|`7**2-3:)=~y+S|`+$1,++print+eval$q,q,a,
Re: Automated downloads from a date based URL
by poj (Abbot) on Jan 01, 2016 at 11:30 UTC

    To just get the latest

    #!perl
    use strict;
    use HTML::TreeBuilder::XPath;

    my $URL  = 'http://www.treasurydirect.gov/xml';
    my $tree = HTML::TreeBuilder::XPath->new_from_url($URL);

    # Text of every listing link whose href contains "CPI_"
    my @file = $tree->findnodes_as_strings('//li/a[contains(@href,"CPI_")]');

    # The YYYYMMDD names sort chronologically, so the last one is the newest
    my $latest = ( sort @file )[-1];
    print "Latest = $latest";
    poj