Automated downloads from a date based URL

justin423 has asked for the wisdom of the Perl Monks concerning the following question:

I want to build a script that downloads and loads the XML data published at links of the form http://www.treasurydirect.gov/xml/CPI_YYYYMMDD.xml. The day of the month is not consistent; it can fall anywhere from the 12th to the 23rd. So I tried LWP, embedded in a for loop that started on the 30th, subtracted 1 each time, and exited the loop once it succeeded in downloading the file. For example, the most recent links are:

http://www.treasurydirect.gov/xml/CPI_20150916.xml

http://www.treasurydirect.gov/xml/CPI_20151015.xml

http://www.treasurydirect.gov/xml/CPI_20151117.xml

http://www.treasurydirect.gov/xml/CPI_20151216.xml

As you can see, the date in the URL is not the same each month, and I need to download almost three years' worth, so I suspect there is a quick and dirty way to do this in Perl. Thanks for any help.
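A minimal sketch of the probing loop described above, extended over several months. It assumes a missing file comes back with an HTTP error status (not a 200 error page), and it swaps in HTTP::Tiny (in core Perl since 5.14) for LWP. The helper names are illustrative, not from any library. The fetcher is passed in as a code ref so the loop logic can be checked without touching the network.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTTP::Tiny;

# Sketch: helper names below are illustrative, not from any library.
# Base URL and file name pattern (CPI_YYYYMMDD.xml) are taken from the question.
my $BASE = 'http://www.treasurydirect.gov/xml';

# Candidate URLs for one month, probing from day 30 down to day 1.
# Impossible dates such as Feb 30 just get an error response and are skipped.
sub candidate_urls {
    my ($year, $month) = @_;
    return map { sprintf '%s/CPI_%04d%02d%02d.xml', $BASE, $year, $month, $_ }
           reverse 1 .. 30;
}

# Return the first candidate URL for which $fetch->($url) is true, else undef.
sub find_release {
    my ($year, $month, $fetch) = @_;
    for my $url (candidate_urls($year, $month)) {
        return $url if $fetch->($url);
    }
    return;
}

# [year, month] pairs for the $count months ending at ($year, $month), oldest first.
sub months_back {
    my ($year, $month, $count) = @_;
    my @months;
    for (1 .. $count) {
        unshift @months, [$year, $month];
        if (--$month == 0) { $month = 12; $year-- }
    }
    return @months;
}

# Default fetcher: HTTP::Tiny's mirror() writes the local file (named after the
# last URL segment) only for a 2xx response, and also reports success for 304.
my $http = HTTP::Tiny->new(timeout => 30);
sub download {
    my ($url) = @_;
    my ($file) = $url =~ m{([^/]+)\z};
    return $http->mirror($url, $file)->{success};
}

# Usage (script name is arbitrary): perl script.pl 2015 12 36
# i.e. the 36 months ending December 2015.
if (@ARGV == 3) {
    for my $ym (months_back(@ARGV)) {
        my $url = find_release(@$ym, \&download);
        print defined $url ? "got $url\n" : "nothing found for @$ym\n";
    }
}
```

Starting at day 30 can cost up to 30 requests per month, so reading a directory listing, where the server provides one, is far cheaper.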


Re: Automated downloads from a date based URL by poj (Abbot) on Dec 31, 2015 at 22:43 UTC
If you try `http://www.treasurydirect.gov/xml` you get a directory listing.

poj
Re: Automated downloads from a date based URL by choroba (Cardinal) on Dec 31, 2015 at 22:44 UTC
It seems the parent directory http://www.treasurydirect.gov/xml/ lists all the files it contains. Just extract the correct file names from there.

($q=q:Sq=~/;[c](.)(.)/;chr(-||-|5+lengthSq)`"S|oS2"`map{chr |+ord }map{substrSq`S_+|`|}3E|-|`7**2-3:)=~y+S|`+$1,++print+eval$q,q,a,
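A minimal sketch of extracting the names from the listing with a plain regex rather than an HTML parser. It assumes each file appears in the page as CPI_YYYYMMDD.xml (the pattern from the question), uses HTTP::Tiny for the fetch, and `cpi_files` is an illustrative name, not a library function.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTTP::Tiny;

# Illustrative helper: every distinct CPI_YYYYMMDD.xml name in an HTML
# directory listing, sorted. The fixed-width date makes string order
# the same as chronological order.
sub cpi_files {
    my ($html) = @_;
    my %seen;
    $seen{$1} = 1 while $html =~ /\b(CPI_\d{8}\.xml)\b/g;
    return sort keys %seen;
}

# Usage (script name is arbitrary): perl script.pl http://www.treasurydirect.gov/xml/
if (@ARGV == 1) {
    my $res = HTTP::Tiny->new(timeout => 30)->get($ARGV[0]);
    die "GET $ARGV[0] failed: $res->{status} $res->{reason}\n"
        unless $res->{success};
    print "$_\n" for cpi_files($res->{content});
}
```

A regex is enough here because only one tightly constrained name pattern is wanted; a real parser becomes worthwhile if the link text and href ever differ.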
Re: Automated downloads from a date based URL by poj (Abbot) on Jan 01, 2016 at 11:30 UTC
To get just the latest file:

```perl
#!perl
use strict;
use warnings;
use HTML::TreeBuilder::XPath;

my $URL  = 'http://www.treasurydirect.gov/xml';
my $tree = HTML::TreeBuilder::XPath->new_from_url($URL);

# Link text of every <li><a> whose href contains "CPI_"
my @file = $tree->findnodes_as_strings('//li/a[contains(@href,"CPI_")]');

# CPI_YYYYMMDD.xml names sort chronologically as plain strings
my $latest = ( sort @file )[-1];
print "Latest = $latest\n";
```

poj
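The same listing can drive the roughly three-year backfill from the question: filter the names by date and download each one. This is a sketch, assuming the link text is the file name (as poj's script relies on); it reuses his module and XPath, uses HTTP::Tiny's `mirror` for the downloads, and `files_since` is an illustrative helper name.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTTP::Tiny;

my $BASE = 'http://www.treasurydirect.gov/xml';    # URL from the thread

# Illustrative helper: keep the CPI_YYYYMMDD.xml names dated on or after
# $since (YYYYMMDD), sorted. Whitespace around the link text is ignored.
sub files_since {
    my ($since, @names) = @_;
    my @keep;
    for (@names) {
        push @keep, $1 if /(CPI_(\d{8})\.xml)/ && $2 >= $since;
    }
    return sort @keep;
}

# Usage (script name is arbitrary): perl script.pl 20130101
if (@ARGV == 1) {
    # Same module and XPath as poj's script; loaded only when actually run.
    require HTML::TreeBuilder::XPath;
    my $tree  = HTML::TreeBuilder::XPath->new_from_url($BASE);
    my @names = $tree->findnodes_as_strings('//li/a[contains(@href,"CPI_")]');
    $tree->delete;

    my $http = HTTP::Tiny->new(timeout => 30);
    for my $file (files_since($ARGV[0], @names)) {
        # mirror() saves to ./$file; if that file already exists it sends
        # If-Modified-Since, so unchanged files may come back as 304.
        my $res = $http->mirror("$BASE/$file", $file);
        warn "$file: $res->{status} $res->{reason}\n" unless $res->{success};
    }
}
```

Passing 20130101, for example, would fetch every release from January 2013 onward; rerunning it later should only transfer changed files, provided the server honours If-Modified-Since.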