PerlMonks
Traversing directories to get the "most-recent" or "second-to-most-recent" directory contents
by hacker (Priest)
on Mar 13, 2007 at 03:33 UTC ( [id://604468]=perlquestion )
hacker has asked for the wisdom of the Perl Monks concerning the following question:

I have a need to traverse a web tree remotely over HTTP, parse the list of directories that comes back, and grab the latest or second-to-latest directory displayed. Once I have that, I need to fetch some files within that directory by name (the date is part of each filename). For example, I will see something like this:

    Parent Directory/                          -   Directory
    20060922/         2006-Nov-13 01:11:31     -   Directory
    20060927/         2006-Nov-13 01:16:45     -   Directory
    20061016/         2006-Dec-25 03:16:32     -   Directory
    20061103/         2006-Dec-25 03:18:05     -   Directory
    20061202/         2007-Jan-30 18:07:53     -   Directory
    20061224/         2007-Feb-13 23:23:44     -   Directory
    20070126/         2007-Mar-11 19:16:45     -   Directory
    20070208/         2007-Feb-09 03:04:34     -   Directory
    20070225/         2007-Feb-25 23:44:05     -   Directory

From here, I can see that I want either 20070225 or 20070208 as the latest and second-to-latest directories in the tree. Once I know this, I need to traverse into one of those directories and fetch a series of files, which have the date in the filename. These files are VERY large (tens of gigabytes in size).

What is the best approach to solve this problem, keeping in mind that this is over HTTP, remotely, and the ability to resume aborted fetches is highly critical (à la wget -c)? Here is the order of events:
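One reply-style sketch for the "pick the newest directory" step: since the directory names are YYYYMMDD dates, they sort correctly as plain strings, so a regex over the index page plus a reverse sort is enough. The HTML here is a stand-in for what an LWP fetch of the index would return, and the `href="NNNNNNNN/"` pattern assumes an Apache-style listing; adjust the regex to whatever the real server emits.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Stand-in for $ua->get($index_url)->decoded_content on a real run.
my $html = <<'HTML';
<a href="20060922/">20060922/</a> 2006-Nov-13 01:11:31
<a href="20070126/">20070126/</a> 2007-Mar-11 19:16:45
<a href="20070208/">20070208/</a> 2007-Feb-09 03:04:34
<a href="20070225/">20070225/</a> 2007-Feb-25 23:44:05
HTML

# Grab only the eight-digit, date-named directories. YYYYMMDD names
# sort lexically in date order, so reverse sort puts the newest first.
my @dirs = reverse sort ($html =~ m{href="(\d{8})/"}g);

my ($latest, $second) = @dirs[0, 1];
print "latest: $latest, second-to-latest: $second\n";
```

Run against the sample listing this prints `latest: 20070225, second-to-latest: 20070208`, matching the two directories named in the question.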
Which modules should I be exploring, other than the obvious LWP, WWW::Robot, File::Path, Date::Calc, Date::Manip and such? Are there any canned routines or snippets somewhere that can help? Or in the absence of that, a tutorial that goes through some of this?
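For the wget -c part, plain LWP can do resumable fetches with an HTTP Range header: if a partial file is already on disk, ask the server for `bytes=<size>-` and append, streaming through `:content_cb` rather than slurping tens of gigabytes into memory. A minimal sketch, where the URL and filename are placeholders and a production version would also handle a server that ignores Range and replies 200 with the whole file (truncate and restart in that case):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Build the Range header value for however many bytes we already hold;
# undef means "no partial file, fetch from the beginning".
sub range_for {
    my ($bytes_on_disk) = @_;
    return $bytes_on_disk > 0 ? "bytes=$bytes_on_disk-" : undef;
}

# Hypothetical resumable fetch: append to $local_file from where it
# left off. A 206 Partial Content response means the resume was
# honored; a 200 means the server re-sent the entire file.
sub fetch_resumable {
    my ($url, $local_file) = @_;
    require LWP::UserAgent;    # loaded lazily; LWP assumed installed
    my $ua    = LWP::UserAgent->new;
    my $have  = -e $local_file ? -s $local_file : 0;
    my $range = range_for($have);

    open my $fh, '>>', $local_file or die "open $local_file: $!";
    binmode $fh;
    my $res = $ua->get(
        $url,
        (defined $range ? (Range => $range) : ()),
        ':content_cb' => sub { print {$fh} $_[0] },  # stream to disk
    );
    close $fh;
    die $res->status_line unless $res->is_success;
    return $res->code;    # expect 206 on a resumed transfer
}

print range_for(1024), "\n";    # prints: bytes=1024-
```

This only works when the remote server supports byte ranges (it advertises `Accept-Ranges: bytes`); otherwise shelling out to wget -c or using LWP::UserAgent's mirror-style helpers may be the pragmatic choice.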
Back to Seekers of Perl Wisdom