In reply to a web scraping question
wget will do what you want.
The site's robots.txt file lists the directory containing the PDF files as disallowed. Unless you have permission from the site's owner, you should respect robots.txt. wget honours it by default, but it can be told to ignore robots.txt when that is appropriate.
wget with the following options will download the files:
wget -r -l 1 -A .pdf -w 10 -e robots=off "http://dynamodata.fdncenter.org/990s/990search/ffindershow.cgi?id=MITC065"

Here -r turns on recursive retrieval, -l 1 limits it to one level of links from that page, -A .pdf keeps only files ending in .pdf, -w 10 waits 10 seconds between requests so the server isn't hammered, and -e robots=off tells wget to ignore robots.txt (only do that with the owner's permission, as noted above).
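If you would rather do the same thing from a script, here is a minimal Python sketch of the equivalent approach: fetch the page, pull out the PDF links one level deep, and download each with a 10-second pause. The page URL is the one from the wget command above; the regex-based link extraction and the "*" user agent are assumptions about the page, not something I have tested against it.

import re
import time
import urllib.parse
import urllib.request
import urllib.robotparser

BASE = "http://dynamodata.fdncenter.org/990s/990search/ffindershow.cgi?id=MITC065"

# Load the site's robots.txt so each PDF URL can be checked before fetching.
rp = urllib.robotparser.RobotFileParser()
rp.set_url(urllib.parse.urljoin(BASE, "/robots.txt"))
rp.read()

# Fetch the listing page and look for href targets ending in .pdf
# (roughly what wget's -r -l 1 -A .pdf does).
html = urllib.request.urlopen(BASE).read().decode("utf-8", errors="replace")
for href in re.findall(r'href="([^"]+\.pdf)"', html, flags=re.IGNORECASE):
    url = urllib.parse.urljoin(BASE, href)
    if not rp.can_fetch("*", url):
        print("Skipping (disallowed by robots.txt):", url)
        continue
    name = url.rsplit("/", 1)[-1]
    print("Fetching", url)
    urllib.request.urlretrieve(url, name)
    time.sleep(10)  # be polite, like wget's -w 10

Note that because the PDF directory is disallowed in robots.txt, the can_fetch check will skip those files; drop it only after you have the owner's permission, which mirrors the advice above.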