khanan has asked for the wisdom of the Perl Monks concerning the following question:
I want to write a Perl script that does two things:
1. Recursively parse a set of HTML pages spread across different directories and extract all <A HREF=.....> </A> tags into separate text files while maintaining the directory structure. Each HTML page/file gets its own text file, which has to reside in the same directory as that HTML page. (Note that the extracted tags/links point to other HTML pages.)
2. Use the extracted tags from text files and download the html pages and save them to the respective directories.
I have managed to extract the tags, but only from a single file and into a single output file. The directory and file structure needs to be maintained, so ideally I would point the script at the home directory and it would do the link extraction and downloading recursively.
Thanks in advance for any tips or previously used code snippets.
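One possible approach, sketched below: walk the tree with the core File::Find module, pull hrefs out of each HTML file, and write them to a sibling text file so the directory structure is preserved automatically. This is a minimal sketch under a few assumptions: the regex-based extraction is fragile on real-world HTML (HTML::LinkExtor from the HTML::Parser distribution is the robust choice), and the `.links.txt` naming convention is made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Find;

# Pull every A-tag href out of an HTML string. A regex is fragile for
# real-world HTML; HTML::LinkExtor is the robust alternative, but this
# keeps the sketch dependency-free (core modules only).
sub extract_hrefs {
    my ($html) = @_;
    my @links;
    while ($html =~ /<a\s[^>]*href\s*=\s*["']?([^"'\s>]+)/gi) {
        push @links, $1;
    }
    return @links;
}

# Walk $root recursively; for each HTML file, write its links to a
# sibling "<file>.links.txt" in the same directory, so the directory
# structure is maintained for free.
sub harvest_links {
    my ($root) = @_;
    find(sub {
        return unless /\.html?$/i;
        open my $in, '<', $_ or die "read $File::Find::name: $!";
        local $/;                                  # slurp mode
        my @links = extract_hrefs(scalar <$in>);
        close $in;
        open my $out, '>', "$_.links.txt" or die "write $_.links.txt: $!";
        print $out "$_\n" for @links;
        close $out;
    }, $root);
}

# Step 2 (downloading) could then read each .links.txt back and fetch
# the pages with LWP::Simple (a separate CPAN install), e.g.:
#   use LWP::Simple qw(getstore);
#   getstore($url, $local_path);
# Relative links would first need resolving against the page's base URL,
# e.g. with URI->new_abs($link, $base).

harvest_links($ARGV[0]) if @ARGV;
```

Keeping extraction (step 1) and downloading (step 2) as separate passes, as the question suggests, also means the link lists can be inspected or pruned by hand before anything is fetched.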
Edited 2002-06-20 by mirod: changed title (was: Recursive HTML Tax Extraction) and added formatting tags
Replies are listed 'Best First'.

Re: Recursive HTML Tag Extraction
by dws (Chancellor) on Jun 20, 2002 at 16:45 UTC

Re: Recursive HTML Tag Extraction
by hacker (Priest) on Jun 20, 2002 at 16:39 UTC

Re: Recursive HTML Tax Extraction
by khanan (Initiate) on Jun 20, 2002 at 10:54 UTC