Hi monks
I want to parse a tree of web pages (all with roughly the same formatting) and add key pieces of data from those pages to a database.
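For the "walk the tree and feed a database" half of the job, the core File::Find module paired with DBI and placeholders is a common combination. Below is a minimal sketch under a few assumptions: the pages have been mirrored to local disk, and SQLite (DBD::SQLite) stands in purely as an example backend. html_files_under, store_docs, the docs table and its columns are all hypothetical names, and the HTML parsing itself is left as a comment.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Find;   # core module: recursive directory walk
use DBI;          # CPAN; DBD::SQLite is assumed purely as an example driver

# Return every .htm/.html file below $root, assuming the site has been
# mirrored to local disk (hypothetical layout).
sub html_files_under {
    my ($root) = @_;
    my @files;
    find(sub { push @files, $File::Find::name if -f $_ && /\.html?$/i }, $root);
    return sort @files;
}

# Insert one row per document. @docs are hashrefs with the hypothetical keys
# part/title/pdf; the ? placeholders let DBI do all the quoting.
sub store_docs {
    my ($dbh, $page, @docs) = @_;
    my $sth = $dbh->prepare(
        'INSERT INTO docs (page, part, title, pdf) VALUES (?, ?, ?, ?)');
    $sth->execute($page, @{$_}{qw(part title pdf)}) for @docs;
    return scalar @docs;
}

# An in-memory database keeps the sketch self-contained; use a real
# file name (or another DBI driver) in practice.
my $dbh = DBI->connect('dbi:SQLite:dbname=:memory:', '', '',
                       { RaiseError => 1, AutoCommit => 1 });
$dbh->do('CREATE TABLE docs (page TEXT, part TEXT, title TEXT, pdf TEXT)');

my $root = shift @ARGV;
if (defined $root) {
    for my $file (html_files_under($root)) {
        # Parse $file here with the HTML module of your choice, then e.g.:
        # store_docs($dbh, $file, @rows_found_in_file);
        print "would process $file\n";
    }
}
```

Preparing the INSERT once per page and executing it per row keeps the quoting in DBI's hands. This matters because titles scraped from HTML can contain quote characters.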
I am really overwhelmed by the number of HTML modules available, and wondered if anyone has comments on what to avoid, best practices, major pitfalls, time-saving Perl-ish approaches, etc.
I want to glean three pieces of information from each page, which are laid out in tables like this one:
<TABLE WIDTH="100%" BORDER="0" CELLSPACING="1" CELLPADDING="3">
<!-- CATEGORY -->
<TR><TD CLASS="dkblue" COLSPAN="3"><A NAME="Sun Ultra 60"></A><BIG> <B>Sun Ultra 60 Documentation</B></BIG></TD></TR>
<TR VALIGN="TOP" CLASS="white">
  <TD>804-5884-10</TD>
  <TD WIDTH="90%"><B>Sun Ultra 60 Hardware AnswerBook Installation</B></TD>
  <TD><A HREF="/products-n-solutions/hardware/docs/pdf/804-5884-10.pdf" TARGET="results">pdf</A> (42KB)</TD></TR>
<TR VALIGN="TOP" CLASS="lttan">
  <TD>804-5886-10</TD>
  <TD><B>Installing the Sun Ultra 60 ShowMe How Multimedia Documentation</B></TD>
  <TD><A HREF="/products-n-solutions/hardware/docs/pdf/804-5886-10.pdf" TARGET="results">pdf</A> (62KB)</TD></TR>
<TR VALIGN="TOP" CLASS="white">
  <TD>805-1709-12</TD>
  <TD><B>Sun Ultra 60 Service Manual</B></TD>
  <TD><A HREF="/products-n-solutions/hardware/docs/pdf/805-1709-12.pdf" TARGET="results">pdf</A> (6.5MB)</TD></TR>
<TR VALIGN="TOP" CLASS="lttan">
  <TD>805-1762-11</TD>
  <TD><B>Sun Ultra 60 Reference Manual</B></TD>
  <TD><A HREF="/products-n-solutions/hardware/docs/pdf/805-1762-11.pdf" TARGET="results">pdf</A> (344KB)</TD></TR>
</TABLE>
...but of course there is lots of other markup on each page that gets in the way.
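One common way around the surrounding markup is to build a parse tree with HTML::TreeBuilder (CPAN's HTML-Tree distribution). You can then search that tree by tag and attribute instead of by position or regex. Below is a minimal sketch. It assumes every data row looks like the excerpt above, meaning a TR whose CLASS is white or lttan and which has exactly three cells. It also assumes the category's <A NAME> anchor is a reliable landmark for finding the right table. extract_docs and the hash keys part/title/pdf are hypothetical names, not part of any module.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TreeBuilder;   # CPAN, HTML-Tree distribution

# Hypothetical helper: pull (part number, title, PDF link) out of every
# document row. If $category is given, only the table containing
# <A NAME="$category"> is searched, so the rest of the page is ignored.
sub extract_docs {
    my ($html, $category) = @_;
    my $tree  = HTML::TreeBuilder->new_from_content($html);
    my $scope = $tree;
    if (defined $category) {
        my $anchor = $tree->look_down(_tag => 'a', name => $category);
        $scope = $anchor ? $anchor->look_up(_tag => 'table') : undef;
    }
    my @docs;
    if ($scope) {
        # Tag and attribute names are lower-cased by the parser; values are not.
        for my $tr ($scope->look_down(_tag => 'tr', class => qr/^(?:white|lttan)$/)) {
            my @td = $tr->look_down(_tag => 'td');
            next unless @td == 3;                  # skip rows of another shape
            my $link = $td[2]->look_down(_tag => 'a', href => qr/\.pdf$/i)
                or next;
            push @docs, {
                part  => $td[0]->as_trimmed_text,
                title => $td[1]->as_trimmed_text,
                pdf   => $link->attr('href'),
            };
        }
    }
    $tree->delete;   # free the tree explicitly; HTML::Element nodes link to parents
    return @docs;
}

# Small sample in the same shape as the page excerpt.
my $sample = <<'HTML';
<TABLE><TR><TD CLASS="dkblue" COLSPAN="3"><A NAME="Sun Ultra 60"></A>
<B>Sun Ultra 60 Documentation</B></TD></TR>
<TR VALIGN="TOP" CLASS="white"><TD>805-1709-12</TD>
<TD><B>Sun Ultra 60 Service Manual</B></TD>
<TD><A HREF="/products-n-solutions/hardware/docs/pdf/805-1709-12.pdf"
 TARGET="results">pdf</A> (6.5MB)</TD></TR></TABLE>
HTML

for my $doc (extract_docs($sample, 'Sun Ultra 60')) {
    print join("\t", @{$doc}{qw(part title pdf)}), "\n";
}
```

Matching on the row CLASS values and the anchor name ties the scraper to this particular layout. If the pages differ more than "common-ish", those two selectors are where the adjustments would go. HTML::TreeBuilder->new_from_file can replace new_from_content when reading pages straight from disk.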
Any ideas on really useful modules?
Any suggestions or tips would be helpful.
Thanks monks,
m
In reply to Hints & Tips on passing HTML? by heezy