What an ugly mess... I pity you... :-)
I'm curious why there are multiple <HTML> tags in the same document. Assuming that's not an artifact you created, I would split this huge document into several parts, using those tags as delimiters, and handle each piece separately (multiple <HTML> tags serve no purpose anyway). Within the individual pieces, the structure may be easier to see. In this case, the author probably used a one-column table to get some layout effect, but otherwise it seems useless as far as I can tell.
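A minimal sketch of that splitting step, written in Python since the post doesn't name a language. It assumes the extra tags are ordinary opening `<HTML>` tags (any case, possibly with attributes), and `split_on_html_tags` is a hypothetical helper name. A regex is fragile for general HTML parsing, but it's adequate for using one known tag as a delimiter:

```python
import re

# Matches an opening <HTML> tag in any case, with optional attributes.
OPEN_HTML = re.compile(r"<html\b[^>]*>", re.IGNORECASE)
# Matches a closing </HTML> tag, which is just noise once we've split.
CLOSE_HTML = re.compile(r"</html\s*>", re.IGNORECASE)


def split_on_html_tags(doc: str) -> list:
    """Split one page into pieces, using each opening <HTML> tag as a delimiter."""
    pieces = []
    for chunk in OPEN_HTML.split(doc):
        chunk = CLOSE_HTML.sub("", chunk).strip()
        if chunk:  # skip empty text before the first tag or between tags
            pieces.append(chunk)
    return pieces
```

For example, `split_on_html_tags("<HTML><body>one</body></HTML>\n<html lang='en'><p>two</p></html>")` yields two pieces, `<body>one</body>` and `<p>two</p>`, which can then be inspected one at a time.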
Programmatically, if all you care about is extracting the information from the page, it might be easier to use lynx to get a text version, then selectively add back <P>, <A>, and <UL> tags and ignore the rest of the formatting. That at least gives you a starting point where none of the content has been lost and you can begin the HTML design anew.
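If shelling out to lynx isn't convenient, the same "keep the text, keep a few structural tags, drop everything else" idea can be sketched with Python's standard `html.parser` module instead. This is a swapped-in technique, not the lynx route suggested above. The names `Simplifier`, `simplify`, and `KEEP` are hypothetical, and the assumption is that `<p>`, `<a href>`, `<ul>`, and `<li>` are enough structure to start from:

```python
import re
from html import escape
from html.parser import HTMLParser

# Tags carried over to the output; tables, fonts, etc. are dropped.
KEEP = {"p", "a", "ul", "li"}


class Simplifier(HTMLParser):
    """Rebuild a page from its text plus a handful of structural tags."""

    def __init__(self):
        super().__init__(convert_charrefs=True)  # entities arrive decoded
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag not in KEEP:
            return
        if tag == "a":
            href = dict(attrs).get("href")
            # Keep the link target; an anchor without href becomes a bare <a>.
            self.out.append(f'<a href="{escape(href)}">' if href else "<a>")
        else:
            self.out.append(f"<{tag}>")

    def handle_endtag(self, tag):
        if tag in KEEP:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        # Collapse whitespace runs left over from table layout, and
        # re-escape &, <, > since the text was decoded on the way in.
        self.out.append(escape(re.sub(r"\s+", " ", data), quote=False))


def simplify(html_text: str) -> str:
    parser = Simplifier()
    parser.feed(html_text)
    parser.close()  # flush any buffered trailing text
    return "".join(parser.out).strip()
```

Fed a table-wrapped fragment such as `<table><tr><td><font size="2"><p>Hello <b>world</b></p></font></td></tr></table>`, it returns `<p>Hello world</p>`: the content survives while the layout markup is discarded.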
Dr. Michael K. Neylon - mneylon-pm@masemware.com
"You've left the lens cap of your mind on again, Pinky" - The Brain