PerlMonks  

Re: Reverse engineering HTML

by Masem (Monsignor)
on Jun 14, 2001 at 17:34 UTC


in reply to Reverse engineering HTML

What an ugly mess... I pity you... :-)

I'm curious as to why there are multiple <HTML> tags in the same document? Assuming that's not an artifact that you created, I would split this huge document up into several parts using these tags as 'delimiters', and handle each piece separately (since multiple <HTML> tags have no value). Within those individual pieces, it might be easier to see structure. In this case, the person used a one-column table probably to get some effect, but it's otherwise useless from what I can tell.
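As a rough illustration of the splitting idea, here is a minimal Python sketch that treats every opening <HTML> tag as a delimiter and returns the pieces in between. The function name split_on_html_tags is made up for this example, and it assumes the whole document fits in memory as one string; a regex is good enough here only because the tag is being used as a crude separator, not parsed.

```python
import re
from typing import List

# Opening <html> tag in any case, with optional attributes.
HTML_OPEN = re.compile(r"<html\b[^>]*>", re.IGNORECASE)
# Matching closing tag, removed so each piece holds only its own content.
HTML_CLOSE = re.compile(r"</html\s*>", re.IGNORECASE)


def split_on_html_tags(document: str) -> List[str]:
    """Split a document into the pieces that follow each opening <html> tag.

    Hypothetical helper: the <html> tags are used purely as delimiters
    and are not kept in the returned pieces.
    """
    pieces = []
    for chunk in HTML_OPEN.split(document):
        chunk = HTML_CLOSE.sub("", chunk).strip()
        if chunk:  # skip empty text before the first tag or between tags
            pieces.append(chunk)
    return pieces
```

Each returned piece can then be saved to its own file and examined separately, which should make whatever structure exists easier to spot.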

Programmatically, if all you care about is extracting the information from the page, it might just be easier to use lynx to get the text version, possibly intelligently adding <P>, <A>, and <UL> tags and ignoring the rest of the formatting, to at least give you a starting point where you have not lost any of the content and can begin anew with the HTML design.
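A hedged sketch of that approach in Python: lynx_dump shells out to lynx with its -dump and -nolist options to get plain text, and text_to_paragraphs wraps each blank-line-separated block in <P> tags. Both function names are invented for this example, lynx is assumed to be installed and on the PATH, and only the <P> step is shown; restoring <A> and <UL> tags would need extra heuristics.

```python
import html
import re
import subprocess


def lynx_dump(path: str) -> str:
    """Render an HTML file as plain text with lynx (assumed to be on PATH).

    -dump writes the formatted page to stdout; -nolist suppresses the
    numbered list of links that lynx otherwise appends.
    """
    result = subprocess.run(
        ["lynx", "-dump", "-nolist", path],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def text_to_paragraphs(text: str) -> str:
    """Wrap each blank-line-separated block of text in <P> tags."""
    paragraphs = []
    for block in re.split(r"\n\s*\n", text.strip()):
        # Collapse the line breaks and indentation lynx adds inside a block.
        collapsed = " ".join(block.split())
        if collapsed:
            # Escape &, <, > so the text is safe to embed in new HTML.
            paragraphs.append("<P>" + html.escape(collapsed) + "</P>")
    return "\n".join(paragraphs)
```

Calling text_to_paragraphs(lynx_dump("page.html")), where page.html is a placeholder file name, gives a bare-bones skeleton that keeps all of the text and none of the original layout.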


Dr. Michael K. Neylon - mneylon-pm@masemware.com || "You've left the lens cap of your mind on again, Pinky" - The Brain
