PerlMonks  

Re^2: Simplify HTML programatically

by hobbs (Monk)
on Jun 09, 2006 at 05:09 UTC ( [id://554426]=note )


in reply to Re: Simplify HTML programatically
in thread Simplify HTML programatically

For a somewhat simpler* solution in the same vein, there's HTML::TreeBuilder. HTML::Element provides all the primitives you really need for an operation like this: look_down to find the relevant elements, replace_with_content to "remove" a tag while keeping what it contains, and delete to destroy an element along with everything inside it. I'm not up to writing an example right now, but it's truly simple. Give it a shot! It goes a long way, and the output is bound to be less of a mess than the input.

* edit: okay, I realized that some might be confused by this usage of "simple", since trwww's example is pretty simple in itself. Mostly it's a matter of being able to think in terms of tree manipulations instead of open and close tags and pushing and popping a stack. The corresponding cost is in storage, since the whole document is held in memory as a tree, but that's usually not worrisome.
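
A minimal sketch of how those three methods might fit together, not the poster's own code. The function name simplify_html and the choice of which tags to unwrap (font, span) or drop (script, style) are assumptions made for illustration; adjust them to whatever the original question actually needs.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Hypothetical helper: parse HTML, strip presentational markup, and
# return the cleaned document as a string.
sub simplify_html {
    my ($html) = @_;

    my $tree = HTML::TreeBuilder->new_from_content($html);

    # "Remove" purely presentational tags but keep their children.
    # replace_with_content returns the now-empty, detached element,
    # so delete it to release it.
    for my $tag (qw(font span)) {
        $_->replace_with_content->delete
            for $tree->look_down(_tag => $tag);
    }

    # Destroy these elements together with everything inside them.
    for my $tag (qw(script style)) {
        $_->delete for $tree->look_down(_tag => $tag);
    }

    # Drop inline style attributes from whatever is left.
    $_->attr('style', undef)
        for $tree->look_down(sub { defined $_[0]->attr('style') });

    my $out = $tree->as_HTML;
    $tree->delete;    # free the tree explicitly
    return $out;
}

print simplify_html(
    '<p style="margin:0"><font color="red">Hello <b>world</b></font></p>'
  . '<script>alert(1)</script>'
), "\n";
```

Note that as_HTML on the root emits a full document (html, head, and body elements included), so if only a fragment is wanted you would serialize the children of the body element instead.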
