
Re^2: Simplify HTML programatically

by hobbs (Monk)
on Jun 09, 2006 at 05:09 UTC

in reply to Re: Simplify HTML programatically
in thread Simplify HTML programatically

For something of a simpler* solution, but in the same vein, there's HTML::TreeBuilder. HTML::Element provides all of the primitives that you really need for an operation like this: look_down to identify relevant elements, replace_with_content to "remove" a tag without removing what it contains, and delete to completely destroy all signs of a given element. I'm not up to writing an example right now, but it's truly simple. Give it a shot! It goes a long way, and the output is bound to be less of a mess than the input.

* edit: okay, I realized that some might be confused by this usage of "simple", since trwww's example is pretty simple in itself. Mostly it's a matter of being allowed to think in terms of tree manipulations instead of opens and closes and stacking and de-stacking. The corresponding cost is memory, since the whole document is held as a tree, but that's usually not worrisome.
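
For illustration only, here is a minimal sketch of how those three primitives might fit together. The helper name simplify_html, the sample markup, and the choice of which tags to unwrap or drop are all assumptions made up for this example, not anything from the thread; adjust them to taste.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Hypothetical helper (name and argument layout are assumptions):
#   $html   - the markup to clean up
#   $unwrap - array ref of tag names to strip while keeping their contents
#   $drop   - array ref of tag names to remove along with their contents
sub simplify_html {
    my ($html, $unwrap, $drop) = @_;

    my $tree = HTML::TreeBuilder->new_from_content($html);

    # Drop first, so elements inside a deleted subtree are never revisited.
    for my $tag (@$drop) {
        # look_down in list context returns every matching element;
        # delete destroys the element and all of its descendants.
        $_->delete for $tree->look_down(_tag => $tag);
    }

    for my $tag (@$unwrap) {
        # replace_with_content splices the children into the parent and
        # returns the now-empty element, which we then delete.
        $_->replace_with_content->delete for $tree->look_down(_tag => $tag);
    }

    # Encode only <, >, & as entities; no indentation; and an empty
    # "optional end tags" hash so that every end tag is written out.
    my $out = $tree->as_HTML('<>&', undef, {});
    $tree->delete;    # free the tree explicitly
    return $out;
}

# Sample input is invented for the example.
my $messy = '<p><font color="red">Hello <b>world</b></font></p>'
          . '<script>alert(1)</script>';
print simplify_html($messy, ['font'], ['script']), "\n";
```

Note that TreeBuilder will wrap a fragment like this in html, head and body elements, so the output is a full document. Also, replace_with_content dies on an element with no parent, so don't put the root tag in the unwrap list.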
