in reply to Re: Truncating HTML early
in thread Truncating HTML early
I quickly discovered that this module is much more geared towards extracting information from an HTML file than altering one in-place. If you read the author's article in TPJ 19 you'll see as much. The module is really hampered by its lack of any semblance of an identity property. That is, in psuedocode, $document != HTML::TreeBuilder->new($document)->dump_html(); It doesn't preserve whitespace and is apt to change your code by throwing closing tags, etc. While this is all to spec for HTML, we all know that in the real world this sort of behavior tends to break things with the umpteen flaky, finicky versions of NS & IE out there today. This is especially true when your documents were a mishmash of crappy, incorrect HTML in the first place (I work at a major public university, so every professor, student, and club seems to have a different and usually wrong way of making webpages.) So, eventually, I had to decide against using TreeBuilder, even though it would have been much easier and "cooler" from a CS/data structure point of view.
|
|---|