I don't want to say which approach is better or worse. The HTML::Clean idea of building a filter that makes direct changes with regular expressions is good, since it uses less memory, but such a filter can't know exactly what it is changing inside the HTML tree. On the other hand, we can't build a filter based entirely on a parsed HTML tree, since that would be slow, which is not good for a server. My module sits somewhere between the two approaches: it looks only for the basic things that can be cleaned, without very complex ideas, to keep it fast.
I have been in contact with the author (so far I've only sent an e-mail and am waiting for a reply) about updating the HTML::Clean module with the code I wrote. But the code is only 2 days old and needs testing. I would like the monks to run it against some web sites and check whether the output looks OK, that is, the same as the original in the browser. Any ideas to make the filter better, or any comments, are gladly accepted!
To test it, download: http://www.inf.ufsc.br/~gmpassos/htmlclean.zip
It is very small, the test script consists of only 2 files, and it doesn't require installing anything (no extra modules) in your Perl.
Graciliano M. P.
"The creativity is the expression of the liberty".