I was testing the HTML::Clean module to add a filter option for the mod_perl output of HPL (another HTML/Perl embedding system). But when I started reading the source and saw how the cleaning is done, I noticed that the filter can make mistakes with complex HTML. So I decided to write my own filter, one that doesn't change the final result in the browser. I ran some tests with HTML::Clean and my new module, and found that mine filters better: it doesn't change the rendered result, and it removes more. (I used pages from www.cnn.com.br and www.perl.com, which have styles, JavaScript, etc.)
I don't want to say which one is better. The HTML::Clean idea of a filter based on direct changes with regular expressions is good, since it uses less memory, but it can't know exactly what it is doing inside the HTML tree. On the other hand, we can't build a filter fully based on a parsed HTML tree, since that would be slow, which is bad for a server. My module sits between the two approaches: it looks only for the basic things that can be cleaned, without very complex ideas, to keep it fast.
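To illustrate the "in between" approach, here is a minimal sketch in Python. It is not the code from my module, which is Perl. It does not build a tree. Instead, one regex pass finds only the elements whose contents must stay byte-for-byte (`pre`, `textarea`, `script`, `style`). Everything outside them gets cheap regex cleanup: comments are removed and whitespace runs are collapsed. The names `clean_html` and `PROTECTED` are hypothetical. Known limitations of the sketch: a protected tag inside a comment is still treated as protected, whitespace inside attribute values is also collapsed, and IE conditional comments are stripped.

```python
import re

# Hypothetical list: elements whose contents must not be touched, because
# whitespace is significant (pre, textarea) or the body is code (script, style).
PROTECTED = ("pre", "textarea", "script", "style")

# A protected element plus its contents, matched non-greedily.
# \1 requires the closing tag to have the same name (case-insensitive here).
PROTECTED_RE = re.compile(
    r"<(%s)\b[^>]*>.*?</\1\s*>" % "|".join(PROTECTED),
    re.IGNORECASE | re.DOTALL,
)
COMMENT_RE = re.compile(r"<!--.*?-->", re.DOTALL)
WS_RE = re.compile(r"\s+")


def _clean_fragment(fragment: str) -> str:
    """Clean HTML that lies outside any protected element."""
    fragment = COMMENT_RE.sub("", fragment)
    # In normal flow a browser renders any whitespace run as one space,
    # so collapsing the run should not change the visible result.
    return WS_RE.sub(" ", fragment)


def clean_html(html: str) -> str:
    """Strip comments and collapse whitespace outside protected elements."""
    out = []
    pos = 0
    for m in PROTECTED_RE.finditer(html):
        out.append(_clean_fragment(html[pos:m.start()]))
        out.append(m.group(0))  # copy the protected block unchanged
        pos = m.end()
    out.append(_clean_fragment(html[pos:]))
    return "".join(out)


if __name__ == "__main__":
    page = "<p>Hello,   <!-- note -->\n  world</p>\n<pre>  keep\n  this </pre>  <b>x</b>"
    print(repr(clean_html(page)))
```

With this design, the cost stays close to a plain regex filter, with one linear scan plus substitutions on the gaps. It also avoids the most common way such a filter breaks a page, which is rewriting the inside of `<pre>` or `<script>`.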
I have contacted the author about updating HTML::Clean with my code. So far I have only sent an e-mail and am waiting for a reply. But the code is only 2 days old and needs testing. I would like the monks to try it on some web sites and check whether the output looks the same in the browser. Any ideas for making the filter better, or any other comments, are gladly accepted!
To test it, get: http://www.inf.ufsc.br/~gmpassos/htmlclean.zip
It is very small. The test script has only 2 files and doesn't require installing any modules into your Perl.
Graciliano M. P.
"The creativity is the expression of the liberty".