Regex-based HTML processing is generally not regarded as a good idea: it's unreliable, labour-intensive, hard to maintain and very, very difficult to get right. The vast majority of respectable solutions are based on HTML::Parser, either directly or by way of one of the modules that put a simpler interface on it. Ovid's HTML::TokeParser::Simple is probably the one I'd recommend. My own HTML::TagFilter is simpler, but not as good (and not at all diligently maintained :).
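To give a feel for the token-based approach, here is a minimal sketch using HTML::TokeParser::Simple that keeps an allow-list of tags and drops every other tag while leaving the text alone. The sub name clean_html and the tag list are my own choices for illustration, and it deliberately ignores attribute filtering and the contents of script/style elements, so treat it as a starting point rather than a finished filter.

```perl
use strict;
use warnings;
use HTML::TokeParser::Simple;

# Hypothetical allow-list, chosen only for this example.
my @allowed = qw(p b i em strong a);

sub clean_html {
    my ($html) = @_;

    # Passing a scalar reference makes the parser read from the string.
    my $parser = HTML::TokeParser::Simple->new(\$html);
    my $out    = '';

    while ( my $token = $parser->get_token ) {
        if ( $token->is_tag ) {
            # is_tag($name) matches both the start and the end tag.
            next unless grep { $token->is_tag($_) } @allowed;
        }
        # as_is returns the token exactly as it appeared in the input.
        $out .= $token->as_is;
    }
    return $out;
}
```

Calling clean_html('<p>Hello <font color="red"><b>world</b></font></p>') should hand back the paragraph with the font tags gone and everything else untouched.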
If your goal is just to clean, rather than to digest and process, then you would also do well to try HTML::Tidy, a Perl interface to the venerable but very effective HTML Tidy library.
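For the clean-only case, something along these lines is roughly all it takes. This is a sketch that assumes HTML::Tidy and the underlying tidy library are installed; the sample markup and the tidy_mark option are just my picks for the example.

```perl
use strict;
use warnings;
use HTML::Tidy;

# tidy_mark => 0 asks tidy not to add its own generator meta tag.
my $tidy = HTML::Tidy->new( { tidy_mark => 0 } );

# Deliberately sloppy input: unclosed <b> and <p> elements.
my $dirty = '<p>Unclosed <b>bold<p>Second paragraph';

# clean() returns the repaired markup as a string.
my $clean = $tidy->clean($dirty);
print $clean;

# Anything tidy complained about is available as message objects.
print STDERR $_->as_string, "\n" for $tidy->messages;
```

Note that tidy returns a complete document, so expect it to wrap a fragment like this in html, head and body elements.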
I'm afraid you will almost certainly find that this wheel has already been made for you and that only a half-dozen lines of code are required...
In reply to Re: Cleanning HTML - New/better module for that - test please! ;-P
by thpfft
in thread Cleanning HTML - New/better module for that - test please! ;-P
by gmpassos