Cleaning up HTML tags

cleverett has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: Cleaning up HTML tags by esh (Pilgrim) on Aug 25, 2003 at 04:36 UTC
At the risk of getting negative votes by not actually answering your question, I'd posit that the functionality of the filter matters much more than the interface, provided that the interface lets you specify the options you require and produces the results you desire. It would be trivial to turn the OO interface you list into a procedural one. It seems to me the important part is how good the logic inside the package is. Here's some code that takes the OO interface you hate in your first example and lets you call it through the procedural interface you want in your second example.

`package HTML::ProceduralMarkupRemover;

# HTML::SomeMarkupRemover stands in for whichever OO module you pick;
# it must already be loaded before these wrappers are called.

# True if $some_html already satisfies $html_rules.
sub html_passes_rules {
    my ($some_html, $html_rules) = @_;
    my $hsmr = HTML::SomeMarkupRemover->new($html_rules);
    return $hsmr->passes_rules($some_html);
}

# Returns $some_html with $html_rules applied.
sub apply_html_rules {
    my ($some_html, $html_rules) = @_;
    my $hsmr = HTML::SomeMarkupRemover->new($html_rules);
    return $hsmr->apply_rules($some_html);
}

1;`

But why bother?

-- Eric Hammond
Re: Re: Cleaning up HTML tags by cleverett (Friar) on Aug 25, 2003 at 05:18 UTC
Obviously the functionality is very important. But from what I can see, the modules I mentioned in my original post all pretty much do what I need, and HTML::Scrubber and HTML::Sanitizer let the user adjust the rules on the fly. Don't get me wrong: I like objects, a lot. It just bugs me when those snappy OO interfaces don't actually contribute expressive power. So I'm looking for a procedural alternative, which I haven't been able to find on my own. Why have the overhead of creating and destroying objects to do such a simple thing? Why do I have to type in that extra line of code? As an analogy, who bothers with the OO interface to Digest::MD5 99% of the time? I, for one, am so glad the guy who wrote it lets me get by with md5_hex().
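To make the Digest::MD5 analogy concrete, here is a minimal sketch of the same digest computed through both interfaces. The input string is an arbitrary example; `md5_hex`, `new`, `add`, and `hexdigest` are part of Digest::MD5's documented interface.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

my $data = 'hello world';    # arbitrary sample input

# Procedural interface: one call, no object to manage.
my $procedural = md5_hex($data);

# OO interface: construct a context, feed it data, ask for the digest.
my $ctx = Digest::MD5->new;
$ctx->add($data);
my $oo = $ctx->hexdigest;

print "procedural: $procedural\n";
print "oo:         $oo\n";
print "both interfaces agree\n" if $procedural eq $oo;
```

The OO form earns its keep mainly when the data arrives in pieces, since `add` can be called repeatedly before `hexdigest`.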
Re: Re: Re: Cleaning up HTML tags by esh (Pilgrim) on Aug 25, 2003 at 05:52 UTC
OK, it sounds like we're both on the same page. I also like to gripe about the interfaces of various packages when they don't match how I would design them. But sometimes the functionality is so good that I put up with a lot of (what I consider to be) strangeness in the interface. Take <name deleted to protect the guilty> for example, which I love and strongly recommend, just because it does a great job of coming up with the results I want.

-- Eric Hammond
Re: Cleaning up HTML tags by bobn (Chaplain) on Aug 25, 2003 at 05:22 UTC
Nothing stops you from writing your own "procedural" wrapper subroutines around the OO calls. That way you only have to deal with the OO once. Also, the modules may allow some functions to be exported: check the docs to see whether what you want is in @EXPORT_OK.

--Bob Niederman, http://bob-n.com
All code given here is UNTESTED unless otherwise stated.
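One quick way to see what a module offers for export, sketched here with Digest::MD5 only because it was mentioned earlier in the thread; substitute the module you are actually evaluating. This assumes the module uses Exporter's usual @EXPORT_OK package variable.

```perl
use strict;
use warnings;
use Digest::MD5 ();    # load the module without importing anything

# @EXPORT_OK lists the names the module will export on request.
my @exportable = sort @Digest::MD5::EXPORT_OK;
print "$_\n" for @exportable;
```

Any name printed can then be requested with `use Module qw(name);`.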
Re: Cleaning up HTML tags by antirice (Priest) on Aug 25, 2003 at 04:57 UTC
I can't even think of why you wouldn't want to use an object-oriented interface for this sort of task. The call to new() parses the incoming data into some workable form that is stored so that methods can be called on it. This reduces the amount of work each method needs to do, since you'd otherwise have to do some sort of parsing in each and every sub called. I just don't see any valid reason you would want to use anything else. Sorry if I'm being a bit daft.

antirice
The first rule of Perl club is - use Perl
The ith rule of Perl club is - follow rule i - 1 for i > 1
Re: Re: Cleaning up HTML tags by cleverett (Friar) on Aug 25, 2003 at 05:58 UTC
As these modules are written, new() takes some rules as to which tags/attributes are and aren't allowed and turns them into a list of allowed tags/attributes. So far so good. But I'm not seeing where massaging the rules about allowed and denied tags and attributes generates a win, except that it might make the checking easier to write. The algorithm for filtering against the rules will still boil down to:

`1. Get a token, which is either HTML markup or text. Stop when none are left.
2. If the token is text, add it to the output.
3. Drop the tag if it's not allowed.
4. Drop each attribute that's not allowed.
5. Repeat.`

With a linear problem, I don't see what maintaining state wins for me unless the object accumulates a result as I intermittently obtain text to feed it. I admit that could be useful, but it's just not the way I've been programming.

UPDATE: actually, an issue just as important as linearity in the sense above is the fact that there's only one thing to do with the object, and once you've done it, its useful life is over.
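The loop described above can be sketched as a single stateless function. This is an illustration under stated assumptions, not how HTML::Scrubber or HTML::Sanitizer work internally: `filter_html` and the rules layout (tag name mapped to a hash of allowed attribute names) are hypothetical, and the regex tokenizer only understands simple tags with double-quoted attribute values, so it is not safe for real sanitizing.

```perl
use strict;
use warnings;

# Hypothetical rules layout: tag name => { allowed attribute => 1 }.
# A tag that is missing from the hash is dropped entirely.
sub filter_html {
    my ($html, $rules) = @_;
    my $out = '';

    # Split the input into tokens: either markup (<...>) or a run of text.
    for my $token ($html =~ /(<[^>]*>|[^<]+)/g) {

        # Text goes straight to the output.
        if ($token !~ /^</) {
            $out .= $token;
            next;
        }

        # Drop markup that is not a recognisable, allowed tag.
        my ($slash, $name, $rest)
            = $token =~ m{^<(/?)([A-Za-z][A-Za-z0-9]*)([^>]*)>$}
            or next;
        my $allowed = $rules->{ lc $name } or next;

        # Closing tags carry no attributes.
        if ($slash) {
            $out .= '</' . lc($name) . '>';
            next;
        }

        # Keep only the attributes the rules allow for this tag.
        my $attrs = '';
        while ($rest =~ /([A-Za-z-]+)\s*=\s*"([^"]*)"/g) {
            my ($attr, $value) = (lc $1, $2);
            $attrs .= qq{ $attr="$value"} if $allowed->{$attr};
        }
        $out .= '<' . lc($name) . $attrs . '>';
    }
    return $out;
}

my %rules = (b => {}, a => { href => 1 });
my $dirty = '<p>Hi <b>there</b> <a href="/x" onclick="evil()">link</a></p>';
print filter_html($dirty, \%rules), "\n";
```

Nothing survives between calls, which is the point being argued: the rules go in as an argument and the cleaned string comes out.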
Re: Cleaning up HTML tags by mr_stru (Sexton) on Aug 26, 2003 at 03:03 UTC
It seems to me that the issue is not so much keeping state as holding on to the rules. With a procedural interface you are presumably going to need some internal variable that holds the rules between setting them and using them. This is fine unless you want to apply different rules to the HTML depending on the conditions. In that case you will have to keep changing the rules with a procedural interface. With the OO one you can just have different objects, and you can give them nice helpful names. Of course, if you're not doing this then it might be moot, but it seems like a good reason to use an OO interface for these sorts of modules.

Struan
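A minimal sketch of that idea, assuming a made-up class (`My::TagRules` and its `allows` method are hypothetical and only remember which tags are permitted): two rule sets coexist, each under a descriptive name, and neither has to be reset before the other is used.

```perl
use strict;
use warnings;

# Hypothetical minimal class: it only records which tags are allowed.
package My::TagRules;

sub new {
    my ($class, @allowed) = @_;
    return bless { allowed => { map { ( lc($_) => 1 ) } @allowed } }, $class;
}

# Returns 1 if the tag is allowed by this rule set, 0 otherwise.
sub allows {
    my ($self, $tag) = @_;
    return exists $self->{allowed}{ lc $tag } ? 1 : 0;
}

package main;

# Two rule sets live side by side, each with a helpful name.
my $comment_rules = My::TagRules->new(qw(b i a));
my $admin_rules   = My::TagRules->new(qw(b i a img table));

for my $tag (qw(b img script)) {
    printf "%-6s comments=%d admin=%d\n",
        $tag, $comment_rules->allows($tag), $admin_rules->allows($tag);
}
```

A procedural interface can get the same effect by taking the rules as an argument on every call, so the trade-off is mostly about where the rules are stored and named.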
Re: Re: Cleaning up HTML tags by cleverett (Friar) on Aug 27, 2003 at 03:29 UTC