Re^5: easy HTML::TokeParser help request

If you really have a fixed format that you can guarantee isn't going to change (e.g. this is a one-off throwaway program to convert old data into a new format), sure, go ahead and use regexen. Otherwise you'll find out n months down the road that you have to spend the time all over again re-implementing it when the HTML changes because the designer got a new version of Dreampage 06 X.

As for which parser is better: depends. Which is better, a Ferrari or a heavy duty pickup? Try moving a couple of pallets of bricks with the former, or winning a race with the latter.

In my personal experience the answer is: depends. :) I used to use TokeParser more than TreeBuilder (writing my own RSS feeds before sites provided them themselves), but more often now it's the other way around. As you can see, for a task with more context sensitivity (foo elements 2 levels down inside bar elements) TokeParser needs more scaffolding from the programmer than a tree does. But there are other types of tasks (extract any foo elements with class zorch) that'll probably be simpler to think of in the TokeParser manner.
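To make that contrast concrete, here's a rough sketch of both styles on a made-up snippet. The HTML and the variable names are my own invention, and I've used span and div as stand-ins for foo and bar, since TreeBuilder by default ignores tags it doesn't know about. Treat it as one way to do it, not the way.

```perl
use strict;
use warnings;
use HTML::TokeParser;
use HTML::TreeBuilder;

# Hypothetical sample input; span/div stand in for foo/bar.
my $html = <<'HTML';
<html><body>
<div><p><span class="zorch">one</span></p></div>
<p><span class="zorch">two</span></p>
<div><span>three</span></div>
</body></html>
HTML

# Token style: grab any span with class "zorch", wherever it sits.
my @by_class;
my $p = HTML::TokeParser->new(\$html);
while (my $tag = $p->get_tag('span')) {
    my $attr = $tag->[1];    # hashref of the start tag's attributes
    next unless defined $attr->{class} && $attr->{class} eq 'zorch';
    push @by_class, $p->get_trimmed_text('/span');
}

# Tree style: spans exactly two levels below a div
# (div -> something -> span), found by walking up from each span.
my @nested;
my $tree = HTML::TreeBuilder->new_from_content($html);
for my $span ($tree->look_down(_tag => 'span')) {
    my $parent = $span->parent   or next;
    my $grand  = $parent->parent or next;
    push @nested, $span->as_trimmed_text if $grand->tag eq 'div';
}
$tree->delete;    # free the tree explicitly when done

print "by class: @by_class\n";
print "nested:   @nested\n";
```

Doing the second job with tokens alone would mean keeping your own stack of open tags, which is exactly the scaffolding I was talking about.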

If you haven't looked at it, you should also take a gander at HTML::TokeParser::Simple, which provides an even nicer token interface.
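For a taste of what "nicer" means, here's a minimal sketch that pulls the href out of every link. The input string is made up; the point is that tokens become objects with methods like is_start_tag and get_attr, so you don't have to index into array refs.

```perl
use strict;
use warnings;
use HTML::TokeParser::Simple;

# Hypothetical sample input.
my $html = '<p><a href="/one">first</a> and <a href="/two">second</a></p>';

my @links;
my $parser = HTML::TokeParser::Simple->new(\$html);
while (my $token = $parser->get_token) {
    # Only start tags named "a" are interesting here.
    next unless $token->is_start_tag('a');
    push @links, $token->get_attr('href');
}

print "$_\n" for @links;
```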


Replies are listed 'Best First'.
Re^6: easy HTML::TokeParser help request by 2ge (Scribe) on Aug 04, 2006 at 12:20 UTC
I think that when the format changes, the program often needs changing too, not only the regexp. For now it is OK; I'll assume it will not change. Thanks for the nice answer comparing TokeParser and TreeBuilder. That is really enough for me, and I see the difference. Maybe there should be some other module that puts those two properties together (hands out tokens and also keeps the tree stored somewhere). I don't know if that is possible. The main thing is that the problem is solved, and I hope this node will help other monks too!
