Instead of HTML::Parser, you might take a look at HTML::TokeParser. I'm currently using it on a project, and it has been a blessing compared with parsing HTML using regular expressions.
If I understand correctly what you're trying to do, I think HTML::TokeParser may do a better job for you: it breaks everything up into tokens, so you can do your substitutions easily enough and then rebuild your HTML from those tokens.
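To make the token idea concrete, here is a minimal sketch of that approach. It assumes the substitution you want is renaming a tag (the `rewrite_tags` helper, the `%rename` map and the sample markup are all made up for illustration, not taken from your code). It walks the tokens, passes everything through untouched, and rebuilds only the tags listed in the map:

```perl
use strict;
use warnings;
use HTML::TokeParser;
use HTML::Entities qw(encode_entities);

# Hypothetical helper: rename tags according to %$rename and
# return the rebuilt HTML. Everything else is copied through as-is.
sub rewrite_tags {
    my ($html, $rename) = @_;
    my $p = HTML::TokeParser->new(\$html)
        or die "Cannot create parser: $!";
    my $out = '';

    while (my $token = $p->get_token) {
        my $type = $token->[0];

        if ($type eq 'S') {
            # Start tag: ["S", $tag, $attr, $attrseq, $text]
            my ($tag, $attr, $attrseq, $text) = @{$token}[1 .. 4];
            if (exists $rename->{$tag}) {
                # Attribute values arrive decoded, so re-encode them.
                my $attrs = join '',
                    map { sprintf ' %s="%s"', $_, encode_entities($attr->{$_}) }
                    @$attrseq;
                $out .= "<$rename->{$tag}$attrs>";
            }
            else {
                $out .= $text;    # original source text of the tag
            }
        }
        elsif ($type eq 'E') {
            # End tag: ["E", $tag, $text]
            my ($tag, $text) = @{$token}[1, 2];
            $out .= exists $rename->{$tag} ? "</$rename->{$tag}>" : $text;
        }
        elsif ($type eq 'T') {
            # Text: ["T", $text, $is_data]
            $out .= $token->[1];
        }
        else {
            # Comments, declarations and processing instructions
            # carry their original text as the last element.
            $out .= $token->[-1];
        }
    }
    return $out;
}

# Sample input (assumed, for illustration only).
print rewrite_tags('<p>Some <b class="x">bold</b> text</p>', { b => 'strong' }), "\n";
```

Because untouched tokens are emitted from their original text, only the tags you actually rewrite get normalised. Self-closing syntax such as `<br />` on a renamed tag would probably need extra handling that this sketch leaves out.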
Hope that helps!
There is no emoticon for what I'm feeling now.
Another option is HTML::TreeBuilder, which may be better suited to your goal. Replacing existing tags can be tricky, given the dozens of variants that people may use in their HTML, some valid, most invalid. Let the module figure it out rather than a hand-rolled regex.
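As a rough sketch of what that could look like, assuming the goal is to swap one tag for another (the `swap_tags` helper and the sample markup are hypothetical), you let HTML::TreeBuilder parse the document, find the elements with `look_down`, and change their tag in place:

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Hypothetical helper: change every <$from> element into <$to>
# and return the serialised tree.
sub swap_tags {
    my ($html, $from, $to) = @_;
    my $tree = HTML::TreeBuilder->new_from_content($html);

    # look_down returns all matching elements in list context.
    for my $el ($tree->look_down(_tag => $from)) {
        $el->tag($to);
    }

    # Encode only <, > and &; the empty hash asks for every end tag
    # to be emitted, including the ones HTML lets you omit.
    my $out = $tree->as_HTML('<>&', undef, {});
    $tree->delete;    # free the tree explicitly
    return $out;
}

# Sample input (assumed, for illustration only).
print swap_tags('<p>Some <b>bold</b> text</p>', 'b', 'strong');
```

Note that the tree builder normalises the document on the way through, so the result comes back wrapped in `html`, `head` and `body` elements rather than as the bare fragment you fed in.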
merlyn has a recent article called "The Wrong Parser for the Right Reasons" which covers something very similar. Give it a read, and see if it suits your goal. A brief synopsis from the top of the article:
More and more these days, you get faced with a problem with angle brackets somewhere in the data. How do you find what you're looking for in HTML or XML data?
At first glance, the question has an obvious answer. If you have an HTML task, you use HTML::Parser or some derived or wrapper class. If you have an XML task, you use XML::Parser or XML::LibXML. But maybe the obvious answer isn't always the best. Let's look at a couple of cases.