If you add a ? after the *, it will make the quantifier
non-greedy, so .*? stops at the first closing table tag it
finds. As long as you don't have any nested tables, this will
work. In greedy mode (the default), the regex will grab as much
as it can into that .*, matching everything from the first
matching opening table tag through the very last closing table tag.
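To see the difference concretely, here is a small sketch in Python (the .* and .*? syntax is the same in Perl, where the /s and /i modifiers play the role of re.S and re.I). The sample markup and the variable names are made up for illustration, and it assumes the tables are not nested.

```python
import re

# Hypothetical sample: two separate (non-nested) tables with text between them.
html = (
    "<p>intro</p>"
    "<table><tr><td>one</td></tr></table>"
    "<p>between</p>"
    "<table><tr><td>two</td></tr></table>"
)

# Greedy: .* runs on to the LAST </table>, so both tables and the
# paragraph between them come back as one single match.
greedy = re.findall(r"<table.*</table>", html, flags=re.S | re.I)

# Non-greedy: .*? stops at the FIRST </table> it can reach,
# so each table is matched on its own.
lazy = re.findall(r"<table.*?</table>", html, flags=re.S | re.I)

print(len(greedy), "greedy match(es)")
print(len(lazy), "non-greedy match(es)")
```

The greedy pattern swallows the "between" paragraph along with both tables, which is usually not what you want when deleting tables.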
First read this FAQ. In a nutshell, HTML parsing, especially something like analyzing arbitrary tables, is pretty difficult. There are modules designed especially for this, though, so check out HTML::Parser and HTML::TokeParser. Also see answers to a similar question here.
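HTML tag and attribute names are case-insensitive: cellpadding, cellPadDing, and ceLLpaddinG are all the same, so a hand-written pattern has to ignore case. I actually saw a table-extract module which may be of use to you.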
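On the letter-case point, a rough sketch of what a hand-rolled pattern has to cope with, written in Python (in Perl the /i modifier does the same job as re.IGNORECASE). The sample tags and the pattern are assumptions for illustration only, and the pattern does not handle attribute values that contain a > character.

```python
import re

# Hypothetical tags that differ only in letter case.
tags = [
    '<table cellpadding="2">',
    '<TABLE cellPadDing="2">',
    '<Table ceLLpaddinG="2">',
]

# IGNORECASE makes both the tag name and the attribute name match in any case.
pattern = re.compile(r'<table\b[^>]*\bcellpadding\s*=\s*"?(\d+)', re.IGNORECASE)

# Pull the cellpadding value out of each tag.
values = [pattern.search(tag).group(1) for tag in tags]
print(values)
```

Without the case-insensitive flag, only the first tag would match.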
Personally, I would not advise doing any HTML parsing yourself. Use modules -- their authors know their stuff!
You will run into problems with nested tables, won't you?
A more complex solution would involve keeping track of the number of open and close table tags, so you can be sure that they match up. For instance, each time you pass an open table tag, increment a counter, and each time you pass a close table tag, decrement it. When the counter is 1 or more, you are inside a table; when it hits 0, you are outside of a table. If it hits 2 or more, you are inside a nested table.
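A minimal sketch of that counter idea, in Python, might look like the following. The function names outer_table_spans and strip_tables are made up here, and the tag pattern is a simplification: it assumes no > characters inside attribute values and ignores table tags hidden in comments or scripts, which is exactly the kind of thing a real parser module handles for you.

```python
import re

# Matches an opening <table ...> or a closing </table>, in any letter case.
# Simplification: assumes attribute values never contain a '>' character.
TABLE_TAG = re.compile(r"<(/?)table\b[^>]*>", re.IGNORECASE)

def outer_table_spans(html):
    """Return (start, end) offsets of each outermost table in html."""
    spans = []
    depth = 0      # the counter: 0 = outside, 1 = in a table, 2+ = nested
    start = None
    for m in TABLE_TAG.finditer(html):
        if m.group(1) == "":          # opening tag: increment
            if depth == 0:
                start = m.start()     # remember where the outer table begins
            depth += 1
        elif depth > 0:               # closing tag with a matching open: decrement
            depth -= 1
            if depth == 0:
                spans.append((start, m.end()))
        # a stray closing tag seen at depth 0 is simply ignored
    return spans

def strip_tables(html):
    """Remove every outermost table, including anything nested inside it."""
    out = []
    pos = 0
    for start, end in outer_table_spans(html):
        out.append(html[pos:start])
        pos = end
    out.append(html[pos:])
    return "".join(out)

# Hypothetical sample with one nested table and one plain table.
sample = (
    "a<table><tr><td><TABLE><tr><td>x</td></tr></TABLE></td></tr></table>"
    "b<table></table>c"
)
print(strip_tables(sample))
```

Because a span is only recorded when the counter returns to 0, the inner table is removed together with the outer one instead of ending the match early.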
I don't know how feasible this is, but it might be useful.
Definitely true. 8^) I guess it depends on whether you use nested tables and need to keep track of the nesting for some purpose. If you have a table nested within a table, and you just want to delete the table definition, the information would be plopped into the outer table's cell without formatting, kind of like how you merge cells in PageMill.