in reply to another regex Question

Why not try a split? I process many HTML files with this command. Split the file on every occurrence of <TABLE and you won't have to worry about eating up too many tables.

my @parsedfile = split(/<TABLE/, $file);
This way, you can concentrate on only one table at a time and won't have to worry about greedy regexps.

foreach my $line (@parsedfile) {
    if ($line =~ /cellpadding=2/) {
        # do whatever
    }
}
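
For anyone who wants to try this end to end, here is a minimal sketch on a made-up input string. The sample HTML and the @matches array are assumptions for illustration only, and the /i flag is added so that lowercase <table tags are caught too. Note that split consumes the "<TABLE" delimiter, that the first chunk is whatever came before the first table, and that each chunk runs up to the next <TABLE, so it can include text that follows the closing tag.

```perl
use strict;
use warnings;

# Hypothetical sample input; a real $file would hold a whole HTML page.
my $file = '<html><TABLE cellpadding=2><TR><TD>a</TD></TR></TABLE>'
         . '<p>text</p><table cellpadding=5><TR><TD>b</TD></TR></table></html>';

# Split case-insensitively; the "<TABLE" delimiter itself is consumed.
my @parsedfile = split /<TABLE/i, $file;

# Chunk 0 is whatever came before the first table, so skip it.
my @matches = grep { /cellpadding=2/i } @parsedfile[1 .. $#parsedfile];

print scalar(@parsedfile), " chunks, ", scalar(@matches), " with cellpadding=2\n";
```

On this sample the split yields three chunks, and only the first table matches.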

John Coswell -

RE: Re: another regex Question
by jjhorner (Hermit) on May 24, 2000 at 18:00 UTC

    You will run into problems with nested tables, won't you?

    A more complex solution would involve keeping track of the number of open and close table tags, so you can be sure that they match up. For instance, each time you pass an open table tag, increment a counter; each time you pass a close table tag, decrement it. When the counter is 1 or more, you are inside a table; when it drops back to 0, you are outside of one. If it hits 2 or more, you are inside a nested table.
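
A rough sketch of that counter idea, under some loud assumptions: the sub name table_depths and the sample string are made up for illustration, and the regex only recognises plain <TABLE ...> and </TABLE> tags. It does not cope with comments, scripts, or a ">" inside a quoted attribute, so a real HTML parser such as HTML::Parser from CPAN is likely to be the safer choice for messy pages.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the counter value after each table tag,
# in document order. 0 means outside any table, 2 or more means nested.
sub table_depths {
    my ($html) = @_;
    my $depth = 0;
    my @depths;
    # $1 is "/" for a close tag and the empty string for an open tag.
    while ($html =~ m{<(/?)TABLE\b[^>]*>}gi) {
        my $is_close = $1 ne '';
        if ($is_close) { $depth-- } else { $depth++ }
        push @depths, $depth;
    }
    return @depths;
}

# Made-up input: one table nested inside another.
my $html = '<TABLE><TR><TD><TABLE><TR><TD>inner</TD></TR></TABLE></TD></TR></TABLE>';
my @depths = table_depths($html);
print "@depths\n";   # prints: 1 2 1 0
```

If the last value is not 0, or the counter ever goes negative, the open and close tags do not match up.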

    I don't know how feasible this is, but it might be useful.

    J. J. Horner

    Linux, Perl, Apache, Stronghold, Unix

      Definitely true. 8^) I guess it depends on whether you use nested tables and need to keep track of the nesting for some purpose. If you have a table nested within a table and you just delete the inner table's definition, its contents would be plopped into the outer table's cell without formatting, kind of like how you merge cells in PageMill.