Re: another regex Question

in reply to another regex Question

Why not try a split? I process many HTML files this way. Split the file on every occurrence of <TABLE and you won't have to worry about a match eating up too many tables.

# /i also catches <table>; note that split drops the "<TABLE" text itself
my @parsedfile = split(/<TABLE/i, $file);

This way, you can concentrate on only one table at a time and won't have to worry about greedy regexps.

foreach my $line (@parsedfile) {
  if ($line =~ /cellpadding=2/i) {
    # do whatever you need with this table
  }
}
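For anyone who wants to try this out, here is a minimal self-contained sketch of the split approach. The sample HTML in $file and the @matches array are made up for illustration, and the lookahead is a variation on the split above (not the original command) so that each chunk keeps its own <TABLE tag:

```perl
use strict;
use warnings;

# Hypothetical sample input; a real script would read $file from disk.
my $file = join '',
    '<html><body>',
    '<TABLE cellpadding=2><TR><TD>first</TD></TR></TABLE>',
    '<table cellpadding=0><tr><td>second</td></tr></table>',
    '</body></html>';

# Split just before each opening tag. The zero-width lookahead keeps
# "<TABLE" in the chunk, and /i also catches lowercase "<table".
my @parsedfile = split /(?=<TABLE\b)/i, $file;

# Chunk 0 is whatever came before the first table, so anchor the test
# on the opening tag and look for the attribute inside that tag only.
my @matches = grep { /^<TABLE\b[^>]*\bcellpadding=2\b/i } @parsedfile;

print scalar(@parsedfile), " chunks, ", scalar(@matches), " matching\n";
```

Anchoring the attribute test on the opening tag keeps it from matching a cellpadding that happens to show up later in the same chunk.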

John Coswell - http://www.coswell.com


RE: Re: another regex Question by jjhorner (Hermit) on May 24, 2000 at 18:00 UTC
You will run into problems with nested tables, won't you? A more complex solution would involve keeping track of the number of open and close table tags, so you can be sure they match up. For instance, each time you pass an open table tag, increment a counter, and each time you pass a close table tag, decrement it. While the counter is 1 or more, you are inside a table; when it drops back to 0, you are outside of one. If it hits 2 or more, you are inside a nested table. I don't know how feasible this is, but it might be useful.
J. J. Horner Linux, Perl, Apache, Stronghold, Unix jhorner@knoxlug.org http://www.knoxlug.org
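The counter idea could be sketched roughly as follows. This is only an illustration under some assumptions: top_level_tables is a hypothetical helper name, the sample $html is made up, and the regex does not know about comments or scripts that happen to contain table tags, so a real HTML parser would be more robust.

```perl
use strict;
use warnings;

# Hypothetical helper: returns each outermost table, nested tables included,
# by counting opening and closing table tags.
sub top_level_tables {
    my ($html) = @_;
    my ($depth, $start) = (0, 0);
    my @tables;
    while ($html =~ m{<(/?)table\b[^>]*>}gi) {
        if ($1 eq '') {                      # opening tag
            $start = $-[0] if $depth == 0;   # remember where the outer table begins
            $depth++;                        # 2 or more means a nested table
        }
        elsif ($depth > 0) {                 # closing tag; stray closers are ignored
            $depth--;
            # back at 0: the outer table ends where this closing tag ends
            push @tables, substr($html, $start, $+[0] - $start) if $depth == 0;
        }
    }
    return @tables;
}

# Made-up sample: one table with a nested table, then a second table.
my $html = '<table id=a><tr><td><table id=b><tr><td>x</td></tr></table>'
         . '</td></tr></table><p>gap</p><TABLE id=c></TABLE>';

my @tables = top_level_tables($html);
print scalar(@tables), " top-level tables\n";
```

Because the counter only returns to 0 at the outer closing tag, the nested table stays inside the chunk for its parent instead of being cut off at the first close tag.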
RE: RE: Re: another regex Question by johncoswell (Acolyte) on May 24, 2000 at 19:18 UTC
Definitely true. 8^) I guess it depends on whether you use nested tables and need to keep track of the nesting for some purpose. If you have a table nested within a table and you just delete the inner table's definition, its contents would be plopped into the outer table's cell without formatting, kind of like how you merge cells in PageMill.
