
Re: another regex Question

by johncoswell (Acolyte)
on May 24, 2000 at 17:37 UTC ( [id://14548]=note )

in reply to another regex Question

Why not try a split? I process many HTML files with this command. Split the file on every occurrence of <TABLE and you won't have to worry about a greedy match swallowing more than one table.

@parsedfile = split(/\<TABLE/,$file);
This way, you can concentrate on only one table at a time and won't have to worry about greedy regexps.

foreach $line (@parsedfile) {
    if ($line =~ /cellpadding=2/) {
        # do whatever
    }
}
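Here is a slightly fuller sketch of the split approach above, in case it helps. The sample HTML in $file is made up for illustration, and the /i flag, the shift, and the \b are my own additions rather than part of the one-liners above. Note that the chunk before the first <TABLE is not a table at all, so it gets dropped before the search.

```perl
use strict;
use warnings;

# Hypothetical sample input; in practice $file would hold the slurped HTML.
my $file = join '',
    '<html><body>',
    '<TABLE cellpadding=2><TR><TD>first</TD></TR></TABLE>',
    '<table cellpadding=4><TR><TD>second</TD></TR></table>',
    '<TABLE border=0 cellpadding=2><TR><TD>third</TD></TR></TABLE>',
    '</body></html>';

# Split on every opening table tag; /i also catches lowercase <table.
my @parsedfile = split /<TABLE/i, $file;

# The first chunk is whatever preceded the first table, so drop it.
shift @parsedfile;

# Keep only the chunks that mention cellpadding=2.
# \b keeps cellpadding=20 from matching by accident.
my @matches = grep { /cellpadding=2\b/i } @parsedfile;

printf "%d of %d tables use cellpadding=2\n",
    scalar @matches, scalar @parsedfile;
```

This only handles the unquoted form of the attribute; cellpadding="2" would need a looser pattern. It also searches the whole chunk, so the same text appearing inside a cell would count as a match.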

John Coswell -

Replies are listed 'Best First'.
RE: Re: another regex Question
by jjhorner (Hermit) on May 24, 2000 at 18:00 UTC

    You will run into problems with nested tables, won't you?

    A more complex solution would involve keeping track of the number of open and close table tags, so you can be sure that they match. For instance, each time you pass an open table tag, increment a counter; each time you pass a close table tag, decrement it. When the counter is 1 or more, you are inside a table; when it is back to 0, you are outside of one. If it hits 2 or more, you are inside a nested table.

    I don't know how feasible this is, but it might be useful.
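A rough sketch of the counter idea described above, assuming the tags can be found with a simple regex. The sample string in $html and the names $depth, $max_depth and @events are invented for illustration. A regex like this can be fooled by table tags inside comments or scripts, so treat it as a demonstration of the counting, not a robust parser.

```perl
use strict;
use warnings;

# Hypothetical sample: one outer table containing one nested table.
my $html = '<p>intro</p>'
    . '<TABLE><TR><TD>outer'
    . '<TABLE><TR><TD>inner</TD></TR></TABLE>'
    . '</TD></TR></TABLE>'
    . '<p>outro</p>';

my $depth     = 0;   # 0 = outside any table, 1 = in a table, 2+ = nested
my $max_depth = 0;
my @events;          # "open:N" / "close:N", where N is the depth after the tag

# Walk every opening or closing table tag in document order.
while ($html =~ m{<(/?)TABLE\b[^>]*>}gi) {
    if ($1 eq '/') {
        $depth-- if $depth > 0;      # ignore stray closing tags
        push @events, "close:$depth";
    }
    else {
        $depth++;
        push @events, "open:$depth";
    }
    $max_depth = $depth if $depth > $max_depth;
}

print join(' ', @events), "\n";
print "deepest nesting: $max_depth, ",
    ($depth == 0 ? "tags balanced" : "unbalanced by $depth"), "\n";
```

If the counter is not back at 0 once the loop finishes, some table was opened and never closed, which is worth checking before trusting any of the depth values.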

    J. J. Horner

    Linux, Perl, Apache, Stronghold, Unix

      Definitely true. 8^) I guess it depends on whether you use nested tables and need to keep track of the nesting for some purpose. If you have a table nested within a table and you just want to delete the inner table's definition, its contents would be plopped into the outer table's cell without formatting, kind of like how you merge cells in PageMill.
