vxp has asked for the wisdom of the Perl Monks concerning the following question:
Hi.
I do a lot (and I mean A LOT) of HTML parsing, for various reasons. I use HTML::TreeBuilder and HTML::TokeParser for this.
What would be REALLY useful for me is a "flowchart", so to speak, of an HTML file that I am about to parse. I'll explain what I mean in a bit.
The thing is that my brain seems to process things visually much better than by thinking about HTML tags and arranging them in my head. So, for instance, take this HTML (straight out of HTML::TreeBuilder's documentation):
<ul>
  <li>Ice cream.</li>
  <li>Whipped cream.
  <li>Hot apple pie <br>(mmm pie)</li>
</ul>
The TreeBuilder will construct the following tree out of it:
<html> @0 (IMPLICIT)
  <head> @0.0 (IMPLICIT)
  <body> @0.1 (IMPLICIT)
    <ul> @0.1.0
      <li> @0.1.0.0
        "Ice cream."
      <li> @0.1.0.1
        "Whipped cream. "
      <li> @0.1.0.2
        "Hot apple pie "
        <br> @0.1.0.2.1
        "(mmm pie)"
Now, that's wonderful. Beautiful. But as I said, I am more of a visual person. So at the moment I draw that damn flowchart by hand. For the example above, it'd look like this:
A rectangle at the top, with "html" written in it. Below that, I'd have two siblings: a rectangle with "head" in it and a rectangle with "body" in it. Under "body" would be a rectangle with "ul", and under "ul" would be a rectangle for each "li". And under the first "li" would be a string ("Ice cream.").
You get the idea, I hope. The flowchart helps me visualize the document's tree. That makes it very easy to come up with an algorithm to rip out whatever contents I need from that tree.
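To make the idea concrete, here is a rough sketch of the kind of thing I have in mind. It is only an illustration under some loud assumptions: it is written in Python with nothing but the standard library's html.parser (not HTML::TreeBuilder), and the names DotBuilder and html_to_dot are made up for this post. It emits Graphviz DOT text, with one box per tag and plain text for strings. Unlike TreeBuilder, it does not add the implicit html/head/body elements, and its only "repair" rule is that a new li closes an open li.

```python
from html.parser import HTMLParser

# Tags that never have children (a small, incomplete list for this sketch).
VOID_TAGS = {"br", "hr", "img", "input", "meta", "link"}


class DotBuilder(HTMLParser):
    """Hypothetical helper: one DOT node per tag or text run, plus parent -> child edges."""

    def __init__(self):
        super().__init__()
        self.lines = []   # DOT statements, in document order
        self.stack = []   # (node_id, tag) for the currently open elements
        self.count = 0    # used to generate unique node ids n0, n1, ...

    def _add(self, label, shape):
        node_id = f"n{self.count}"
        self.count += 1
        # Escape backslashes and double quotes so the label stays valid DOT.
        safe = label.replace("\\", "\\\\").replace('"', '\\"')
        self.lines.append(f'  {node_id} [shape={shape}, label="{safe}"];')
        if self.stack:
            self.lines.append(f"  {self.stack[-1][0]} -> {node_id};")
        return node_id

    def handle_starttag(self, tag, attrs):
        # Crude stand-in for implicit end tags: a new <li> closes an open <li>.
        if tag == "li" and self.stack and self.stack[-1][1] == "li":
            self.stack.pop()
        node_id = self._add(tag, "box")
        if tag not in VOID_TAGS:
            self.stack.append((node_id, tag))

    def handle_endtag(self, tag):
        # Pop back to the nearest matching open tag; stray end tags are ignored.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i][1] == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        text = " ".join(data.split())  # collapse whitespace, skip blank runs
        if text:
            self._add(text, "plaintext")


def html_to_dot(html):
    """Return a Graphviz 'digraph' describing the tag tree of the given HTML."""
    builder = DotBuilder()
    builder.feed(html)
    builder.close()
    return "digraph html {\n" + "\n".join(builder.lines) + "\n}\n"


if __name__ == "__main__":
    sample = ("<ul> <li>Ice cream.</li> <li>Whipped cream. "
              "<li>Hot apple pie <br>(mmm pie)</li> </ul>")
    print(html_to_dot(sample))
```

If the output is saved as tree.dot, Graphviz can render it with `dot -Tpng tree.dot -o tree.png`. That gives roughly the picture I draw by hand, minus the html/head/body boxes.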
So, my question is: is there a Perl script (or anything else, really; I don't care if it's Perl or not, although it'd be awesome if it was a Perl solution) that I can feed an HTML file into, and that would produce the flowchart I described above?
Thanks for any comments / suggestions.
Re: HTML tree - making a flowchart of it.
by merlyn (Sage) on Oct 28, 2009 at 14:22 UTC
Re: HTML tree - making a flowchart of it.
by Fletch (Bishop) on Oct 28, 2009 at 14:47 UTC
by metaperl (Curate) on Oct 28, 2009 at 16:07 UTC