Using HTML::Parser to edit files in place

markjugg has asked for the wisdom of the Perl Monks concerning the following question:

A couple of years ago I wrote a CGI script which read a file off of disk, turned all the HTML table cells into form areas of the magically right size, and allowed a novice user to edit the page in this fashion. (When they submitted the form, it inserted the contents of the form into the appropriate places in the same file.) Unfortunately, the whole system was line-based instead of tag-based, which made it somewhat fragile.

Now I'd like to rewrite it to be tag-based. HTML::Parser looks like it will be my friend here, and maybe even HTML::TableExtract.

However, I seem to be missing a simple concept with HTML::Parser: I can't figure out how to use it to edit a file in place. The logic would be: "If I've found a table cell, process it and stick the contents back into the HTML stream". Could someone provide a barebones example of this logic to jump-start me? Thanks!

-mark

Re: Using HTML::Parser to edit files in place by tadman (Prior) on Mar 01, 2001 at 04:41 UTC
As you might have guessed, HTML::Parser is your friend, but there isn't a direct "in-place" system for editing. The Parser does, however, give you enough information to weasel your way around that. As I described in an earlier post, you can ask Parser for 'offset' and 'length' information on the bits and pieces it gives you, and these are relative to the scalar you sent to the parser in the first place. They let you rewrite parts of that scalar if you want to. You mentioned wanting to convert content that lives inside a table cell, that is, inside a TD tag. So you will want to watch for TD tags and act accordingly. This could be handled by a sub which, given the entire contents of the TD cell, tags and all, rewrites it to the desired output.
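A minimal sketch of the 'offset'/'length' idea described above, not a definitive implementation. It assumes the whole document is parsed from one scalar, that TD tags are balanced and not nested, and that the content is ASCII so offsets line up with characters. The names rewrite_cells and $rewrite are made up for illustration.

```perl
use strict;
use warnings;
use HTML::Parser;

# Replace the content of every <td> cell in $html with $rewrite->($old).
# Assumption: balanced, non-nested <td>...</td> pairs.
sub rewrite_cells {
    my ($html, $rewrite) = @_;
    my (@spans, $content_start);

    my $p = HTML::Parser->new(
        api_version => 3,
        start_h => [ sub {
            my ($tagname, $offset, $length) = @_;
            # Cell content begins right after the opening tag.
            $content_start = $offset + $length if $tagname eq 'td';
        }, 'tagname, offset, length' ],
        end_h => [ sub {
            my ($tagname, $offset) = @_;
            return unless $tagname eq 'td' && defined $content_start;
            # Cell content ends where the closing tag begins.
            push @spans, [ $content_start, $offset - $content_start ];
            undef $content_start;
        }, 'tagname, offset' ],
    );
    $p->parse($html);
    $p->eof;

    # Splice from the last cell backwards so earlier offsets stay valid.
    for my $span (reverse @spans) {
        my ($start, $len) = @$span;
        my $old = substr($html, $start, $len);
        substr($html, $start, $len, $rewrite->($old));
    }
    return $html;
}

my $page = '<table><tr><td>one</td><td>two</td></tr></table>';
print rewrite_cells($page, sub { uc $_[0] }), "\n";
```

To edit a file "in place" with this, you would read the file into a scalar, pass it through rewrite_cells, and write the result back over the original file.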
Re: Re: Using HTML::Parser to edit files in place by markjugg (Curate) on Mar 15, 2001 at 01:00 UTC
I started using this as the basis of my own solution (thanks!), but I realized the problem space was more complex than simply processing each TD tag. When the file is updated from the CGI environment, I need to match up the form fields from the CGI form with the TD tags. That could be accomplished by numbering the form fields in the order they appear in the file. Not so hard. This is harder: in the old script, I had a nice trick to figure out what size to make the form input field (and whether to make it a text box or a textarea). I based the size on the largest piece of content in a particular column. To do this, I had to read through the whole table once before I processed the first TD cell. After some contemplation, I realized that all I needed to do to fix the old script was to remove any newline characters that appear in the form. Since I control the template file, I could already make sure that each TD appeared on a single line, that the HTML was complete enough, etc. This quick fix is almost unfortunate, because in many other regards the script could use several "good style" updates, including: using CGI.pm, using 'strict', and separating the code from the design with a template. It was last worked on almost 2 years ago. I've learned a lot since then. :) Despite its poor style, it's been a useful tool over the years. Perhaps I'll get around to genericizing it, documenting it and releasing it into the wild. -mark
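A rough sketch of the two-pass idea described above: read the whole table first to find the widest content per column, then number the cells in document order and pick a widget. It is not the original script. The names collect_cells and column_widths, the "cellN" field naming, and the 40-character cutoff are all assumptions made for illustration.

```perl
use strict;
use warnings;
use HTML::Parser;
use List::Util qw(max);

# First pass: collect the text of every cell, row by row.
sub collect_cells {
    my ($html) = @_;
    my (@rows, $in_cell);
    my $p = HTML::Parser->new(
        api_version => 3,
        start_h => [ sub {
            my ($tag) = @_;
            push @rows, [] if $tag eq 'tr';
            if ($tag eq 'td' && @rows) {
                push @{ $rows[-1] }, '';
                $in_cell = 1;
            }
        }, 'tagname' ],
        end_h  => [ sub { $in_cell = 0 if $_[0] eq 'td' }, 'tagname' ],
        text_h => [ sub { $rows[-1][-1] .= $_[0] if $in_cell }, 'dtext' ],
    );
    $p->parse($html);
    $p->eof;
    return \@rows;
}

# Widest cell content (in characters) for each column index.
sub column_widths {
    my ($rows) = @_;
    my @width;
    for my $row (@$rows) {
        for my $col (0 .. $#$row) {
            $width[$col] = max($width[$col] // 0, length $row->[$col]);
        }
    }
    return @width;
}

my $rows  = collect_cells('<table><tr><td>a</td><td>long text</td></tr>'
                        . '<tr><td>abc</td><td>x</td></tr></table>');
my @width = column_widths($rows);

# Second pass: number the fields in document order and pick a widget.
my $n = 0;
for my $row (@$rows) {
    for my $col (0 .. $#$row) {
        # 40 is an arbitrary cutoff chosen for this sketch.
        my $kind = $width[$col] > 40 ? 'textarea' : 'text';
        printf "cell%d: %s, width %d\n", $n++, $kind, $width[$col];
    }
}
```

Because the field numbers follow document order, the same numbering can be regenerated when the form is submitted, which is what lets each form field be matched back to its TD.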
Re: Using HTML::Parser to edit files in place by rpc (Monk) on Mar 01, 2001 at 04:46 UTC
The logic would be like this: set up a default handler (default_h) which has access to the tag name and the unparsed HTML, and use that callback to look for table cells. If you aren't inside a table cell, spit out the unparsed HTML. If you have found the start tag of a table cell, set a flag. You know any subsequent calls to your callback will be content within the table cell, until you hit a closing tag.

```perl
use strict;
use warnings;
use HTML::Parser;

my $found = 0;    # true while we are between <td> and </td>

sub callback {
    my ($tagname, $text) = @_;

    # $tagname may be undefined (e.g. for plain text), hence the guard.
    if ($tagname and $tagname eq 'td') {
        if (not $found) {
            # start tag.
            $found = 1;
        }
        else {
            # end tag.
            $found = 0;
        }
        return;
    }

    # Outside a cell: pass the markup through untouched.
    print $text and return unless $found;

    # If you're here, then the markup is within the td tags.
}

my $p = HTML::Parser->new(
    api_version => 3,
    default_h   => [ \&callback, 'tagname, text' ],
);

# Added so the example runs; assumes the file name is given on the command line.
$p->parse_file(shift @ARGV) or die "Can't parse file: $!";
```

Hope this helps..
Re: Re: Using HTML::Parser to edit files in place by merlyn (Sage) on Mar 01, 2001 at 05:47 UTC
This is starting in the right direction, but it breaks on unbalanced-but-legal TD tags. (And you guys all wonder why XML always has to be properly balanced? Because writing code with optional tags is a royal pain.) -- Randal L. Schwartz, Perl hacker
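One way to cope with omitted end tags, offered as a sketch rather than a complete treatment: treat the start of another TD, TH or TR, or the end of a TR or TABLE, as implicitly closing the open cell. The tag lists and the cell_texts name are assumptions made for illustration. Other elements (THEAD, TBODY, TFOOT) can also end a cell, and nested tables are not handled.

```perl
use strict;
use warnings;
use HTML::Parser;

# Tags whose start implicitly closes an open cell, and tags whose end does.
# Assumption: this short list is enough for the template being processed.
my %closes_on_start = map { $_ => 1 } qw(td th tr);
my %closes_on_end   = map { $_ => 1 } qw(td tr table);

# Return the text of each <td> cell, with or without explicit </td> tags.
sub cell_texts {
    my ($html) = @_;
    my (@cells, $open);

    my $p = HTML::Parser->new(
        api_version => 3,
        start_h => [ sub {
            my ($tag) = @_;
            $open = 0 if $closes_on_start{$tag};    # implicit close
            if ($tag eq 'td') { push @cells, ''; $open = 1 }
        }, 'tagname' ],
        end_h  => [ sub { $open = 0 if $closes_on_end{$_[0]} }, 'tagname' ],
        text_h => [ sub { $cells[-1] .= $_[0] if $open }, 'dtext' ],
    );
    $p->parse($html);
    $p->eof;
    return @cells;
}

print join('|', cell_texts('<table><tr><td>one<td>two<tr><td>three</table>')), "\n";
```

The same state machine works for balanced markup too, since an explicit end tag for TD simply closes the cell a little earlier.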
Re: Using HTML::Parser to edit files in place by markjugg (Curate) on Mar 01, 2001 at 22:09 UTC
Thank you all for your examples and help! This will be a great start for me. I'll report back how it goes, and hopefully even end up with something that's reusable for others. :) I'm going to set up the template by hand to avoid the problem merlyn describes, but that's definitely an important consideration for "real world" cases. -mark