Re: XML::Simple problem, or How to convert HTML to Perl and then back again.

There are many modules in the HTML hierarchy on CPAN; HTML::TokeParser and HTML::TreeBuilder come to mind. Each one handles the HTML document differently, so the choice depends on how you want to access it. TokeParser, as the name implies, tokenizes the HTML into tags and text and lets you make changes and print it out one token at a time. TreeBuilder converts your document into a tree that represents the nested elements.
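To make the tree approach concrete, here is a minimal sketch of a round trip through HTML::TreeBuilder: parse a string, change the tree, and serialize it back to HTML. The helper name strip_width and the sample markup are made up for illustration. Note that as_HTML regenerates the markup from the tree, so the output is normalized (TreeBuilder adds implicit html/head/body elements) rather than a byte-for-byte copy of the input.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Hypothetical helper (not part of any module): removes the width
# attribute from table, tr and td elements and returns the document
# serialized back to HTML.
sub strip_width {
    my ($html) = @_;

    # Build the element tree from a string of HTML.
    my $tree = HTML::TreeBuilder->new_from_content($html);

    # look_down returns every element whose tag name matches the regex.
    for my $el ( $tree->look_down( _tag => qr/^(?:table|tr|td)$/ ) ) {
        $el->attr( 'width', undef );    # an undef value deletes the attribute
    }

    my $out = $tree->as_HTML;           # serialize the tree back to HTML
    $tree->delete;                      # release the tree's memory
    return $out;
}

# Sample input, invented for this example.
print strip_width(
    '<table width="100%"><tr><td width="50">cell</td></tr></table>'
), "\n";
```

If you need the rest of the document to come back exactly as it went in, a token-based parser is probably the better fit, since it can echo untouched tokens verbatim.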

HTH

Addendum: I was able to scrounge up a script I wrote that searches a given HTML document for table, td, and tr tags and removes the width attribute using HTML::TokeParser::Simple. It's not exactly what you are looking for, but it should give you a head start:

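#!/usr/bin/perl
use strict;
use warnings;
use HTML::TokeParser::Simple;

# The HTML file to process is the first command-line argument.
my $file = shift or die "Usage: $0 file.html\n";
my $p = HTML::TokeParser::Simple->new($file)
    or die "Cannot parse $file: $!\n";

while ( my $token = $p->get_token ) {
    # The regex is anchored so only table, td and tr match;
    # unanchored, it would also match tags such as <strong>.
    $token->delete_attr('width')
        if $token->is_start_tag(qr/^t(?:able|d|r)$/);

    # Print each token, modified or not, to rebuild the document.
    print $token->as_is;
}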


Replies are listed 'Best First'.
Re: Re: XML::Simple problem, or How to convert HTML to Perl and then back again. by Wonko the sane (Curate) on Jul 11, 2003 at 18:49 UTC
Thank you for your help, though I don't really have a problem parsing the HTML. It's getting it back TO HTML that is causing me the problems. :-) Thanks though. Wonko
Re: Re: Re: XML::Simple problem, or How to convert HTML to Perl and then back again. by pzbagel (Chaplain) on Jul 11, 2003 at 19:00 UTC
Not to sound facetious, but it doesn't get any easier than print $token->as_is; in TokeParser and print $tree->as_HTML; in TreeBuilder to put it back into HTML form. Start with the right parser, an HTML-specific one, and you will get better results. Remember that HTML is not as rigid in its formatting as XML, which makes it flexible but a real pain to parse at times. Using a specialized parser for HTML has many benefits. Peace
Re: Re: Re: Re: XML::Simple problem, or How to convert HTML to Perl and then back again. by Wonko the sane (Curate) on Jul 11, 2003 at 19:06 UTC
You're right. Maybe HTML::TokeParser::Simple is the tool I need. I missed what you were getting at the first time. I am going to go play with it some more. Thank you. Wonko