in reply to using xml and perl to perform a search and replace on html files
> I've managed to get this working...

That, richill, I'd consider a result! Make a backup and lock it in the safe. :-)
> I'm worried that creating a new XML::Simple() xml object for every line is a bit wasteful...

Perhaps consider parsing the XML once and storing the data in a hash?
Then look at every HTML file, check whether any of its links are in your lookup table, and make the change if necessary.

```perl
$xml_hash{$LinkToPage} = $New_location;
```
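Here is a minimal sketch of the parse-once idea. It assumes XML::Simple (and one of its parser backends) is installed, and it uses a hypothetical mapping layout with `<link old="..." new="..."/>` elements, because the XML from the original question isn't shown here. Adjust the element and attribute names to match your file. The XML is parsed a single time into `%xml_hash`, and after that every link check is just a hash lookup.

```perl
#!/usr/local/bin/perl
use strict;
use warnings;
use XML::Simple qw(XMLin);

# Hypothetical mapping data: the real layout from the original
# question isn't shown, so the element/attribute names are assumptions.
my $xml = <<'XML';
<links>
  <link old="link1.html" new="linka.html"/>
  <link old="link2.html" new="linkb.html"/>
</links>
XML

# Parse the XML once, up front, not once per line.
# ForceArray keeps <link> as an array ref even if there is only one;
# KeyAttr => [] stops XML::Simple folding the array into a hash itself.
my $data = XMLin($xml, ForceArray => ['link'], KeyAttr => []);

# Build the lookup table: old location => new location.
my %xml_hash = map { $_->{old} => $_->{new} } @{ $data->{link} };

# From here on, checking a link is a cheap hash lookup.
for my $href ('link1.html', 'other.html') {
    if (exists $xml_hash{$href}) {
        print "$href -> $xml_hash{$href}\n";
    }
}
```

Running it prints `link1.html -> linka.html`. `other.html` is left alone because it isn't in the table. With a real file you would pass its filename to `XMLin` in place of the string.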
For changing the HTML, I would consider using a parser. There are many; the one I use most often is HTML::TokeParser::Simple. Have a look, and get back to us if you need a hand.
Update: added an example of using a parser.
```perl
#!/usr/local/bin/perl
use strict;
use warnings;
use HTML::TokeParser::Simple;

# lookup table: old link => new link
my %xml_hash = (
    'link1.html' => 'linka.html',
    'link2.html' => 'linkb.html',
);

my $html_file = 'links.html';
my $p = HTML::TokeParser::Simple->new($html_file)
    or die "couldn't parse $html_file: $!";

my $new_html;
while (my $t = $p->get_token) {
    # only <a ...> start tags carry the links we want to change
    if ($t->is_start_tag('a')) {
        my $href = $t->get_attr('href');
        # an <a name="..."> anchor has no href, so check it is defined first
        if (defined $href && exists $xml_hash{$href}) {
            $t->set_attr('href', $xml_hash{$href});
        }
    }
    # as_is returns the token's text, including any attribute we just changed
    $new_html .= $t->as_is;
}
print "$new_html\n";
```

input html:

```html
<html>
<head>
<title>links</title>
</head>
<body>
<p>links</p>
<a href="link1.html">link1</a>
<a href="link2.html">link2</a>
</body>
</html>
```

output:

```html
<html>
<head>
<title>links</title>
</head>
<body>
<p>links</p>
<a href="linka.html">link1</a>
<a href="linkb.html">link2</a>
</body>
</html>
```