
To parse HTML and manipulate URIs, it may be worth using modules that do the heavy lifting for you.

This presses HTML::TreeBuilder and URI into service.

#! /usr/bin/perl
use strict;
use warnings;

use HTML::TreeBuilder;
use URI;

my $t = HTML::TreeBuilder->new_from_file(*DATA)
    or die qq{TB->new failed: $!\n};

my @anchors = $t->look_down(_tag => q{a});
for my $anchor (@anchors){
    my $href = $anchor->attr(q{href});
    next unless defined $href;    # e.g. <a name="...">

    my $uri = URI->new($href);
    # relative, mailto: and javascript: URIs have no host() method
    next unless $uri->can(q{host});
    my $host = $uri->host;
    next unless defined $host and $host eq q{www.mysite.org};

    my %query_form = $uri->query_form;
    next unless exists $query_form{page};

    my $replace = sprintf(q{pages/%s.htm}, $query_form{page});
    $anchor->attr(q{href}, $replace);
}

print $t->as_HTML(q{}, q{ }, {p => 0});
$t->delete;    # free the tree when done

__DATA__
<html><head><title>mysite</title></head>
<body>
<p><a href="http://www.mysite.org/?page=contacts">text</a></p>
<p><a href="http://www.mysite.org/?page=newsletter">text</a></p>
<p><a href="http://www.mysite.org/?page=faq">text</a></p>
</body>
</html>
Output:

<html>
 <head>
  <title>mysite</title>
 </head>
 <body>
  <p><a href="pages/contacts.htm">text</a></p>
  <p><a href="pages/newsletter.htm">text</a></p>
  <p><a href="pages/faq.htm">text</a></p>
 </body>
</html>
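Since the original question was about many HTML files rather than one __DATA__ block, here is a minimal sketch of applying the same rewrite to files named on the command line. It is an illustration, not a tested tool: the sub names `rewrite_href` and `process_file`, the `.bak` backup naming and the fixed host name are all assumptions, and no special handling of character encodings is attempted. It also skips anchors with no href, and hrefs whose URI object has no host() method (relative, mailto:), since calling host() on those would die.

```perl
#!/usr/bin/perl
use strict;
use warnings;

use HTML::TreeBuilder;
use URI;

# Assumed host name, taken from the example above; adjust to suit.
my $SITE = 'www.mysite.org';

# Hypothetical helper: returns the new href for a matching link,
# or undef to leave the link alone.
sub rewrite_href {
    my ($href) = @_;
    return unless defined $href;           # e.g. <a name="top">
    my $uri = URI->new($href);
    return unless $uri->can('host');       # relative, mailto:, javascript:
    my $host = $uri->host;
    return unless defined $host and $host eq $SITE;
    my %query_form = $uri->query_form;
    return unless exists $query_form{page};
    return sprintf 'pages/%s.htm', $query_form{page};
}

# Hypothetical helper: rewrites one file in place, renaming the original
# to FILE.bak first. Returns the number of links changed; files with no
# matching links are left untouched.
sub process_file {
    my ($path) = @_;
    my $t = HTML::TreeBuilder->new_from_file($path)
        or die "Cannot parse $path: $!\n";

    my $changed = 0;
    for my $anchor ($t->look_down(_tag => 'a')) {
        my $new = rewrite_href($anchor->attr('href'));
        next unless defined $new;
        $anchor->attr('href', $new);
        $changed++;
    }

    if ($changed) {
        my $html = $t->as_HTML(q{}, q{ }, {p => 0});
        rename $path, "$path.bak" or die "Cannot back up $path: $!\n";
        open my $out, '>', $path or die "Cannot write $path: $!\n";
        print {$out} $html;
        close $out or die "Cannot close $path: $!\n";
    }

    $t->delete;
    return $changed;
}

# Usage: perl fix_links.pl about.htm news.htm ...
for my $path (@ARGV) {
    printf "%s: %d link(s) rewritten\n", $path, process_file($path);
}
```

Keeping the URI test in its own sub makes it easy to check a handful of hrefs by hand before letting the script loose on a whole directory.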
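See also HTML::Element, the base class of HTML::TreeBuilder, which documents the look_down, attr and as_HTML methods used above.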
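HTML::Element's look_down accepts more than exact attribute values: a qr// regex or a code ref can also be given as criteria, so some of the filtering done inside the loop can move into the search itself. The sketch below uses made-up sample HTML. Note that the regex only recognises absolute http(s) links to one host, so it is a coarser test than comparing $uri->host, and a URI-based check is still the safer final filter.

```perl
#!/usr/bin/perl
use strict;
use warnings;

use HTML::TreeBuilder;

# Sample input for illustration; the host name mirrors the example above.
my $html = <<'HTML';
<html><body>
<p><a href="http://www.mysite.org/?page=faq">faq</a></p>
<p><a href="http://www.example.com/?page=faq">elsewhere</a></p>
<p><a href="http://www.mysite.org/about.htm">about</a></p>
<p><a name="top">no href</a></p>
</body></html>
HTML

my $t = HTML::TreeBuilder->new_from_content($html);

my @anchors = $t->look_down(
    _tag => 'a',
    # A qr// value must match the attribute; elements without an href are skipped.
    href => qr{^https?://www\.mysite\.org/},
    # A code ref receives the element as $_[0] and must return true.
    sub {
        my $href = $_[0]->attr('href');
        defined $href and $href =~ /[?&]page=/;
    },
);

# Copy the values out before delete() tears the tree down.
my @hrefs = map { $_->attr('href') } @anchors;
print "$_\n" for @hrefs;

$t->delete;
```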