To parse HTML and manipulate URIs, consider using modules that do the heavy lifting for you.

This presses HTML::TreeBuilder and URI into service.

#! /usr/bin/perl
use strict;
use warnings;

use HTML::TreeBuilder;
use URI;

# Parse the sample document that follows __DATA__.
my $t = HTML::TreeBuilder->new_from_file(*DATA)
    or die qq{TB->new failed: $!\n};

# Collect every <a> element in the tree.
my @anchors = $t->look_down(_tag => q{a});

for my $anchor (@anchors){
    my $href = $anchor->attr(q{href});
    next unless defined $href;          # skip anchors without an href

    my $uri = URI->new($href);
    next unless $uri->can(q{host});     # relative and mailto: links have no host
    my $host = $uri->host;
    next unless defined $host and $host eq q{www.mysite.org};

    # Only rewrite links that carry a page=... query parameter.
    my %query_form = $uri->query_form;
    next unless exists $query_form{page};

    my $replace = sprintf(q{pages/%s.htm}, $query_form{page});
    $anchor->attr(q{href}, $replace);
}

# Serialise the modified tree, indenting with a single space.
print $t->as_HTML(q{}, q{ }, {p => 0});

__DATA__
<html><head><title>mysite</title></head>
<body>
<p><a href="http://www.mysite.org/?page=contacts">text</a></p>
<p><a href="http://www.mysite.org/?page=newsletter">text</a></p>
<p><a href="http://www.mysite.org/?page=faq">text</a></p>
</body>
</html>

Output:

<html>
<head>
<title>mysite</title>
</head>
<body>
<p><a href="pages/contacts.htm">text</a></p>
<p><a href="pages/newsletter.htm">text</a></p>
<p><a href="pages/faq.htm">text</a></p>
</body>
</html>

See also HTML::Element.
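
If you want to try the URI half on its own before touching any HTML, something like the following might help. It is only a sketch: the helper name rewrite_href is made up for illustration (it is not part of URI or HTML::TreeBuilder), and www.example.com and the relative link are sample inputs I have assumed, not anything from the original question.

```perl
#! /usr/bin/perl
use strict;
use warnings;

use URI;

# rewrite_href is a hypothetical helper, not part of any module.
# It returns the rewritten link, or the original string when the
# link does not point at www.mysite.org with a page parameter.
sub rewrite_href {
    my ($href) = @_;
    my $uri = URI->new($href);

    # Relative links and schemes such as mailto: have no host method.
    return $href unless $uri->can(q{host});
    my $host = $uri->host;
    return $href unless defined $host and $host eq q{www.mysite.org};

    my %query_form = $uri->query_form;
    return $href unless exists $query_form{page};

    return sprintf(q{pages/%s.htm}, $query_form{page});
}

# Sample inputs (assumed for illustration).
for my $href (
    q{http://www.mysite.org/?page=contacts},
    q{http://www.example.com/?page=contacts},
    q{pages/faq.htm},
){
    printf qq{%s -> %s\n}, $href, rewrite_href($href);
}
```

Of the three sample links only the first should be rewritten; the other two should come back unchanged because the host differs or there is no host at all. Keeping the test in a sub like this also makes it easy to call from the anchor loop when you run it over many files.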

In reply to Re^3: Need direction on mass find/replacement in HTML files. by wfsp
in thread Need direction on mass find/replacement in HTML files. by kevin4truth
