Well, clearly, as you already wrote,

$lines =~ s{http://www.abc.com}{http://www.test.com}

does not work because it is too generic: it matches every URL that starts with that prefix.

Options:

1. You pause at each substitution and ask if it should be replaced. You cache the answer, so you only ask once per URL. Takes a while.

2. You dump all the URLs you find into a single, sorted file, then peruse it. Find the things that need to stay the same (blacklist) and the things that should be changed (whitelist). For whatever falls in between, you use the $ans=<STDIN> trick to decide interactively.
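For option 2, here is a minimal sketch of the "dump all URLs" step. The sub name collect_urls and the script name list_urls.pl are made up for illustration, and the URL pattern is the same rough one used in the sample code for option 1, so it is an approximation and not a full URL parser.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return every distinct URL found in the given strings, sorted.
# (collect_urls is a hypothetical helper, not part of any module.)
sub collect_urls {
    my %seen;
    for my $text (@_) {
        # rough URL pattern, same character class as in the sample for option 1
        $seen{$1}++ while $text =~ m{(http://[\w.\-?&;\#/]+)}gi;
    }
    return sort keys %seen;
}

# Usage: perl list_urls.pl file1 file2 ... > urls.txt
if (@ARGV) {
    local $/;              # slurp mode: each read returns a whole file
    my @contents = <>;     # <> walks through all files named in @ARGV
    print "$_\n" for collect_urls(@contents);
}
```

The resulting urls.txt is what you read through by hand to build the blacklist and the whitelist.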

Sample code for 1:

#!/usr/bin/perl
use strict;
use warnings;

my %YES;    # URLs the user agreed to change
my %NO;     # URLs the user wants left alone

my $text = 'pat http://www.abc.com/test.gif ma http://www.abc.com/hello.html http://www.abc.com/test.gif ';
$text =~ s{(http://[\w.\-?&;\#/]+)}{ask($1)}gei;
print $text, "\n";

sub ask {
    my ($url) = @_;
    # index() returns -1 (not false) when the substring is absent
    return $url if index($url, 'www.abc.com') < 0;
    # add more "return $url if condition;" here (blacklist)
    if ($YES{$url}) {
        $url =~ s/www\.abc\.com/www.test.com/;
        return $url;
    }
    elsif ($NO{$url}) {
        return $url;
    }
    else {
        print "substitute $url ? ";
        my $ans = <STDIN>;
        if (defined $ans && $ans =~ m/y/i) {
            ++$YES{$url};
        }
        else {
            ++$NO{$url};
        }
        return ask($url);    # the answer is cached now, so this returns at once
    }
}

3. You already know what you will replace, and the patterns do not match anything else:

use strict;
use warnings;
use File::Slurp;

# pattern (a regular expression) => replacement
my %PATTERNS = (
    'http://www\.abc\.com/test\b'       => 'http://www.test.com/twist',
    'http://www\.abc\.com/(?:test\d)\b' => 'http://www.test.com/',
);

# patterns to regexps (case-insensitive, like the substitution below)
my @REGEXPS = map { qr/$_/i } keys %PATTERNS;

# read from commandline
die "usage: $0 <filenames> ...\n" unless @ARGV;

for my $filename (@ARGV) {
    die "NOT A FILE! '$filename' "   unless -f $filename;
    die "NOT READABLE! '$filename' " unless -r $filename;

    # read the whole file into one string
    my $lines = read_file($filename);

    # check whether any pattern matches at all
    my $changes = 0;
    for my $r (@REGEXPS) {
        if ($lines =~ $r) {
            $changes++;
            last;
        }
    }
    if ($changes == 0) {
        print "no changes for $filename\n";
        next;    # go on to the next file instead of exiting
    }

    # keep the original as a backup
    rename $filename, "$filename.bak" or die "cannot rename '$filename': $!";

    for my $r (keys %PATTERNS) {
        my $s = $PATTERNS{$r};
        $lines =~ s/$r/$s/gi;
    }

    # write out the whole file
    write_file($filename, $lines);
    print "Modified $filename\n";
}

4. You take the URL and request the new page; if it exists, the URL needs to be renamed. (curl -I fetches only the headers, not the content; there you look for the "200 OK".)

my $result = `curl -sI "$url"`;
# match the status line of any HTTP version, e.g. "HTTP/1.1 200 OK" or "HTTP/2 200"
if ($result =~ m{^HTTP/\S+ 200\b}) {
    # proceed to rename
}
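The same check can be done without shelling out to curl. Below is a rough sketch using HTTP::Tiny (in core since Perl 5.14) to send a HEAD request. The sub names new_url_for, url_exists and replace_if_exists are made up for illustration, and the host names are the ones from the question. The existence check is passed in as an optional code reference so the logic can be tried without network access.

```perl
use strict;
use warnings;
use HTTP::Tiny;

# Map an old URL to its candidate replacement (hypothetical helper).
sub new_url_for {
    my ($url) = @_;
    (my $new = $url) =~ s{^http://www\.abc\.com}{http://www.test.com}i;
    return $new;
}

# True if a HEAD request for $url ends in a 2xx status.
# HTTP::Tiny follows redirects by default.
sub url_exists {
    my ($url) = @_;
    my $response = HTTP::Tiny->new( timeout => 10 )->head($url);
    return $response->{success};
}

# Return the new URL if it differs and the page exists, else the original.
sub replace_if_exists {
    my ( $url, $exists ) = @_;
    $exists ||= \&url_exists;    # the check is injectable, e.g. for testing
    my $new = new_url_for($url);
    return ( $new ne $url && $exists->($new) ) ? $new : $url;
}
```

You could call replace_if_exists($1) from the replacement side of an s{...}{...}ge substitution, much like ask($1) in the sample for option 1.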

5. There are lots more options, but I'm tired now.


In reply to Re: URL search and replace in the files by FreeBeerReekingMonk
in thread URL search and replace in the files by pavan474
