
As you already wrote,

$lines =~ s{http://www.abc.com}{http://www.test.com}

does not work because it is too generic: it rewrites every www.abc.com URL, including the ones that should stay as they are.
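Part of the problem is independent of which option you pick: in that one-liner the dots are regex wildcards, and nothing stops the match at the end of the host name, so it also rewrites strings like http://wwwXabc.com/ or http://www.abc.com.example.org/. A minimal sketch of a tighter substitution, assuming the links you care about are followed by a slash, a quote, whitespace or the end of the string (retarget() is a made-up helper name):

```perl
use strict;
use warnings;

# Hypothetical helper: change the host of links that point exactly at
# $old_host. \Q...\E makes the dots literal, and the lookahead requires
# the host name to end there (slash, quote, whitespace or end of string).
sub retarget {
    my ($text, $old_host, $new_host) = @_;
    $text =~ s{http://\Q$old_host\E(?=[/"'\s]|\z)}{http://$new_host}gi;
    return $text;
}

my $html = '<a href="http://www.abc.com/a.html">x</a> '
         . '<a href="http://www.abc.com.example.org/">y</a> '
         . 'http://wwwXabc.com/';

# Only the first link is changed; the other two are left alone.
print retarget($html, 'www.abc.com', 'www.test.com'), "\n";
```

This only fixes the "matches too much text" part; deciding which of the genuine www.abc.com links to change is still what the options below are about.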

Options:

1. You pause at each substitution and ask whether it should be replaced. You cache the answer, so you only ask once per URL. This takes a while if there are many URLs.

2. You dump all URLs you find into a single sorted file, then read through it. Note the ones that must stay the same (blacklist) and the ones that should be changed (whitelist). For whatever falls in between, you use the $ans = <STDIN> trick to decide interactively.
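A minimal sketch of the dump step in option 2. collect_urls() is a hypothetical helper, and it reuses the URL character class from the sample code for option 1; that class is an assumption, so widen it if your URLs contain characters such as = or %.

```perl
use strict;
use warnings;

# Hypothetical helper: return every distinct http:// URL found in the
# given strings, sorted, so the list can be reviewed in one file.
sub collect_urls {
    my (@texts) = @_;
    my %seen;
    for my $text (@texts) {
        # m//g in scalar context walks through all matches in $text
        $seen{$1}++ while $text =~ m{(http://[\w.\-?&;#/]+)}gi;
    }
    return sort keys %seen;
}

# Read every file named on the command line and print the URL list.
my @texts;
for my $file (@ARGV) {
    open my $fh, '<', $file or die "cannot open '$file': $!\n";
    local $/;                 # slurp mode: read the whole file at once
    my $content = <$fh>;
    push @texts, defined $content ? $content : '';
    close $fh;
}
print "$_\n" for collect_urls(@texts);
```

Run it as perl dump_urls.pl page1.html page2.html > urls.txt (the script name is arbitrary) and go through urls.txt to build the two lists.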

Sample code for 1:

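#!/usr/bin/perl
use strict;
use warnings;

$| = 1;     # flush the prompt before waiting for input

my %YES;    # URLs the user agreed to change
my %NO;     # URLs the user wants to keep

my $text = 'pat http://www.abc.com/test.gif ma http://www.abc.com/hello.html http://www.abc.com/test.gif ';

# call ask() for every URL; its return value replaces the URL
$text =~ s{(http://[\w.\-?&;#/]+)}{ask($1)}gei;

print "$text\n";

sub ask {
    my ($url) = @_;
    # leave URLs that are not on www.abc.com alone
    # (index() returns -1 when the substring is missing)
    return $url if index($url, 'www.abc.com') < 0;
    # add more "return $url if condition;" lines here (blacklist)
    if ($YES{$url}) {
        $url =~ s/www\.abc\.com/www.test.com/;
        return $url;
    } elsif ($NO{$url}) {
        return $url;
    } else {
        print "substitute $url ? ";
        my $ans = <STDIN>;
        $ans = '' unless defined $ans;    # treat end of input as "no"
        if ($ans =~ m/^\s*y/i) {
            ++$YES{$url};
        } else {
            ++$NO{$url};
        }
        return ask($url);    # the answer is cached now, so this returns at once
    }
}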
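For the blacklist/whitelist part of option 2, something like the following could sit in front of the interactive question in ask(). The list names, the classify() helper and the example regexes are all hypothetical; fill them from your own review of the URL list.

```perl
use strict;
use warnings;

# Hypothetical lists built from reviewing the sorted URL dump.
my @BLACKLIST = (qr{^http://www\.abc\.com/hello\.html$});    # never change
my @WHITELIST = (qr{^http://www\.abc\.com/[^/]+\.gif$});     # always change

# Returns 'keep', 'change', or 'ask' (neither list matched).
sub classify {
    my ($url) = @_;
    return 'keep'   if grep { $url =~ $_ } @BLACKLIST;
    return 'change' if grep { $url =~ $_ } @WHITELIST;
    return 'ask';
}

print "$_ => ", classify($_), "\n"
    for 'http://www.abc.com/hello.html',
        'http://www.abc.com/test.gif',
        'http://www.abc.com/other.html';
```

A 'keep' result maps to returning the URL unchanged, 'change' to substituting directly, and only 'ask' needs the <STDIN> prompt.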

3. You already know exactly what to replace, and those patterns do not match anything else:

use strict;
use warnings;
use File::Slurp qw(read_file write_file);

# regex source => replacement text (dots escaped so they match literally)
my %PATTERNS = (
    'http://www\.abc\.com/test\b'         => 'http://www.test.com/twist',
    'http://www\.abc\.com/(?:test[\d])\b' => 'http://www.test.com/',
);

# compile each pattern once (case-insensitive), paired with its replacement
my @RULES = map { [ qr/$_/i, $PATTERNS{$_} ] } keys %PATTERNS;

# read file names from the command line
die "usage: $0 <filenames> ...\n" unless @ARGV;

for my $filename (@ARGV) {

    die "NOT A FILE! '$filename'\n"   unless -f $filename;
    die "NOT READABLE! '$filename'\n" unless -r $filename;

    # read the whole file into a single string
    my $lines = read_file($filename);

    # apply every rule; s///g returns the number of substitutions made
    my $changes = 0;
    for my $rule (@RULES) {
        my ($re, $new) = @$rule;
        $changes += ($lines =~ s/$re/$new/g);
    }

    if ($changes == 0) {
        print "no changes for $filename\n";
        next;    # continue with the next file instead of exiting
    }

    # keep the original as a backup, then write out the modified text
    rename $filename, "$filename.bak"
        or die "cannot rename '$filename': $!\n";
    write_file($filename, $lines);

    print "Modified $filename ($changes substitutions)\n";
}
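Before pointing the option 3 script at real files, it can help to try the patterns on a test string. This sketch reuses the two example patterns (with the dots escaped); apply_patterns() and the sample text are made up for illustration.

```perl
use strict;
use warnings;

# The example patterns from option 3: regex source => replacement text.
my %PATTERNS = (
    'http://www\.abc\.com/test\b'         => 'http://www.test.com/twist',
    'http://www\.abc\.com/(?:test[\d])\b' => 'http://www.test.com/',
);

# Hypothetical helper: apply every pattern to a copy of $text.
sub apply_patterns {
    my ($text) = @_;
    for my $pat (keys %PATTERNS) {
        my $new = $PATTERNS{$pat};
        $text =~ s/$pat/$new/gi;
    }
    return $text;
}

# "test" and "test1" are rewritten; "testing" matches neither \b pattern.
print apply_patterns('a http://www.abc.com/test b http://www.abc.com/test1 c http://www.abc.com/testing'), "\n";
```

Because the two example patterns cannot match each other's output, the random hash order does not matter here; if your own patterns can overlap, apply them from an ordered list instead.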

4. For each URL, you request the corresponding page on the new site; if it exists, the link can be renamed. curl -I fetches only the headers, not the content, so you just check the status line for a 200.

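# $url should be the new address (with www.test.com already substituted)
my $result = `curl -s -I "$url"`;
# match the status code, not the text: HTTP/2 replies say "HTTP/2 200" without "OK"
if ($result =~ m{^HTTP/\S+\s+200\b}m) {
    # proceed to rename
}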
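If you would rather not shell out to curl, HTTP::Tiny (a core module since Perl 5.14) can send the same kind of HEAD request. A minimal sketch; page_exists() is a hypothetical helper, and it treats any 2xx status as "the page exists" rather than looking for the literal 200 OK.

```perl
use strict;
use warnings;
use HTTP::Tiny;

# One client object reused for every check; the 10 s timeout is arbitrary.
my $http = HTTP::Tiny->new(timeout => 10);

# Returns 1 if a HEAD request (headers only, like curl -I) gets a 2xx
# answer, 0 otherwise (4xx/5xx, or 599 for connection errors).
sub page_exists {
    my ($url) = @_;
    my $res = $http->head($url);
    return $res->{success} ? 1 : 0;
}

# Check every URL given on the command line.
for my $url (@ARGV) {
    printf "%s: %s\n", $url, page_exists($url) ? 'exists' : 'missing';
}
```

Call page_exists() with the new URL and only rewrite the link when it returns 1.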

5. There are many more options, but I'm tired now.

In reply to Re: URL search and replace in the files by FreeBeerReekingMonk
in thread URL search and replace in the files by pavan474
