You are getting the duplicates because you print $newline to OUTFILE once you make the modification, and then you print $_ to OUTFILE again regardless of whether there was a match. There are two courses of action I'd consider. One:
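In place of the grep for the href and .html, match and substitute on $_ directly with m// and s///, then print $_ exactly once per line. Escape the dot in .html; unescaped, it matches any character:
if (m/a\s+href/i) { s/\.html/.asp/gi }
print OUTFILE $_;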
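Here is a self-contained sketch of option one, so you can check the "one print per line" behaviour before touching your real files. The helper name convert_links is hypothetical, and the in-memory filehandles stand in for your INFILE/OUTFILE. It uses a lexical $line instead of the implicit $_, but the logic is the same: substitute in place only when the line has an "a href", then print the line once.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): copy every line from
# $in to $out, rewriting ".html" to ".asp" only on lines with an "a href".
sub convert_links {
    my ($in, $out) = @_;
    while (my $line = <$in>) {
        # Modify the line in place; non-matching lines pass through untouched.
        $line =~ s/\.html/.asp/gi if $line =~ /a\s+href/i;
        # Exactly one print per input line, so nothing is written twice.
        print {$out} $line;
    }
}

# In-memory filehandles stand in for the real INFILE/OUTFILE.
my $input  = qq{<a href="index.html">Home</a>\n<p>see notes.html</p>\n};
my $output = '';
open my $in,  '<', \$input  or die "cannot open input: $!";
open my $out, '>', \$output or die "cannot open output: $!";
convert_links($in, $out);
close $out;
print $output;
```

The second sample line contains ".html" but no link, so it is left alone. Only lines that pass the href test get rewritten.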
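Your other option, if you prefer to keep the grep you are using now, is to end the if block with 'next'. That jumps straight to the next iteration of the loop, so the unconditional print further down is skipped for a line you have already written. For example:
if ( grep(/a href.*\.html/, $line) ) {
    (my $newline = $line) =~ s/\.html/.asp/g;
    print OUTFILE $newline . "\n";
    next;
}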
WARNING: Solution 2 is untested, but I don't see anything standing immediately in the way of it working.
UPDATE: In solution 2 the loop-control keyword has to be 'next', not 'continue' (which is what I first wrote); I don't know what I was thinking at the time. I apologize.
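A runnable sketch of solution 2 (grep plus next). It assumes each line is read without chomp, so $newline already ends in "\n" and no extra "\n" is appended. If your loop chomps the input, keep the . "\n" from the snippet above. The helper name convert_links_grep is hypothetical, and in-memory filehandles replace INFILE/OUTFILE.

```perl
use strict;
use warnings;

# Hypothetical helper: solution 2, keeping the grep test and using `next`.
# Assumes lines are NOT chomped, so $newline already ends in "\n".
sub convert_links_grep {
    my ($in, $out) = @_;
    while (my $line = <$in>) {
        if ( grep(/a href.*\.html/, $line) ) {
            (my $newline = $line) =~ s/\.html/.asp/g;
            print {$out} $newline;
            next;    # jump to the next line; the print below is skipped
        }
        print {$out} $line;    # reached only for lines without a match
    }
}

# In-memory filehandles stand in for the real INFILE/OUTFILE.
my $input  = qq{<a href="page.html">Page</a>\nno link here\n};
my $output = '';
open my $in,  '<', \$input  or die "cannot open input: $!";
open my $out, '>', \$output or die "cannot open output: $!";
convert_links_grep($in, $out);
close $out;
print $output;    # each input line appears exactly once
```

Without the next, the matched line would be printed once as $newline and then again as $line. That is the duplication described at the top.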
-Adam Stanley
Nethosters, Inc.