Hmm, odd. I double-checked my code on my box and I can't reproduce what you're seeing. However, I did notice a little annoyance, which I've fixed. Before getting to that, though, here is what I see when I run the script on my box:
Do you mind pasting a screenshot of what you see with your output?
Now, for the annoyance. I noticed unneeded linefeeds getting into the fixed files. It's easy to fix. Go to the lines where it does this:
        ( my $newline = $line ) =~ s/\.html/\.asp/g;

        #print "Substitution written was: [[ ";
        #print "$newline";
        #print " ]]\n";

        $changed = "1";
        $lcount++;

        # append that to the new file.
        print NEW "$newline\n";
    } else {
        # otherwise just append the old line into
        # the file.
        print NEW "$_\n";
    }
Remove the "\n" from both print statements. Since the lines aren't chomp'ed, each one already ends in its own newline, so the extra "\n" adds a blank line after every line. That fixed the little problem for me. As for the code you are writing to supplement mine, unfortunately I don't have much time to look at it right now, since I'm at work and should be... ahem... well, working =P. I'll look at it more closely tonight (later for me; I have to spend time with my family first. I usually wait until my wife goes to sleep to play).
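To make the doubled newline concrete, here is a minimal sketch, assuming the lines are read without chomp, as in the script. The sub rewrite_lines and its $extra_newline flag are hypothetical, and an in-memory string stands in for the real input file; the flag switches between the original print "$newline\n" and the fixed version without the "\n".

```perl
use strict;
use warnings;

# Hypothetical helper: rewrite .html links to .asp line by line and
# return the text that would be written to the new file.
# $extra_newline = 1 mimics: print NEW "$newline\n";
# $extra_newline = 0 mimics: print NEW $newline;
sub rewrite_lines {
    my ( $text, $extra_newline ) = @_;
    my $out = '';
    # In-memory filehandle standing in for the real input file.
    open my $in, '<', \$text or die "Cannot open in-memory input: $!";
    while ( my $line = <$in> ) {    # not chomped: $line still ends in "\n"
        ( my $newline = $line ) =~ s/\.html/\.asp/g;
        $out .= $extra_newline ? "$newline\n" : $newline;
    }
    close $in;
    return $out;
}

my $input = qq{<a href="index.html">Home</a>\n<p>Hello</p>\n};

my $doubled = rewrite_lines( $input, 1 );    # blank line after every line
my $fixed   = rewrite_lines( $input, 0 );    # one newline per line, like the input
print $doubled, "----\n", $fixed;
```

If you'd rather keep the "\n" in the print statements, the other option is to chomp each line right after reading it; either way, each line ends up with exactly one newline.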
Good luck. If you want, we can talk more about it in real time on IRC (/server irc.openprojects.net, channel #perl), or we can keep going here.
----------
- Jim
OK, I found your problem. Here, let me show you.
    ##changing .html to .asp in the links
    if ( grep(/a href.*\.html/,$line) ){
        (my $newline = $line ) =~ s/\.html/\.asp/g;
        print OUTFILE $newline . "\n";
    }
OK, that part is fine. Your problem, however, comes after it. Bear in mind that you are going through these files line by line, so the changed line has to go into the new file instead of the old line, right? With the code above, you have just found a line that matches what you are looking for, changed it, and printed it to the new file. But what do you do next?
        $body_temp = $_;
        $body_temp =~ s/(.*?)\<body\>(.*?)\<\/body\>/$2/i;
        chomp($body_temp);
        $body = "$body_temp" ;
        # Write the body to the output file
        print OUTFILE $body . "\n";
    }
...along with everything else you do with the HTML body tag, you print the line again. After your if block finishes you are still on the same line, so it runs through all the code that follows, and you won't move on to the next line until the end of the loop iteration. As a result, every matching line gets written to the new file twice. That's why you should use an 'else' block with your 'if' statement.
e.g.
    while ( my $line = <FILE> ) {
        if ( $line =~ /a href.*\.html/ ) {
            # this line matches my regex, so change it
            ( my $newline = $line ) =~ s/\.html/\.asp/g;
            # If you need to do something else to this line,
            # do it here.
            print NEW $newline;    # print it to the new file
        } else {
            # This line obviously does not match my regex,
            # so leave it alone (or do some more stuff to it,
            # such as your <body> handling) and move on to the next line.
            print NEW $line;       # print the old line to the new file
        }
    }
See what I'm doing? :)
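Here is a small self-contained sketch of the difference, with the <body> handling left out to keep it short. Both subs, without_else and with_else, are hypothetical stand-ins for your loop: they rewrite .html links the same way your script does, but return the lines that would go to the new file instead of printing to a filehandle. Without the else, a matching line is written twice (once changed, once as-is); with it, each line is written exactly once.

```perl
use strict;
use warnings;

# Hypothetical version of the original loop: no 'else', so a matching
# line falls through and is written a second time, unchanged.
sub without_else {
    my @out;
    for my $line (@_) {
        if ( $line =~ /a href.*\.html/ ) {
            ( my $newline = $line ) =~ s/\.html/\.asp/g;
            push @out, $newline;
        }
        push @out, $line;    # runs for every line, matched or not
    }
    return @out;
}

# Same loop with an 'else': each input line produces exactly one output line.
sub with_else {
    my @out;
    for my $line (@_) {
        if ( $line =~ /a href.*\.html/ ) {
            ( my $newline = $line ) =~ s/\.html/\.asp/g;
            push @out, $newline;
        } else {
            push @out, $line;
        }
    }
    return @out;
}

my @lines = ( qq{<a href="index.html">Home</a>\n}, qq{<p>Hello</p>\n} );
print "without else:\n", without_else(@lines);
print "with else:\n",    with_else(@lines);
```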
----------
- Jim