in thread Changing .html to .asp

Jim, thanks a lot for your help. What I am doing is migrating content from .html files to .asp files. I'm only grabbing everything that's within the body tags and copying it over to a new .asp file in the same directory. Within the body content I want to change any link that points to .html so that it points to .asp. Your code does that for me, but it prints the line out twice: once with the .html extension and once with .asp. I only want the .asp version. Do you know how I can get rid of the duplicate .html line? Thanks

    open (FILE, $filename);
    while(<FILE>){
        # walk each file
        my $line = $_ ;
        chomp $line;

        #grabbing and printing everything between the body tags
        if (/<body.*?>/i ... /<\/body.*?>/i){
            # this is a body line
            # extract the body

            ##changing .html to .asp in the links
            if ( grep(/a href.*\.html/,$line) ){
                (my $newline = $line ) =~ s/\.html/\.asp/g;
                print OUTFILE $newline . "\n";
            }

            $body_temp = $_;
            $body_temp =~ s/(.*?)\<body\>(.*?)\<\/body\>/$2/i;
            chomp($body_temp);

            $body = "$body_temp" ;

            # Write the body to the output file
            print OUTFILE $body . "\n";
        }
    }
    close(FILE);


Re: Re: Re: Re: Re: Changing .html to .asp
by snafu (Chaplain) on May 01, 2001 at 00:49 UTC
    Hmm. Odd, I went ahead and double checked it (my code) on my box and I can't seem to reproduce what you are seeing. However, I did notice a lil annoyance that I went ahead and fixed. Before getting to that though I will show you what I see when I run the script on my box:

    Do you mind pasting a screenshot of what you see with your output?

    Now, for the annoyance. I noticed unneeded linefeeds getting in the fixed files. Easy to fix. Go to the lines where it does this:

    
            ( my $newline = $line ) =~ s/\.html/\.asp/g;

            #print "Substitution written was: [[ ";
            #print "$newline";
            #print " ]]\n";

            $changed = "1";
            $lcount++;

            # append that to the new file.
            print NEW "$newline\n";
        } else {
            # otherwise just append the old line into
            # the file.
            print NEW "$_\n";
        }
    

    Remove the '\n's from those two print statements. That fixed that lil problem for me. As for the code you are writing to supplement what I have done, unfortunately I don't have a lot of time to look at it right now, since I am at work and should be...ahem...well, working =P. I will look at it more closely tonight (later for me...gotta spend time with my family. I usually wait till my wife goes to sleep to play).
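
    A minimal sketch of why the extra "\n" doubles the linefeeds, assuming the lines are read from the filehandle and never chomped before they are printed. The sample text and variable names are made up for illustration, and the in-memory filehandle needs Perl 5.8 or later:

```perl
use strict;
use warnings;

# Hypothetical sample input: two lines, each ending in a newline.
my $text = "first\nsecond\n";

# Read from an in-memory filehandle so the sketch needs no real file.
open my $fh, '<', \$text or die "Cannot open in-memory input: $!";

my $with_extra = '';    # what  print NEW "$_\n"  would write
my $as_is      = '';    # what  print NEW $_      would write

while ( my $line = <$fh> ) {
    # $line still carries its own trailing "\n" because it was not chomped.
    $with_extra .= "$line\n";    # second newline: a blank line after each line
    $as_is      .= $line;        # one newline per line, same as the input
}
close $fh;

print "With the extra newline:\n$with_extra";
print "Without it:\n$as_is";
```

    If the line has been chomped first, the "\n" in the print statement is what puts the newline back, so whether to keep it depends on which variable is being printed.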

    Good luck. If you want, we can talk more about it in real time in irc /server irc.openprojects.net #perl or we can continue to do this.

    ----------
    - Jim

Re: Re: Re: Re: Re: Changing .html to .asp
by snafu (Chaplain) on May 01, 2001 at 10:45 UTC
    Ok. I found your problem. Here, let me show ya.

        ##changing .html to .asp in the links
        if ( grep(/a href.*\.html/,$line) ){
            (my $newline = $line ) =~ s/\.html/\.asp/g;
            print OUTFILE $newline . "\n";
        }
    
    Ok, that is fine. Your problem, however, is after this. Bear in mind that you are going through these files line by line. Therefore, the changed line must go into the new file instead of the old line, not in addition to it, right? So, keeping your above code in mind, you have just found a line that matches what you are looking for and have changed it. You have also printed that line to the new file. But what do you do next?
             $body_temp = $_;
             $body_temp =~ s/(.*?)\<body\>(.*?)\<\/body\>/$2/i;
             chomp($body_temp);
    
             $body = "$body_temp" ;
            
             # Write the body to the output file
             print OUTFILE $body . "\n";
            }
    
    ...amongst all the other stuff you did with the html body tag, you printed the line again, because you were still on that same line when the code after your if statement ran. You don't move on to the next line until the end of the loop iteration. Therefore, you should put the handling of unchanged lines in an 'else' block of your 'if' statement, so each line gets printed only once.

    e.g.

    while ( <FILE> ) {
        if ( this line matches this regex ) {
            change the line;
            # If you need to do something to this line
            # do it here
            print it to the new file;
        } else {
            # this line obviously does not match my regex
            # so ignore it (or do some more stuff to it) 
            # and move on to the next line.
            Stuff to do to the line I didn't have to change...
            print the old line to the new file
        }
    }
    
    See what I'm doing? :)
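
    The pseudocode above could be fleshed out roughly like this. It is only a sketch under a few assumptions: the sub name copy_with_asp_links and the sample data are hypothetical, the body-tag extraction is left out to keep the if/else in focus, and the in-memory filehandles need Perl 5.8 or later (use real files on older Perls):

```perl
use strict;
use warnings;

# Hypothetical helper: copies lines from $in to $out, rewriting .html to .asp
# on lines that contain a link. Every line is written exactly once.
sub copy_with_asp_links {
    my ( $in, $out ) = @_;

    while ( my $line = <$in> ) {
        if ( $line =~ /a href.*\.html/i ) {
            # Changed line: print the new version instead of the old one.
            ( my $newline = $line ) =~ s/\.html/.asp/g;
            print {$out} $newline;
        }
        else {
            # Unchanged line: print it as-is (it still has its own newline).
            print {$out} $line;
        }
    }
    return;
}

# Sample data held in memory, so the sketch runs without touching any files.
my $input = join '',
    qq{<p>No link here</p>\n},
    qq{<a href="page.html">Page</a>\n};
my $result = '';

open my $in,  '<', \$input  or die "Cannot open input: $!";
open my $out, '>', \$result or die "Cannot open output: $!";
copy_with_asp_links( $in, $out );
close $in;
close $out;

print $result;
```

    Because the two print statements sit in opposite branches, a line with a link can no longer come out twice. Anything that has to happen to every line (such as stripping the body tags) would go before the if, working on the same variable that gets printed.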

    ----------
    - Jim