    #!/usr/bin/perl
    use warnings;
    use strict;
    use File::Find::Rule;

    #find all html files in specified directory
    my $dir = "/var/www/site/htdocs/";
    my $rule = File::Find::Rule->file->name("*.html")->start( $dir );

    #keep track of the changed files in a file
    open(OUTFILE,">>fixed_files.txt") || die "cant open fixed_files.txt, $!\n";

    while ( my $html_file = $rule->match ) {
        rename($html_file, "$html_file.bak") or die;
        open(my $fh_in, '<', "$html_file.bak") or die;
        open(my $fh_out, '>', $html_file) or die;
        while (<$fh_in>) {
            #add the urchin code
            if (s|</head>|<script src="http://mysite.org/__utm.js" type="text/javascript"></script>\n</head>|i) {
                print OUTFILE "$html_file: fixed Urchin code\n";
            }
            print $fh_out $_;
        }
        close($fh_in);
        close($fh_out);
    }
    close OUTFILE;
This worked like a charm. Unfortunately, I then found out that I had a version control issue: some files had already been updated with the code and I didn't know it. So I ended up with a lot of pages containing this:
    <script src="http://mysite.org/__utm.js" type="text/javascript"></script>
    <script src="http://mysite.org/__utm.js" type="text/javascript"></script>
    </head>
I am trying to figure out how to remove the duplicate line from the files. I've tried many regexes with no success. My latest attempt was:
    #fix the urchin code
    if (s|<script src="http://mysite.org/__utm.js" type="text/javascript"></script>\n<script src="http://mysite.org/__utm.js" type="text/javascript"></script>|<script src="http://mysite.org/__utm.js" type="text/javascript"></script>\n|i) {
        print OUTFILE "$html_file: fixed Urchin code\n";
    }
Can someone point me in a more productive direction?
Thanks!
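One possible direction, offered as a sketch rather than a definitive fix: the loop in the first script reads each file one line at a time, so `$_` never holds two lines at once, and a pattern with `\n` in the middle of it cannot match. Reading the whole file into a single string first sidesteps that. The example below assumes the duplicated tags are separated only by whitespace; `dedupe_urchin` and `fix_file` are hypothetical helper names, not part of the original script.

```perl
use strict;
use warnings;

# The exact tag the original script inserted before </head>.
my $tag = '<script src="http://mysite.org/__utm.js" type="text/javascript"></script>';

# Collapse two or more consecutive copies of the tag (separated only by
# whitespace, including newlines) into the first copy.
# Returns the cleaned text and the number of runs that were collapsed.
sub dedupe_urchin {
    my ($html) = @_;
    my $count = ($html =~ s/(\Q$tag\E)(?:\s*\Q$tag\E)+/$1/gi);
    return ($html, $count || 0);
}

# Hypothetical wrapper: slurp a file, dedupe it, and rewrite it only if
# something changed. Unlike the original script, it keeps no .bak copy.
sub fix_file {
    my ($path) = @_;
    open(my $in, '<', $path) or die "can't read $path: $!";
    my $html = do { local $/; <$in> };   # undef $/ reads the whole file
    close($in);
    my ($fixed, $count) = dedupe_urchin($html);
    return 0 unless $count;
    open(my $out, '>', $path) or die "can't write $path: $!";
    print {$out} $fixed;
    close($out) or die "can't close $path: $!";
    return $count;
}

# Small in-memory demonstration on the duplicated markup from the post.
my $sample = "<head>\n$tag\n$tag\n</head>\n";
my ($fixed, $n) = dedupe_urchin($sample);
print $fixed;
```

In the original loop, `fix_file($html_file)` could stand in for the line-by-line rewrite, with a line printed to OUTFILE whenever it returns a non-zero count. Running it on copies of the files first would be prudent, since this sketch overwrites in place.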
In reply to Regex to undo a regex? by hmbscully