hmbscully has asked for the wisdom of the Perl Monks concerning the following question:
```perl
#!/usr/bin/perl
use warnings;
use strict;
use File::Find::Rule;

#find all html files in specified directory
my $dir  = "/var/www/site/htdocs/";
my $rule = File::Find::Rule->file->name("*.html")->start( $dir );

#keep track of the changed files in a file
open(OUTFILE, ">>fixed_files.txt") || die "cant open fixed_files.txt, $!\n";

while ( my $html_file = $rule->match ) {
    rename($html_file, "$html_file.bak") or die "can't rename $html_file: $!";
    open(my $fh_in,  '<', "$html_file.bak") or die "can't read $html_file.bak: $!";
    open(my $fh_out, '>', $html_file)       or die "can't write $html_file: $!";
    while (<$fh_in>) {
        #add the urchin code
        if (s|</head>|<script src="http://mysite.org/__utm.js" type="text/javascript"></script>\n</head>|i) {
            print OUTFILE "$html_file: fixed Urchin code\n";
        }
        print $fh_out $_;
    }
    close($fh_in);
    close($fh_out);
}
close OUTFILE;
```
This worked like a charm. Unfortunately, I then discovered a version-control issue: some files had already been updated with the code without my knowing it, so I ended up with a lot of pages containing this:
```html
<script src="http://mysite.org/__utm.js" type="text/javascript"></script>
<script src="http://mysite.org/__utm.js" type="text/javascript"></script>
</head>
```
I am trying to figure out how to remove the duplicate line from these files. I've tried many regexes with no success. My latest attempt was:
```perl
#fix the urchin code
if (s|<script src="http://mysite.org/__utm.js" type="text/javascript"></script>\n<script src="http://mysite.org/__utm.js" type="text/javascript"></script>|<script src="http://mysite.org/__utm.js" type="text/javascript"></script>\n|i) {
    print OUTFILE "$html_file: fixed Urchin code\n";
}
```
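A likely reason this attempt never matches: inside a `while (<$fh_in>)` loop, `$_` holds one line at a time, so it never contains the `\n` between the two tags. One way around that, sketched here under the assumption that each HTML file fits comfortably in memory, is to read the whole file into a single string (by localizing `$/`) and collapse repeated tags there. The helper names `dedupe_urchin` and `fix_file` are hypothetical, and the sketch takes file names from `@ARGV` instead of File::Find::Rule so that it needs only core Perl.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The tag that ended up duplicated (copied from the original script).
my $tag = '<script src="http://mysite.org/__utm.js" type="text/javascript"></script>';

# Collapse two or more consecutive copies of the tag (each followed by any
# whitespace, including newlines) into one copy. Because $html holds the
# whole file, the pattern can span line boundaries.
# Returns the fixed text and the number of runs collapsed.
sub dedupe_urchin {
    my ($html) = @_;
    my $count = ($html =~ s{(\Q$tag\E\s*)(?:\Q$tag\E\s*)+}{$1}gi) || 0;
    return ($html, $count);
}

# Hypothetical driver: rewrite one file, keeping a .bak copy only when
# something actually changed.
sub fix_file {
    my ($file) = @_;
    open my $in, '<', $file or die "can't read $file: $!";
    my $html = do { local $/; <$in> } // '';    # slurp the whole file
    close $in;

    my ($fixed, $count) = dedupe_urchin($html);
    return 0 unless $count;

    rename $file, "$file.bak" or die "can't back up $file: $!";
    open my $out, '>', $file or die "can't write $file: $!";
    print {$out} $fixed;
    close $out or die "can't close $file: $!";
    return $count;
}

for my $file (@ARGV) {
    my $n = fix_file($file);
    print "$file: removed duplicate Urchin code\n" if $n;
}
```

For example, `find /var/www/site/htdocs -name '*.html' -exec perl fix_dupes.pl {} +` (the script name is made up). `\Q...\E` quotes the tag so that the dots and slashes in the URL are matched literally; the same slurp-and-substitute idea can also be written as a one-liner with perl's `-0777` (slurp) and `-i.bak` (in-place edit with backup) switches.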
Can someone point me in a more productive direction?
Thanks!
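If you would rather keep the original line-by-line loop, here is a sketch under similar assumptions: remember whether the previous line ended with the tag, and skip any line that consists of nothing but the tag when it follows such a line. Because the flag stays set while lines are being skipped, three or more copies collapse as well. `drop_repeated_tag` is a hypothetical helper; it takes already-open input and output handles, so it can also be exercised on in-memory strings.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $tag = '<script src="http://mysite.org/__utm.js" type="text/javascript"></script>';
my $ends_with_tag = qr{\Q$tag\E\s*$}i;       # line ends with the tag
my $only_tag      = qr{^\s*\Q$tag\E\s*$}i;   # line is nothing but the tag

# Copy lines from $in to $out, dropping any tag-only line that follows a line
# ending in the tag. Returns the number of lines dropped.
sub drop_repeated_tag {
    my ($in, $out) = @_;
    my ($prev_ends_with_tag, $dropped) = (0, 0);
    while (my $line = <$in>) {
        if ($prev_ends_with_tag && $line =~ $only_tag) {
            $dropped++;
            next;    # flag stays set, so a third copy is dropped too
        }
        print {$out} $line;
        $prev_ends_with_tag = $line =~ $ends_with_tag ? 1 : 0;
    }
    return $dropped;
}

# Hypothetical driver mirroring the original script: rename to .bak, then
# copy back without the repeated tag lines.
for my $file (@ARGV) {
    rename $file, "$file.bak" or die "can't rename $file: $!";
    open my $in,  '<', "$file.bak" or die "can't read $file.bak: $!";
    open my $out, '>', $file       or die "can't write $file: $!";
    my $n = drop_repeated_tag($in, $out);
    close $in;
    close $out or die "can't close $file: $!";
    print "$file: removed $n duplicate Urchin line(s)\n" if $n;
}
```

Unlike the slurp approach, this never holds more than one line in memory, but it only recognises duplicates on consecutive lines, which is where the original substitution put them (it always inserted the tag followed by `\n` just before `</head>`).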
Re: Regex to undo a regex?
by johngg (Canon) on Feb 02, 2008 at 00:36 UTC

Re: Regex to undo a regex?
by hipowls (Curate) on Feb 02, 2008 at 00:24 UTC

Re: Regex to undo a regex?
by shmem (Chancellor) on Feb 02, 2008 at 00:27 UTC