Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

I've been given the task of moving a (somewhat) poorly written website from an IIS server to an Apache server. Unfortunately, the IIS server was case insensitive and the creator of the website was somewhat inconsistent in their use of capitals.

What I want to do is change all the filenames to lower case and then make sure any links point to the new filename. I've managed the first part, but I need some help on the second. Can anyone offer any suggestions?

Thanks, Hadley

(my code so far...)
use File::Find;

find(\&wanted, 'y:\health\biru');

sub wanted {
    if ($_ ne "." && lc() ne $_) {
        print("Renaming $_ to " . lc() . "\n");
        # rename in two steps, via a temporary name
        rename($_, lc() . "_______") or die("failed: $!");
        rename(lc() . "_______", lc()) or die("failed: $!");
    }
    if (/\.html$/) {
        # change any href="XXX" or src="XXX" to href="xxx" etc.
        ...

(cLive ;-) Re: Converting files and links to lowercase
by cLive ;-) (Prior) on Jan 29, 2002 at 09:24 UTC
    You might want to try one of the modules that parses html, or you could try...
    # read file into $file
    $file =~ s/href\s*=\s*("|')(.*?)\1/'href="'.lc($2).'"'/ges;
    $file =~ s/src\s*=\s*("|')(.*?)\1/'src="'.lc($2).'"'/ges;
    # write file here

    It's a little complicated, this regex, and I'm too tired to explain it all (sorry), but you can use the Camel or docs to fill in the blanks...

    cLive ;-)

    ps - it assumes that all URLs etc. are well formed. For safety, you might want to use this instead:

    $file =~ s/href\s*=\s*("|')([^>]*?)\1/'href="'.lc($2).'"'/ges;
    $file =~ s/src\s*=\s*("|')([^>]*?)\1/'src="'.lc($2).'"'/ges;
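For what it's worth, here is a minimal sketch of how the "read file" and "write file" steps around those substitutions might be filled in. The sub names lc_links and lc_links_in_file are made up for illustration, and the two substitutions are folded into one that keeps the original quote character. Note that it lowercases the whole attribute value, so external URLs and query strings get lowercased too, which may not be what you want.

```perl
use strict;
use warnings;

# Hypothetical helper: lowercase the value of every quoted href/src
# attribute in a string of HTML. Unquoted values are left alone.
sub lc_links {
    my ($html) = @_;
    # $1 = attribute name, $2 = quote character, $3 = attribute value
    $html =~ s/\b(href|src)\s*=\s*(["'])([^>]*?)\2/$1 . '=' . $2 . lc($3) . $2/gies;
    return $html;
}

# Hypothetical helper: slurp a file, fix its links, write it back.
sub lc_links_in_file {
    my ($path) = @_;
    open my $in, '<', $path or die "Can't read $path: $!";
    my $html = do { local $/; <$in> };    # slurp the whole file
    close $in;

    open my $out, '>', $path or die "Can't write $path: $!";
    print {$out} lc_links($html);
    close $out or die "Can't close $path: $!";
}

print lc_links(q{<a HREF="Docs/Index.HTML"><img src='Pics/Logo.GIF'></a>}), "\n";
# prints: <a HREF="docs/index.html"><img src='pics/logo.gif'></a>
```

From a File::Find callback you would call something like lc_links_in_file($_) for each name matching /\.html?$/i.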
      You can do the case conversion inline with \U and \L, like:
      my $x = "The quick brown fox etc...";
      $x =~ s/\b(\w)(\w+)\b/\U$1\L$2/g;
      print $x, "\n";

      Though I'm not sure if this looks more or less like line noise :)

      gav^

      Thanks for the regexp solutions—but I think I'll be giving the module solution a go first.

      Hadley

Re: Converting files and links to lowercase
by grep (Monsignor) on Jan 29, 2002 at 09:25 UTC
    Watch your precedence -> if ($_ ne "." && lc() ne $_)

    This may be your problem (if not, it is more clearly written as):

     if (($_ ne ".") && (lc() ne $_))

    or even better:

    if ($_ ne "." and lc() ne $_)

    I would also move the directory check to a guard clause (you are only checking for the current dir and not the parent (..)):

    if (lc() ne $_) {
        return if m#^\.\.?$#;
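To make the guard-clause idea concrete, here is a small sketch. The sub name needs_rename is hypothetical, not part of File::Find. It takes the name as an argument rather than reading $_, so it can be tried outside a find() callback.

```perl
use strict;
use warnings;

# Hypothetical helper: true if a directory entry should be renamed,
# i.e. it is neither "." nor ".." and contains uppercase characters.
sub needs_rename {
    my ($name) = @_;
    return 0 if $name =~ m#^\.\.?$#;    # guard clause: skip "." and ".."
    return lc($name) ne $name ? 1 : 0;
}

print "$_ => ", needs_rename($_), "\n"
    for '.', '..', 'Index.HTML', 'index.html', '.Htaccess';
```

Inside the wanted sub this would be used as: return unless needs_rename($_);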


    Update: cLive ;-) has some great advice about using modules

    grep
    grep> chown linux:users /world

      Thanks for the tip. Will do.

      Hadley

Re: Converting files and links to lowercase
by gav^ (Curate) on Jan 29, 2002 at 09:39 UTC
    Something like this should do the trick:
    sub wanted { rename($_, lc) or die "$_: $!" if !-d && /[A-Z]/; }
    To change the links is a bit trickier. I'd recommend using something like HTML::TreeBuilder and code like:
    use HTML::TreeBuilder;

    my $tree = HTML::TreeBuilder->new_from_file($fn);
    foreach my $link ($tree->look_down('_tag', 'a')) {
        my $href = $link->attr('href');
        next unless defined $href;    # skip anchors without an href
        $link->attr('href', lc $href);
    }
    open OUT, ">$fn" or die "Can't overwrite $fn ($!)";
    print OUT $tree->as_HTML;
    close OUT;
    $tree->delete;

    Hope this helps.

    gav^

      I did try using rename($_, lc) but it didn't work because Novell doesn't like renaming files to the same name with different case.
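A rough sketch of the two-step workaround, for anyone hitting the same problem. The sub name rename_lc_via_temp is made up; the "_______" suffix is the one from the code in the question. It assumes the temporary name is free, and it does not check whether a different file already has the lowercase name, because on a case-insensitive volume that check would also match the file being renamed.

```perl
use strict;
use warnings;

# Hypothetical helper: rename a file to its lowercase name by going
# through a temporary name, so the file system never sees a rename
# that differs only in case. Returns the resulting name.
sub rename_lc_via_temp {
    my ($name) = @_;
    my $lower = lc $name;
    return $name if $lower eq $name;    # nothing to do

    my $tmp = $lower . '_______';       # assumed not to be in use
    die "temporary name $tmp already exists" if -e $tmp;
    rename($name, $tmp)  or die "rename $name -> $tmp failed: $!";
    rename($tmp, $lower) or die "rename $tmp -> $lower failed: $!";
    return $lower;
}
```

In a File::Find callback the current directory is already the file's directory, so rename_lc_via_temp($_) can be called directly on each entry.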

      Thanks for the HTML::TreeBuilder suggestion - I'll definitely give it a go (and avoid problems caused by my lackluster regexp skills :)

      Hadley

Re: Converting files and links to lowercase
by seattlejohn (Deacon) on Jan 31, 2002 at 06:39 UTC
    While I agree that parsing and rewriting HTML is probably the right approach, for the sake of completeness I thought I'd mention that Apache has a mod_speling (sic) module that can make the server map URLs to filenames in a case-insensitive way. Obviously not an ideal solution (if nothing else, it's likely to dramatically increase server load)... but may be useful as a stopgap or to catch any links that fall through the cracks of your fix-'em-up routine. More info here:
    http://httpd.apache.org/docs/urlmapping.html

    Good luck!