hmbscully has asked for the wisdom of the Perl Monks concerning the following question:

Dear Monks,
I am migrating a previously static website of nearly 4,000 .html pages into .php. I need to change all of the include calls from SSI to PHP.
For example
<!--#include virtual="/ssi/edfooter.txt"-->
to
<?php include($_SERVER['DOCUMENT_ROOT'].'/ssi/edfooter.txt'); ?>
The included file paths can remain the same, just what's wrapped around the paths needs to change. There are some pages that will have more than one include to be updated.

I understand that a command line solution for this will be suggested, but I was unsuccessful with one: I could not get the regex to stop throwing errors for unescaped characters. I'd rather have the code in script form. I think what I am doing now is one semi-correct way of doing it, but I'm never entirely sure.

What I'm not doing right is editing the file, either in place or with a temporary file. I keep either wiping out the file completely or not doing anything to it. This isn't a script that is going to be called on any regular basis, so I'm not worried about it being efficient in the long-term, though I would like to understand some of the better ways of doing this.
Thanks!

#!/usr/bin/perl
use warnings;
use strict;
use File::Find::Rule;

#find all html files in specified directory
#this is for a specific directory right now for testing, but
#will eventually be going through all the subdirectories under /htdocs
my $dir = "/home/devcorp/htdocs/plan/norms";
my $rule = File::Find::Rule->file->name("*.html")->start( $dir );

#keep track of the changed files in a file
open(OUTFILE,">changed_files.txt") || die "cant open changed_files.txt, $!\n";

while ( my $html_file = $rule->match ) {
    #open file to replace string in
    open FILE, "<$html_file";
    my @lines = <FILE>;
    for (@lines) {
        #replace <!--#include virtual="[document path]"-->
        #with <?php include($_SERVER['DOCUMENT_ROOT'].'[document path]'); ?>
        if (s/<!--#include virtual="(.*)"-->/<?php include(\$\_SERVER['DOCUMENT_ROOT'].'$1');?>/){
            my $result = $1;
            #print the file changed and the document path for the included file
            print OUTFILE "$html_file: $result\n";
        }
    }
    close FILE;
}
close OUTFILE;

Output returned in changed_files.txt is
/home/devcorp/htdocs/plan/norms/index.html: /ssi/edfooter.txt
but obviously nothing is changed in the file itself because I'm not doing that part right.


I learn more and more about less and less until eventually I know everything about nothing.

Re: modify file in place in script? Regex for changing includes from SSI to PHP
by ikegami (Patriarch) on Oct 26, 2007 at 18:32 UTC

    Using Perl to create PHP. I love it! :)

    It's not that you're "not doing that part right".
    It's simply that you're "not doing that part".
    You never write out your changes.

    while ( my $html_file = $rule->match ) {
        # Change file in-place.
        local @ARGV = $html_file;
        local $^I = '.bak';  # Or '' to avoid making backups.
        while (<>) {
            ...
            # Keep (possibly edited) line
            print;
        }
    }

    If you want something less magical,

    while ( my $html_file = $rule->match ) {
        rename($html_file, "$html_file.bak")    or die;
        open(my $fh_in,  '<', "$html_file.bak") or die;
        open(my $fh_out, '>', $html_file)       or die;
        while (<$fh_in>) {
            ...
            # Keep (possibly edited) line
            print $fh_in $_;
        }
    }

      Let me see how much I misunderstand the "less magical" code:

      rename($html_file, "$html_file.bak")    or die;
      Makes a copy of the file with the .bak extension

      open(my $fh_in,  '<', "$html_file.bak") or die;
      Opens the copy of the file for reading

      open(my $fh_out, '>', $html_file) or die;
      Opens the original file for writing, wiping out the contents?

          while (<$fh_in>) {
      while there are file contents in the backup file, do something with them

          # Keep (possibly edited) line
          print $fh_in $_;
          }

      Print whatever the backup file is back into the backup file?

      Ok, yeah, I still don't get this. I know I should, I've read enough examples, but I don't.
      I read it as making a copy of the file I want to change, wiping out the contents of the original, modifying the copied file somehow and writing the modified text back into the copied file? Why am I not seeing this still?

      Also, I don't get the while(<$fh_in>) { s/// } instead of using

      my @lines = <$fh_in>; for (@lines) { s/// }
      because do I want to do the s/// on the entire file at once? Don't I want to go line by line?

      As for using Perl to write PHP: as ignorant as I may seem about Perl, I am continuously frustrated with moving into PHP. What I find is that the things PHP does more easily than Perl do not outweigh the things that I could do easily in Perl and cannot do easily in PHP.


        • rename renames, not copies.

        • Oops! My error. I did
          print $fh_in $_;
          where I meant to do
          print $fh_out $_;

        • Both
          my @lines = <$fh_in>; for (@lines) { s/// }
          and
          while(<$fh_in>) { s/// }
          work a line at a time. The only difference is that the top version needlessly keeps the entire file in memory.

          Entire file at once would be

          my $text;
          {
              local $/;
              $text = <$fh_in>;
          }
          $text =~ s///g;

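Putting the corrections from the list above together, here is a minimal sketch of the rename-then-rewrite approach. The helper name convert_file is hypothetical (it does not appear in the thread), and the regex is a variation on the original: it uses a non-greedy .*? with /g so that two includes on the same line are converted separately, which the greedy (.*) in the posted code would not do.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): rewrite one file, keeping a
# .bak copy. Returns the include paths that were converted.
sub convert_file {
    my ($html_file) = @_;
    my $backup = "$html_file.bak";

    # rename moves the original aside; it does not copy it.
    rename($html_file, $backup) or die "rename $html_file: $!";
    open(my $fh_in,  '<', $backup)    or die "open $backup: $!";
    open(my $fh_out, '>', $html_file) or die "open $html_file: $!";

    my @converted;
    while (my $line = <$fh_in>) {
        # /e runs the replacement as code, so each path can be recorded
        # before the PHP include string is returned.
        $line =~ s{<!--#include virtual="(.*?)"\s*-->}
                  {push @converted, $1; "<?php include(\$_SERVER['DOCUMENT_ROOT'].'$1'); ?>"}ge;
        print {$fh_out} $line;    # write to the output handle, not $fh_in
    }
    close($fh_in);
    close($fh_out) or die "close $html_file: $!";
    return @converted;
}

# Convert any files named on the command line.
convert_file($_) for @ARGV;
```

Every line is printed to $fh_out whether or not it matched, so the rewritten file keeps all of the original content.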
        Ok, I studied the code some more and tested and tried and realized the < and > were reversed. This is what I've got now and it seems to work and I think I get it:

        while ( my $html_file = $rule->match ) {
            rename($html_file, "$html_file.bak")    or die;
            open(my $fh_in,  '<', "$html_file.bak") or die;
            open(my $fh_out, '>', $html_file)       or die;
            my @lines = <$fh_in>;
            for (@lines) {
                #replace <!--#include virtual="[document path]"-->
                #with <?php include($_SERVER['DOCUMENT_ROOT'].'[document path]'); ?>
                if (s/<!--#include virtual="(.*)"-->/<?php include(\$\_SERVER['DOCUMENT_ROOT'].'$1');?>/){
                    my $result = $1;
                    #print the file changed and the document path for the included file
                    print OUTFILE "$html_file: $result\n";
                }
                print $fh_out $_;
            }
            close($fh_in);
            close($fh_out);
        }

Re: modify file in place in script? Regex for changing includes from SSI to PHP
by tuxz0r (Pilgrim) on Oct 26, 2007 at 19:33 UTC
    I know you mentioned there would be a command line version, so here it is. I have, however, put it in a shell script, which keeps it in a file you can put into your project or a local bin directory and run again when needed. It worked on a couple of examples I threw together, without any unescaped character errors. As mentioned in previous comments, your code is fine with the noted changes, but I still thought you might be interested in a smaller command line solution as well.

    #!/bin/sh
    for file in `find ./ -name '*.html'`; do
        perl -i.bak -ane "s/<!--#include virtual=\"(.*)\" *-->/<?php include(\\$\_SERVER['DOCUMENT_ROOT'].'\$1');?>/g; print;" $file
    done

    ---
    echo S 1 [ Y V U | perl -ane 'print reverse map { $_ = chr(ord($_)-1) } @F;'

Re: modify file in place with perl -i
by Anonymous Monk on Oct 26, 2007 at 22:19 UTC
    #!/usr/bin/perl -w -i.bak -p
    s/<!--#include virtual="(.*)"-->/<?php include(\$_SERVER[\'DOCUMENT_ROOT\'].\'$1'); ?>/g;

    Save that and run
    thescript.pl *.html
    or
    find . -iname \*html | xargs thescript.pl
    That will do the dirty work, but it won't print out the file names for changed files. As for your original script, I don't see where you are printing out the changed lines.
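If the list of changed files matters, the same in-place mechanism can be driven from inside a script, where $ARGV names the file currently being read. The following is only a sketch: convert_in_place is a hypothetical name, and the non-greedy regex is a variation on the one above rather than the exact pattern from the thread.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: edit the given files in place (keeping .bak backups)
# and return the names of the files in which at least one include changed.
sub convert_in_place {
    my @files = @_;
    return () unless @files;    # with an empty @ARGV, <> would read STDIN

    my %changed;
    local @ARGV = @files;
    local $^I   = '.bak';       # same effect as -i.bak on the command line
    while (my $line = <>) {
        # $ARGV holds the name of the file currently being read.
        $changed{$ARGV}++
            if $line =~ s{<!--#include virtual="(.*?)"\s*-->}
                         {<?php include(\$_SERVER['DOCUMENT_ROOT'].'$1'); ?>}g;
        print $line;            # with $^I set, print goes to the rewritten file
    }
    return sort keys %changed;
}

# Report changed files on STDERR for any files named on the command line.
print STDERR "$_\n" for convert_in_place(@ARGV);
```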
Re: modify file in place in script? Regex for changing includes from SSI to PHP
by Your Mother (Archbishop) on Oct 27, 2007 at 03:52 UTC

    One more, direct on command line. Use caution with these. Use them on a test copy if you can. They can cause accidents that can be hard to undo or end up snowballing with the smallest mistake.

    # as one line
    find . -regex '.*\.html$' -exec perl -pi.bk -e 's{<!--#include virtual="/ssi/([^"]+)"-->}{<?php\ninclude(\$_SERVER[\x27DOCUMENT_ROOT\x27].\x27/ssi/$1\x27);\n?>}g' {} \;
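In the spirit of the caution above, one option is a read-only pass first that lists what would be touched before any file is edited. This is a sketch under assumptions: list_includes is a hypothetical helper, and it uses the core File::Find module rather than the File::Find::Rule module from the original script, purely to avoid a CPAN dependency.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Find;

# Hypothetical helper: return "file: path" entries for every SSI include
# found in *.html files under $dir, without modifying anything.
sub list_includes {
    my ($dir) = @_;
    my @report;
    find({
        no_chdir => 1,    # $_ is then the full path of each entry
        wanted   => sub {
            return unless -f $_ && /\.html\z/;
            my $file = $_;
            open(my $fh, '<', $file) or die "open $file: $!";
            while (my $line = <$fh>) {
                # Scalar-context /g steps through every include on the line.
                push @report, "$file: $1"
                    while $line =~ /<!--#include virtual="(.*?)"\s*-->/g;
            }
            close($fh);
        },
    }, $dir);
    return sort @report;
}

# Print the report for each directory named on the command line.
print "$_\n" for map { list_includes($_) } @ARGV;
```

The output has the same "file: path" shape as the changed_files.txt log in the original script, so it can be compared against that log after the real conversion has run.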