Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hi Monks,

I am trying to edit all the files in a directory to linkify some plain text. Basically, any name that has one or more underscores and parentheses at the end should become a link.

E.g.
name_of_func() becomes <a href="name_of_func.html">name_of_func()</a>

Here's my code so far. At the moment I'm just getting "The system cannot find the file specified." in the console.

    use strict;
    use warnings;

    opendir DIR, ".";
    foreach $file (readdir DIR) {
        open ( FILE, $file ) or die "Can't open $file: $!\n";
        @lines = <FILE>;
        close FILE;

        # Open same file for writing, reusing STDOUT
        open (FILE, ">$file") or die "Can't open $file: $!\n";

        # Walk through lines...
        foreach $line ( @lines ) {
            # if line contains x_x_x(), create link & print to file
            if ($line =~ //) {
                # change x_x_x() to <A HREF="x_x_x.html">x_x_x()</A>
            }
            # else print unchanged line to file
            else {
                print FILE $line;
            }
        }
        close FILE;
    }
    closedir( DIR );

I also need help with the pattern matching; I'm not sure how to find a name with one or more underscores.

Thanks for any help you can give, Thai

Replies are listed 'Best First'.
Re: Edit html files in directory
by clinton (Priest) on Aug 22, 2007 at 13:24 UTC
    You haven't specified all of the requirements for your regex to match, but this one-liner should work:

    On *nix:
        perl -pi -e 's{\b((?:[a-zA-Z0-9]+_)+[a-zA-Z0-9]+)\(\)}{<a href="$1.html">$1()</a>}g' *

    On Windows (I think):
        perl -pi -e "s{\b((?:[a-zA-Z0-9]+_)+[a-zA-Z0-9]+)\(\)}{<a href=\"$1.html\">$1()</a>}g" *

    It will match any word (containing letters or numbers) with at least one underscore, followed by (), and replace it with your link.

    Explanation

    • The perl -pi -e CODE * will run CODE on all the lines of all the files in the current directory, editing them in-place (see perlrun)
    • The regex works as follows (see perlre):
      s{                            # match and replace on $_
          \b                        # start with a word boundary
          (                         # capture the match and store it in $1
            (?:[a-zA-Z0-9]+_)+      # letters and numbers followed by _, at least once
            [a-zA-Z0-9]+            # must end with letters/numbers
          )
          \(\)                      # then ()
       }
       {
          <a href="$1.html">$1()</a>   # replace the matched text with this string,
       }                               # substituting the value of $1 (the first capture)
       g                               # perform this replace on all matches in $_

      Result

      abc
      abc()
      abc_
      abc_()
      abc_def
      abc_def()
      abc_def_
      abc_def_()
      abc_def_ghi
      abc_def_ghi()

      BECOMES:

      abc
      abc()
      abc_
      abc_()
      <a href="abc_def.html">abc_def()</a>
      <a href="abc_def.html">abc_def()</a>()
      abc_def_
      abc_def_()
      <a href="abc_def_ghi.html">abc_def_ghi()</a>
      <a href="abc_def_ghi.html">abc_def_ghi()</a>()

      Clint

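      Update: missed the requirement that the original string should end with (). The regex now requires the trailing (); the Result listing above was produced before that change, which is why it links names without () and leaves a doubled () after names that already had one.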
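
As a rough check of the updated pattern, here is a minimal sketch that runs it over the same sample names. The `linkify` helper is a hypothetical name used only for illustration, and the pattern is written without a trailing \b, on the assumption that a name followed by a space or the end of the line should still match (a `)` followed by a non-word character is not a word boundary).

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): link every name that has at
# least one underscore between letters/digits and is followed by ().
sub linkify {
    my ($text) = @_;
    $text =~ s{\b((?:[a-zA-Z0-9]+_)+[a-zA-Z0-9]+)\(\)}{<a href="$1.html">$1()</a>}g;
    return $text;
}

# The same sample names as in the Result listing.
my @samples = qw{
    abc      abc()
    abc_     abc_()
    abc_def  abc_def()
    abc_def_ abc_def_()
    abc_def_ghi abc_def_ghi()
};

print "$_ => ", linkify($_), "\n" for @samples;
```

With this pattern only abc_def() and abc_def_ghi() come out linked; the other eight names are printed unchanged.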

Re: Edit html files in directory
by johnlawrence (Monk) on Aug 22, 2007 at 13:44 UTC
    Update: The above solutions are probably better, but I'm leaving this in the hope it helps clarify things.

    I think the problem that you were having was due to trying to open all the results from the directory list, including "." and ".."

    The code below should fix that, and I've added a quick stab at the regex you're after.

    #!/usr/bin/perl
    use strict;
    use warnings;

    opendir(DIR, ".") or die "Can't open current directory: $!\n";
    my @list = readdir(DIR);
    closedir(DIR);

    foreach my $file (@list) {
        # the line below makes sure you don't try to open . and ..
        if ($file !~ /^\.+$/) {
            open(FILE, $file) or die "Can't open $file for reading: $!\n";
            my @lines = <FILE>;
            close FILE;

            # Open same file for writing, reusing the FILE handle
            open(FILE, ">$file") or die "Can't open $file for writing: $!\n";

            # Walk through lines...
            foreach my $line (@lines) {
                # can just make the replacement on this line
                $line =~ s/(\w+_[_\w]+)\(\)/<A HREF="$1.html">$1\(\)<\/A>/;
                print FILE $line;
            }
            close FILE;
        }
    }

      I think johnlawrence is correct here: the problem is that you are trying to open everything returned by readdir(), which will include the current and parent directories, not just the plain files.

      It would be clearer, though, if you gave us the complete output. Can you tell which die() statement it failed on? This is a good reason for making the message text from die() action-specific: if it dies trying to open a file for reading, say so. die "Can't open $file for reading: $!" and die "Can't open $file for writing: $!" make it a lot easier to tell just where it died.
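
To make that concrete, here is a minimal sketch with one distinct die() message per step. The `rewrite_file` name and its callback argument are hypothetical, chosen only for illustration; the sketch also assumes you want plain files only, so it filters the directory listing with -f, which skips "." and ".." as a side effect.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): read a file, pass each line
# through the $edit callback, and write the result back to the same file.
sub rewrite_file {
    my ($file, $edit) = @_;

    open(my $in, '<', $file)
        or die "Can't open $file for reading: $!\n";
    my @lines = <$in>;
    close $in;

    open(my $out, '>', $file)
        or die "Can't open $file for writing: $!\n";
    print {$out} map { $edit->($_) } @lines;
    close $out
        or die "Can't close $file after writing: $!\n";
    return;
}

# List the plain files in the current directory without touching them.
opendir(my $dh, '.') or die "Can't open current directory: $!\n";
my @files = grep { -f $_ } readdir $dh;
closedir $dh;

print "Would process: $_\n" for @files;
```

Calling rewrite_file($_, sub { ... }) for each entry in @files would then apply the edit, and any failure names both the file and the step that went wrong.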

      Hi, thanks for your reply, you've been a great help.

      I've noticed that the regex you've had a go at doesn't replace all instances on a line, only the first one...

      e.g.
      x_x(), y_y_y(), z_z()

      becomes:
      <a href="x_x.html">x_x()</a>, y_y_y(), z_z()

      How might I change this? Use a * to apply to all instances?

      s/(\w+_[_\w]+)\(\)/<A HREF="$1.html">$1\(\)<\/A>/*;

      Thanks again

        \w includes an underscore, so your regex \w+_[_\w]+ will also match a run of nothing but underscores, such as ___(), which probably isn't what the OP wants.

        Clint
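
A small sketch of that difference, comparing the \w-based pattern with an explicit letters-and-digits class. The variable names $loose and $strict are illustrative only.

```perl
use strict;
use warnings;

# \w already includes "_", so this accepts runs of bare underscores.
my $loose  = qr/(\w+_[_\w]+)\(\)/;

# Explicit class: letters/digits on both sides of every underscore.
my $strict = qr/\b((?:[a-zA-Z0-9]+_)+[a-zA-Z0-9]+)\(\)/;

for my $text ('___()', 'abc__()', 'abc_def()') {
    printf "%-10s loose: %-8s strict: %s\n",
        $text,
        ($text =~ $loose  ? 'match' : 'no match'),
        ($text =~ $strict ? 'match' : 'no match');
}
```

The loose pattern matches all three strings, while the strict one matches only abc_def().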

        Ah...use g for global after the last delimiter :)
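
For illustration, a minimal sketch of the same substitution with and without the g modifier, using the example line from the question above. The variable names are made up for this example.

```perl
use strict;
use warnings;

my $line = 'x_x(), y_y_y(), z_z()';

# Without /g only the first match on the line is replaced.
(my $first_only = $line) =~ s/(\w+_[_\w]+)\(\)/<A HREF="$1.html">$1()<\/A>/;

# With /g every match on the line is replaced.
(my $every_one = $line) =~ s/(\w+_[_\w]+)\(\)/<A HREF="$1.html">$1()<\/A>/g;

print "without g: $first_only\n";
print "with g:    $every_one\n";
```

The modifier goes after the final delimiter, in the same place the * was tried.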
Re: Edit html files in directory
by Anonymous Monk on Aug 22, 2007 at 13:12 UTC
    Update...
    I've had a go at the string matching:
    /[a-z]*[_[a-z]*]*()/
      Basically any name that has one or more underscores and parenthesis at the end
      Whoops, I misread your initial post. This pattern will match text containing at least one underscore, followed by any characters, with parentheses at the end of the line. Sorry about that.
      /.+\_.*\(\)$/

      Untested
        This will also match all sorts of $things that it shouldn't, e.g. __()
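
A quick sketch of how that pattern behaves on a few lines. The sample lines are made up for illustration.

```perl
use strict;
use warnings;

my $pattern = qr/.+\_.*\(\)$/;

my @lines = (
    'name_of_func()',                     # the intended case
    '__()',                               # bare underscores
    'see foo_bar, then call baz()',       # underscore and () belong to different words
    'name_of_func() is documented here',  # () is not at the end of the line
);

for my $line (@lines) {
    printf "%-36s %s\n", $line, ($line =~ $pattern ? 'match' : 'no match');
}
```

The first three lines match and the last one does not, so the pattern both over-matches and misses a name that appears mid-line.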