Perl Rename

by Mad_Mac (Beadle)
on Jul 03, 2010 at 08:41 UTC ( #847883=perlquestion )

Mad_Mac has asked for the wisdom of the Perl Monks concerning the following question:

I'm still fairly new to Perl. I'm working through some basics to improve my understanding. I tried to use Larry Wall's Perl rename script, but keep getting an error.

#!/usr/bin/perl -w
$op = shift or die "Usage: rename expr [files]\n";
chomp(@ARGV = <STDIN>) unless @ARGV;
for ( @ARGV ) {
    $was = $_;
    eval $op;
    die $@ if $@;
    rename ( $was, $_ ) unless $was eq $_;
}

I have a directory of files titled something like "Blah Blah - 01 - Peter.txt" and "Blah Blah - 02 - Lois.txt", etc. I want to strip off the "Blah Blah - 01 - " part, so I tried the following commands:

perl 's/Blah Blah - \d{2} - //' *
perl 's/^Blah Blah - \d{2} - //' *

and a couple of other variations. Each time I get the same error:

Can't find string terminator "'" anywhere before EOF at (eval 1) line 1

Am I doing something really stupid here? Did I copy the code wrong? Is something wrong with my regex?

I'm using Strawberry Perl 5.10.1 on W7 x64.


Replies are listed 'Best First'.
Re: Perl Rename
by BrowserUk (Patriarch) on Jul 03, 2010 at 08:59 UTC

    Try switching the 's for "s.
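The reason the quotes matter can be reproduced without renaming anything. cmd.exe does not treat single quotes as quoting characters, so the script plausibly receives `'s/Blah` (quote included, cut at the first space) as its expression, and `eval` then fails on the unterminated string. The sketch below is an illustration only: `apply_op` is a hypothetical helper, not part of the original script, and it works on a plain string instead of a file.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Apply a rename expression to one name without touching the file system.
# Returns the resulting name and the eval error text (empty on success).
# apply_op is a hypothetical helper introduced for this illustration.
sub apply_op {
    my ($op, $name) = @_;
    local $_ = $name;
    eval $op;
    my $error  = $@;
    my $result = $_;
    return ($result, $error);
}

my $name = 'Blah Blah - 01 - Peter.txt';

# Assumption: this is what arrives when the expression is typed in
# single quotes at a cmd.exe prompt (quote kept, split at the space).
my ($unchanged, $err) = apply_op(q{'s/Blah}, $name);
print "error: $err" if $err;

# The complete expression, as it arrives when wrapped in double quotes.
my ($new) = apply_op('s/^Blah Blah - \d{2} - //', $name);
print "$name -> $new\n";
```

The first call reports the same "Can't find string terminator" error as in the question; the second prints the stripped name.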

      Good idea. I should have thought of that. It gets rid of my error, but the test files aren't getting renamed.

      Hmmm ....

        Works for me if I invoke like this:

        rnm "s/Blah Blah - \d{2} - //" "Blah Blah - 01 - Peter.txt" "Blah Blah - 02 - Lois.txt"

        Of course, that requires I supply the names of the files to be renamed because cmd.exe doesn't expand wildcards for you. If you want that to happen, you'll have to modify the script a little:

        #!/usr/bin/perl -w
        $op = shift or die "Usage: rename expr [files]\n";
        chomp( @ARGV );
        @ARGV = map glob, @ARGV;
        for ( @ARGV ) {
            $was = $_;
            eval $op;
            die $@ if $@;
            rename ( $was, $_ ) unless $was eq $_;
        }

        Then rnm "s/Blah Blah - \d{2} - //" *.txt works.

        Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
        "Science is about questioning the status quo. Questioning authority".
        In the absence of evidence, opinion is indistinguishable from prejudice.
Re: Perl Rename
by Khen1950fx (Canon) on Jul 03, 2010 at 10:37 UTC
    Here's another way to do it:
    #!/usr/bin/perl
    use strict;
    use warnings;
    use File::DosGlob;
    my @ARGV = map {
        my @g = File::DosGlob::glob($_) if /[*?]/;
        @g ? @g : $_;
    } @ARGV;
    my $op = shift or die "Usage: rename expr [files]\n";
    chomp( @ARGV = <STDIN>) unless @ARGV;
    for (@ARGV) {
        my $was = $_;
        eval $op;
        die $@ if $@;
        rename( $was, $_ ) unless $was eq $_
    }

      Thanks! That works, but ... the point of my question was not to find functional code, it was to improve my understanding. So, with that in mind, let me ask a follow-on question:

      It seems from the extra code you added at the beginning that my problem was a difference in the way glob works on Windows compared to *nix?

      If that's the case, I only have a vague idea of what the additional code does. I am somewhat familiar with the map function, having used it to construct some SQL statements based on array values. Unfortunately, I don't follow exactly what's going on here. Could you explain it to me?

      Thanks for the help.

        You've hit a much deeper problem here:

        Passing Arguments

        On systems derived from Unix (Linux, *BSD, and nearly every operating system with an X in its name, except for Windows XP), a program is invoked with an (ordered) list of arguments. This list is called the argument vector (from the mathematical term "vector" for a one-dimensional array of values), or argv for short in C and @ARGV in Perl. There is a minor difference between argv and @ARGV: the first element of argv is the name of the invoked program, and the remaining elements are the "real" arguments. Perl's @ARGV contains only the "real" arguments; the program name is in $0, the result of a simple shift operation that happens long before your code starts.

        On systems derived from DOS (MS-DOS, DR-DOS, FreeDOS, all versions of Windows, OS/2, and nearly every operating system with "DOS" in its name), things are different: a program is invoked with a single argument string. In old times, it was limited to as few as 126 bytes, but I think the new limit is way larger with the Windows API. There originally was no way to know the name that was used to invoke the program. Later DOS versions (starting at 4.01 or so) stored that information in a well-known location in memory; Windows has an API function to get that information.

        The Shells

        Imagine something simple like perl *.secret typed into the shell:

        On DOS systems (including all Windows versions), the shell searches for perl.bat, perl.exe (and a few more extensions on NT-based Windows versions), and invokes the first one found, giving it a command line string of *.secret. That's all.

        On Unix systems, the shell searches for a file named perl that has at least one of the executable bits (x in ls -l output) set. Then, it expands all arguments. An argument that contains no placeholders is not modified. *.secret is replaced with a list of all files in the current directory whose names end with .secret if such files exist; otherwise it is passed as is.

        Runtime Library

        On Unix systems, nothing special happens. After some housekeeping for the runtime, control is passed to main().

        On DOS systems, the runtime splits the command line string into elements, and some rare implementations also expand the elements. Both splitting and expanding depend on the runtime library implementation; there is no formal specification. Most split on spaces and tabs, and if they expand, they use rules similar to those the shell uses for its built-in commands. Finally, control is passed to main(), with the split and sometimes expanded command line string passed in argv. Depending on the compiler and the compiler flags used, it is also possible that the WinMain() function is invoked instead, without any processing of command line arguments, leaving that to the application. In both cases, the application can get the original command line string and parse it independently of the runtime.


        Perl uses the main() interface, so it expects its arguments split into argv.

        On both systems, the Perl interpreter starts, considers argv, initialises $0 and @ARGV from argv, loads the script from the current directory, and runs it.

        On Unix, @ARGV is one of two things. If the shell could not expand *.secret because there were no *.secret files, it is a one-element array containing *.secret. Otherwise, it is an array containing one or more file names, all ending with .secret. Note that the first case depends heavily on the shell, and it would be OK for a shell to pass no value at all in this case.

        On DOS, @ARGV is always a one-element array containing *.secret, simply because no code expanded it.
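To see which of these cases applies on a given system, it can help to print exactly what the interpreter received. The following is a minimal sketch; `describe_args` is a hypothetical helper name, and the script does nothing but list $0 and @ARGV.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Format the program name and each argument on its own line, so that
# shell expansion (or the lack of it) becomes visible.
# describe_args is a hypothetical helper introduced for this illustration.
sub describe_args {
    my ($program, @args) = @_;
    my @lines = ( "program: $program", "count: " . scalar @args );
    push @lines, map { "arg[$_]: $args[$_]" } 0 .. $#args;
    return @lines;
}

print "$_\n" for describe_args( $0, @ARGV );
```

Saved under any name and run with *.secret as its only argument, it should list one line per matching file under a typical Unix shell, and a single arg[0] line holding the literal pattern under cmd.exe.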

        The big difference here is that on Unix, the shell is responsible for parsing and expanding user input, whereas on DOS, the shell parses the user input only partially, and passes the remaining input (nearly) unmodified to the application. It is the job of each single application to parse and expand user input. This causes lots of trouble, because the parsing is not predictable. It depends on the runtime library and the application itself.


        One workaround is to use Win32::Autoglob, which takes care of expanding @ARGV automatically only on Windows and not when the cygwin perl is used (cygwin expands the arguments). Unlike most other modules from the Win32:: namespace, it should run with any perl 5 on any operating system. Note that Win32::Autoglob does not work on DOS and OS/2 ports of perl, simply because the author forgot to check ($^O eq 'dos') or ($^O eq 'os2').

        Simply calling glob for every element of @ARGV that contains * or ? is a bad idea, especially when it is done independently of the operating system. In the best case, you are only wasting time; in the worst case, you are damaging your arguments. Remember that Unix shells may also pass * and ? literally when the user told them to do so, like in perl -e 'print join " ",@ARGV' \* \<-- this is a star, isn\'t it\?
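A hand-rolled version of the operating system check might look like the sketch below. It is an illustration under stated assumptions, not how Win32::Autoglob is implemented: `expand_args` is a hypothetical helper, the OS name is passed in as a parameter (normally $^O) so that both branches can be tried anywhere, and `bsd_glob` from the core File::Glob module is used because, unlike plain glob, it does not split a pattern on whitespace.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Glob qw(bsd_glob);

# Expand wildcard arguments only when the given OS name is native Windows.
# expand_args is a hypothetical helper; $os would normally be $^O.
sub expand_args {
    my ($os, @args) = @_;
    return @args unless $os eq 'MSWin32';    # Unix shells already expanded
    return map {
        my @found = /[*?]/ ? bsd_glob($_) : ();
        @found ? @found : $_;    # keep the argument as typed if nothing matched
    } @args;
}

my @files = expand_args( $^O, @ARGV );
print "$_\n" for @files;
```

On Windows this still cannot tell a quoted wildcard from an unquoted one, since the quoting is gone by the time @ARGV is filled, so it is only a partial answer to the problem described above.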


        Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)

        I'm new to Perl myself, so this explanation might not only be insufficient but also wrong. That mandatory warning aside..

        Map takes a list and returns a list. The best way I've found to understand map is to remember that you have to read the list argument backwards; that is, the list it modifies goes at the end, and the list you get, at the start. In this case, both lists are @ARGV, which might complicate your understanding somewhat. Basically, it takes each element in @ARGV, transforms it with whatever is in the block (in this case), and returns the transformed elements as a new list to be set in @ARGV again.

        So, let's take a look inside the block. I've never used File::DosGlob, but from skimming over the documentation, I assume it ports the glob function from *nix to a DOS shell/cmd. So we are using it to expand the wildcard, returning a list of elements that match.


        my @g = File::DosGlob::glob($_) if /[*?]/;

        Checks whether $_ has a wildcard in it, and if it does, calls the DosGlob glob function, which returns a list of files to be set in @g.

        The last line of the code tells the block which values to send over to the new list being generated by map, like the return value in your usual function. Here it's a ternary operator; if @g is true (has nonzero elements), it evaluates to @g, and thus sends that to map. If @g is false (there were no wildcards in $_, or nothing matched), it evaluates to whatever is after the :, which is $_, and sends that to map.
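The same map pattern can be tried without touching the file system. In the sketch below, a hypothetical lookup table (`%matches`) stands in for the real globber and `expand_with` is an invented helper name. One deliberate difference from the code being discussed: the inner list is filled with a ternary rather than `my @g = ... if ...`, because perlsyn documents the behaviour of `my` combined with a statement modifier as undefined.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Stand-in for a real globber: a hypothetical table of pattern => matches.
my %matches = ( '*.txt' => [ 'Lois.txt', 'Peter.txt' ] );

# For each input, return its matches if it is a wildcard that matched
# something; otherwise return the input unchanged.
sub expand_with {
    my ($table, @inputs) = @_;
    return map {
        my @g = /[*?]/ ? @{ $table->{$_} || [] } : ();
        @g ? @g : $_;
    } @inputs;
}

my @out = expand_with( \%matches, '*.txt', 'notes.md', '*.none' );
print join( ', ', @out ), "\n";
```

Three inputs go in and four elements come out: the block may return any number of values per input, and map flattens them all into one list.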

        Here's hoping this helps/I didn't massively screw up.

        There are several+ flavors of "glob" (various DOS globbers, BSD glob, POSIX glob) and there are differences and that sometimes matters a lot! For example some DOS globbers don't work with filenames with spaces in them!

        Internally within my code, I don't use glob, preferring to open a directory and use grep{} with a regex to filter out what I want as shown below (note: file test operators can also be added to the grep, etc). This method is multi-platform and works with Perl 5.6 (even Win 98 ports). Independent of command line expansion issues (there is a good post about that already), this sort of thing happens all the time, e.g. I want to know the .backup files for my application in directory X.

        Basically use glob() only when forced to do so.

        #!/usr/bin/perl -w
        use strict;
        #print all files ending in .pl within C:/Temp directory
        my $dir = 'C:/Temp';
        opendir (DIR, $dir) || die "couldn't open $dir";
        my @files = grep {/\.pl$/}readdir DIR;
        print "@files";
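Applied to the renaming task from the original question, the readdir-and-grep approach could be sketched as follows. This is an illustration only: `matching_files` and `strip_prefix` are hypothetical helpers, the prefix pattern is hard-coded, and the demonstration runs in a temporary directory so that no real files are touched.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);

# Return the sorted names of plain files in $dir that match $re.
sub matching_files {
    my ($dir, $re) = @_;
    opendir( my $dh, $dir ) or die "couldn't open $dir: $!";
    my @names = grep {
        my $path = File::Spec->catfile( $dir, $_ );
        /$re/ && -f $path;
    } readdir $dh;
    closedir $dh;
    my @sorted = sort @names;
    return @sorted;
}

# Strip a leading "Blah Blah - NN - " from every matching file in $dir.
sub strip_prefix {
    my ($dir) = @_;
    for my $old ( matching_files( $dir, qr/^Blah Blah - \d{2} - / ) ) {
        ( my $new = $old ) =~ s/^Blah Blah - \d{2} - //;
        rename( File::Spec->catfile( $dir, $old ),
                File::Spec->catfile( $dir, $new ) )
            or warn "couldn't rename $old: $!";
    }
}

# Demonstration in a throwaway directory.
my $dir = tempdir( CLEANUP => 1 );
for my $name ( 'Blah Blah - 01 - Peter.txt', 'Blah Blah - 02 - Lois.txt' ) {
    open( my $fh, '>', File::Spec->catfile( $dir, $name ) )
        or die "create failed: $!";
    close $fh;
}
strip_prefix($dir);
print "$_\n" for matching_files( $dir, qr/\.txt$/ );
```

Because the file names come from readdir rather than from the command line, neither the shell's quoting rules nor its wildcard expansion play any part here.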
