dimar has asked for the wisdom of the Perl Monks concerning the following question:

The Background

While working on my little 'batch file rename' tool I came across a semi-roadblock. As I suspected, Super Search revealed that I wasn't the first person in the world to go down this path, and batch file rename is a well worn topic to say the least. Great! I was able to see how other people had approached the same problems I had encountered, as well as get some new ideas. Nevertheless, there were some problems not already addressed (AFAICT) in the previous posts on this topic.

The Problem

My script is actually working fine as-is, but I would like to enhance the functionality of my little renaming tool (the general spec for the tool is listed below). These areas need improvement:

The Question

1) Under "rename htm based on content" I need a way to specify in the filename whether an htm file has already been renamed based on the title tag, so as to let the script skip over htm files it has already renamed in the past. {fn1} What is a good way to do this?
2) Under "global-preview-before-rename" what is a good way to let the user "eyeball" the proposed rename operations before they are carried out? A GUI checklist would be nice, but simplicity is preferred. Is there a way to allow the user to individually 'select and deselect' from a list of proposed renames before the operation is carried out?

The general spec for my little batch rename tool ...
=for comment
### Definitions
"ugly characters" ;; chars either not allowed or not desired
"too long" ;; maximum allowable length for a basename
"basename" ;; file-full-path minus file-extension
"file-extension" ;; anything that satisfies m/\.[^\.]+$/
"rename surprises" ;; unexpected results from a batch rename
### standard operations
rename extension (eg *.foo to *.fee)
rename basename (eg ibm*.* to IBM*.*)
routine replace (eg lower-case to upper-case)
### file-management operations
rename and move
rename and copy
rename to not-yet-existent location (mkpath-style)
### scrubbing operations
remove all "ugly characters"
replace all "ugly characters" with a safe char (eg "_")
reduce all "too long" names to a truncated shorter name
(and allow the user to specify truncate rules)
(and prevent "rename collision" on non-unique truncations)
### safeguard operations
avoid "rename collision" by halting the program
avoid "rename collision" by appending UniqueID
avoid "rename surprises" with global-preview-before-rename
### rename based on content
rename htm file by the content of the <title> tag
(and apply all scrubbing operations)
(and skip the rename if already been done before)
rename htm file by some UniqueID if no <title> tag exists
### rename based on context
rename to include all directory-path-steps as part of the name
rename to include the last 2 path-steps as part of the name
=cut
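To make the scrubbing and safeguard items in the spec concrete, here is a minimal sketch rather than the actual tool. The character whitelist, the 40-character limit, the "_" safe char, the "_N" suffix and the names scrub_basename and unique_name are all assumptions made for illustration.

```perl
use strict;
use warnings;

# Hypothetical limit: how long a basename may be before it counts as "too long".
my $MAX_LEN = 40;

# Replace every character outside a conservative whitelist with "_",
# then collapse runs of "_" and trim "_" from both ends.
sub scrub_basename {
    my ($base) = @_;
    $base =~ s/[^A-Za-z0-9._-]/_/g;
    $base =~ s/_+/_/g;
    $base =~ s/^_+|_+$//g;
    return $base;
}

# Truncate to $MAX_LEN, then add a numeric suffix until the name is unused
# within this batch. %$taken records names already handed out (lower-cased,
# so the check also holds on case-insensitive filesystems). A real tool
# would probably also test the target directory with -e.
sub unique_name {
    my ($base, $ext, $taken) = @_;
    $base = substr($base, 0, $MAX_LEN);
    my $candidate = $base . $ext;
    my $n = 1;
    while ($taken->{ lc $candidate }) {
        my $suffix = '_' . $n++;
        $candidate = substr($base, 0, $MAX_LEN - length($suffix)) . $suffix . $ext;
    }
    $taken->{ lc $candidate } = 1;
    return $candidate;
}
```

Keeping the suffix inside the length limit means two titles that truncate to the same prefix still get distinct names of legal length.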

The Footnote

{fn1} The current approach for renaming htm based on title tag is to grab the content of title tag (if it can be grabbed); scrub it; apply a 'counter tag' to the end to prevent renaming collisions; and use the counter tag (eg m/xx\d+/) as a 'search anchor' to test if the htm file was already renamed by this tool in the past.
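As a sketch of the counter-tag check described in the footnote (not the script's actual code): an unanchored m/xx\d+/ also matches names such as "boxx2000_review.htm", so this version assumes the tag is written as "_xx" plus digits at the very end of the basename. The helper names and the four-digit padding are hypothetical.

```perl
use strict;
use warnings;
use File::Basename qw(fileparse);

# True if the basename ends in the tool's counter tag, e.g. "my_page_xx0042.htm".
sub already_renamed {
    my ($path) = @_;
    # Strip the directory and the final extension; keep only the basename.
    my ($base) = fileparse($path, qr/\.[^.]+/);
    return $base =~ /_xx\d+\z/ ? 1 : 0;
}

# Build a tagged name from an already scrubbed title and a counter value.
sub tag_name {
    my ($scrubbed_title, $counter) = @_;
    return sprintf '%s_xx%04d.htm', $scrubbed_title, $counter;
}
```

Anchoring the pattern reduces false positives but cannot remove them: a file whose own name happens to end in "_xx12" would still be skipped.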


Replies are listed 'Best First'.
Re: Batch Rename Question (YABRQ): Rename on content and Global Preview
by ikegami (Patriarch) on Nov 05, 2004 at 01:37 UTC
    Since we're dealing specifically with HTML documents, an answer to Question 1 would be to add a META tag indicating the file has already been renamed. When the utility comes across an HTML doc with that META tag, it won't try to rename it. A command line option could tell the utility to ignore the META tag.
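A rough sketch of the META idea, working on the document as a string. The marker name "x-batch-renamed" is invented for illustration, and the regexes only handle straightforward markup; a real parser such as HTML::Parser would be more robust against unusual HTML.

```perl
use strict;
use warnings;

my $MARKER = 'x-batch-renamed';    # hypothetical META name

# True if the document already carries the marker tag.
sub has_marker {
    my ($html) = @_;
    return $html =~ /<meta\b[^>]*\bname\s*=\s*["']?\Q$MARKER\E["']?[^>]*>/i ? 1 : 0;
}

# Return the document with the marker inserted right after <head ...>.
# The text comes back unchanged if it is already marked or has no <head>.
sub add_marker {
    my ($html, $stamp) = @_;
    return $html if has_marker($html);
    my $tag = qq{<meta name="$MARKER" content="$stamp">};
    $html =~ s/(<head\b[^>]*>)/$1\n$tag/i;
    return $html;
}
```

Unlike a filename convention, this rewrites the file content, so the file's modification time changes when the marker is added.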
Re: Batch Rename Question (YABRQ): Rename on content and Global Preview
by BrowserUk (Patriarch) on Nov 05, 2004 at 02:55 UTC

    On Q.2, if being prompted for each file individually doesn't grab you, you could present a screenful of files at a time (perhaps using one of the columnising codes from 'ls -C' column style to maximise the screen usage), and prefix each with a number. Then prompt the user for a space-delimited list of the numbers of the files to be excluded:

    1)fileA 2)fileB 3)fileC 4)fileD 5)fileE 6)fileF 7)fileG 8)fileI
    exclude? 3 7
    9)fileJ 10)fileK 11)fileL 12)fileM 13)fileM 14)fileN 15)fileO 16)fileP
    exclude?

    Depending on the sizes of your directories, filenames and screen you could probably get away with just one prompt in most cases.

    You could also allow the first value on the line to be I (include) or E (or X, exclude), so that the user can type whichever list is shorter.
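The numbered prompt could be parsed along these lines. This is only a sketch: the reply grammar (an optional leading I, E or X, then 1-based numbers, with exclude as the default mode) is one reading of the suggestion above, and apply_selection is a hypothetical name.

```perl
use strict;
use warnings;

# Given the list shown to the user and the reply typed at the prompt,
# return the items that should still be renamed.
#   "3 7", "E 3 7", "X 3 7" -> everything except items 3 and 7
#   "I 1 2"                 -> only items 1 and 2
#   ""                      -> everything
sub apply_selection {
    my ($items, $reply) = @_;
    $reply = '' unless defined $reply;
    my @tokens = split ' ', $reply;

    my $mode = 'E';    # exclude is the default mode
    if (@tokens && $tokens[0] =~ /^[IEX]$/i) {
        $mode = uc(shift @tokens);
        $mode = 'E' if $mode eq 'X';
    }

    # Remember the valid numbers as 0-based indexes; ignore anything else.
    my %picked;
    for my $t (@tokens) {
        next unless $t =~ /^\d+$/ && $t >= 1 && $t <= @$items;
        $picked{ $t - 1 } = 1;
    }

    my @keep = grep { $mode eq 'I' ? $picked{$_} : !$picked{$_} } 0 .. $#$items;
    return @{$items}[@keep];
}
```

A caller would print one screenful, read a line from STDIN, and pass both to apply_selection. Out-of-range numbers are silently ignored here; a friendlier tool might re-prompt instead.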


    Examine what is said, not who speaks.
    "Efficiency is intelligent laziness." -David Dunham
    "Think for yourself!" - Abigail
    "Memory, processor, disk in that order on the hardware side. Algorithm, algorithm, algorithm on the code side." - tachyon
Re: Batch Rename Question (YABRQ): Rename on content and Global Preview
by gaal (Parson) on Nov 05, 2004 at 05:27 UTC
    How often will you be doing this kind of renaming? My favorite quick'n'dirty method for mass approval is to dump everything out to a file in the form of pairs (from, to), then edit the file with a text editor -- deleting lines I don't want there -- then use the file to fire off the actual renames. In simple cases this can be accomplished by inserting the string "mv " at the beginning of each line.

    Not offering this as a programmatic tip, but as a pragmatic one for certain cases.
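For anyone who does want this workflow inside the script, a minimal sketch follows. It assumes a tab-separated plan file with one "from, to" pair per line, so filenames containing tabs or newlines would need a different format. The sub names are hypothetical.

```perl
use strict;
use warnings;

# Write the proposed renames, one "from<TAB>to" pair per line.
sub write_plan {
    my ($file, @pairs) = @_;
    open my $fh, '>', $file or die "Cannot write $file: $!";
    print {$fh} join("\t", @$_), "\n" for @pairs;
    close $fh or die "Cannot close $file: $!";
}

# Read back whatever pairs survived the user's editing session.
sub read_plan {
    my ($file) = @_;
    open my $fh, '<', $file or die "Cannot read $file: $!";
    my @pairs;
    while (my $line = <$fh>) {
        chomp $line;
        next if $line =~ /^\s*(?:#|$)/;    # skip blank lines and # comments
        my ($from, $to) = split /\t/, $line, 2;
        next unless defined $to && length $to;
        push @pairs, [$from, $to];
    }
    close $fh;
    return @pairs;
}

# Carry out the renames, refusing to overwrite an existing target.
sub run_plan {
    my (@pairs) = @_;
    for my $pair (@pairs) {
        my ($from, $to) = @$pair;
        if (-e $to) {
            warn "Skipping $from: $to already exists\n";
            next;
        }
        rename $from, $to or warn "Could not rename $from to $to: $!\n";
    }
}
```

Between write_plan and read_plan the script could pause, or launch the user's editor, while lines are deleted or commented out.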

Re: Batch Rename Question (YABRQ): Rename on content and Global Preview
by ikegami (Patriarch) on Nov 05, 2004 at 01:42 UTC

    As for Question 2, the simplest solution is to ask rename yes/no for every file. If there are many files, however, it's very easy to say yes where you meant to say no.

    To avoid that problem, the utility could present the list of changes (in "from -> to" format) and ask the user if he wants to 1) rename all, 2) prompt for each change, 3) rename none. And maybe even 4) Specify which ones to rename (or not rename) by number.

Re: Batch Rename Question (YABRQ): Rename on content and Global Preview
by TedPride (Priest) on Nov 05, 2004 at 04:58 UTC
    Why not just add a log file that tells which files were renamed when? That way not only do you know which files have been processed, but if you want to go back through the processed files later, you can tell if the files have been updated since the last time they were renamed.
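A sketch of such a log, assuming a tab-separated format of epoch time, old name and new name (the format and the sub names are made up for illustration). Because rename() itself leaves a file's modification time alone, an mtime later than the logged time suggests the content was edited after the rename.

```perl
use strict;
use warnings;

# Append one record per rename: epoch time, old name, new name.
sub log_rename {
    my ($log, $from, $to, $when) = @_;
    $when = time unless defined $when;
    open my $fh, '>>', $log or die "Cannot append to $log: $!";
    print {$fh} join("\t", $when, $from, $to), "\n";
    close $fh or die "Cannot close $log: $!";
}

# Return a hash mapping each new name to the time of its latest logged rename.
sub load_log {
    my ($log) = @_;
    my %seen;
    return %seen unless -e $log;
    open my $fh, '<', $log or die "Cannot read $log: $!";
    while (my $line = <$fh>) {
        chomp $line;
        my ($when, $from, $to) = split /\t/, $line, 3;
        next unless defined $to;
        $seen{$to} = $when;
    }
    close $fh;
    return %seen;
}

# 'new'     = never renamed by the tool
# 'changed' = renamed before, but modified since then
# 'done'    = renamed before and untouched since
sub status_of {
    my ($path, $seen) = @_;
    return 'new' unless exists $seen->{$path};
    my $mtime = (stat $path)[9];
    return 'changed' if defined $mtime && $mtime > $seen->{$path};
    return 'done';
}
```

The log is keyed by path, so files moved by other means after the rename would show up as 'new' again.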