I've been poking around through manpages, perldoc, Perlmonks Super Search, and Google, and I haven't yet found an answer to this problem: Getopt::Std and Getopt::Long don't seem to play well with escape characters inserted into regular expressions. Do these Getopt modules automatically "sanitize" escape characters? Assuming that's so (and it certainly seems to be), is there some way to route around that if I want it to stop sanitizing them?
This came up because of a throw-away script I wrote for a coworker yesterday to replace double-newlines with single-newlines in a file. I did it the obvious way: I had it open the file by way of a filehandle, gyrated my way to dumping its contents into a scalar, and hard-coded a newline-reducing regex substitution ($foo =~ s/\n\n/\n/g).
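A minimal sketch of what that first, hard-coded version amounts to. The helper name `squeeze_newlines` and the lexical filehandle `$fh` are illustrative only and are not taken from the original script, which did the substitution inline:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Collapse each pair of consecutive newlines into a single newline.
# (Hypothetical helper name, used here only so the logic can be tested.)
sub squeeze_newlines {
    my ($text) = @_;
    $text =~ s/\n\n/\n/g;
    return $text;
}

# Only touch a file when run as a script with a filename argument.
if (!caller && @ARGV) {
    open(my $fh, '<', $ARGV[0]) or die "cannot open file: $!";
    my $contents = do { local $/; <$fh> };    # undef $/ slurps the whole file
    close($fh);
    print squeeze_newlines($contents);
}
```

Because the `\n` escapes are written directly in the source, Perl itself turns them into newlines when it compiles the substitution.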
I then decided that, like all Perl hack(er)s, I needed to have a wholly redundant text replacement utility of my very own, so I expanded upon the script's functionality by making it take CLI arguments (by way of Getopt::Std) and by adding help switch functionality to explain how it works for the next time I need it and can't remember. By the time I was done, it did everything I wanted except what I'd originally designed it to do: perform substitutions with newlines. From prototype to obsolete in less than an hour. Microsoft, eat your heart out.
Anyway, I refer you back to my first paragraph's questions, and hope for some help. Here's the relevant code:
#!/usr/bin/perl

use strict;
use Getopt::Std;

my (%argument, @contents, $contents);

getopts('hd:s:', \%argument);

if ($argument{h}) {
  helptext()
}else{
  open(FILEHANDLE, "< $ARGV[0]") or die "cannot open file: $!";
  $contents = do{local $/; <FILEHANDLE>;};
  $contents =~ s/$argument{d}/$argument{s}/g;
  print $contents;
  close(FILEHANDLE);
}

sub helptext {
  print <<"EOT";
=====
syntax: frep [-h] -d <string> [-s <string>] <file>

  -h      prints this help text and exits: invoking the help argument
          causes all other arguments to be discarded by this utility

  -d      takes string as input, searches for that string: replaces
          with string from -s or with an empty string if no -s
          argument is specified

  -s      takes string as input, substitutes it for string specified
          by the -d argument -- if not specified, text matching the
          -d argument will simply be deleted

  <file>  specifies [path and] name of file to take as input, on
          whose contents this utility operates

description: fts (aka "file text substitution") takes a file's contents
as input and operates upon them, doing a simple find and replace
operation globally throughout the file. The results are dumped to
STDOUT, so the original file is untouched. If you want the original
file to be overwritten with the new contents, use a shell redirect.

bugs: Unfortunately, for reasons that are still a mystery to me, the
-s argument does not handle newline escape characters (specifically,
"\\n") properly. Such escape characters are "sanitized" by the
Getopt::Std module. I have discovered that the Getopt::Long module
seems to exhibit the same behavior. Maybe someday I'll bother to fix
this.

credits: Chad L. Perrin (author)
         Perlmonks community (contributors)

license: This utility released under CCD CopyWrite. See following URL
for details.
http://ccd.apotheon.org
EOT
}
That's it. Oh, yeah, and if anyone has any suggestions, comments, criticisms, complaints, or flames relating to the code itself, even if they don't answer my actual questions here, I'd love the feedback.
NOTE: The shell is not the (only) problem here. Yes, it sanitizes unquoted escape characters, but it does not sanitize quoted escape characters. The script, on the other hand, does sanitize quoted escape characters, which leads me back to the original problem.
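One way to narrow this down is to check what actually arrives in the hash and what the substitution then does with it. The sketch below is a diagnostic, not a fix, and the helper names `parse_args` and `substitute` are made up for illustration. It assumes that a single-quoted Perl string such as '\n' is a fair stand-in for what a quoted shell argument delivers: a literal backslash followed by the letter n.

```perl
use strict;
use warnings;
use Getopt::Std;

# Hypothetical helper: run getopts over a supplied argument list
# instead of the real command line.
sub parse_args {
    local @ARGV = @_;
    my %argument;
    getopts('hd:s:', \%argument);
    return %argument;
}

# Hypothetical helper: the same substitution the script performs.
sub substitute {
    my ($text, $find, $replace) = @_;
    $text =~ s/$find/$replace/g;
    return $text;
}

# Single quotes keep the backslashes literal, as a quoted shell argument would.
my %argument = parse_args('-d', '\n\n', '-s', '\n', 'somefile');

# If Getopt::Std stripped or converted anything, these lengths would show it.
printf "-d arrived as %d characters\n", length $argument{d};
printf "-s arrived as %d characters\n", length $argument{s};

# Check whether the replacement went in as a newline or as backslash + n.
my $result = substitute("a\n\nb", $argument{d}, $argument{s});
print $result =~ /\\n/
    ? "replacement contains a literal backslash-n\n"
    : "replacement contains a real newline\n";
```

If the lengths come back unchanged, the option values are reaching the script intact, and the difference lies in the two halves of `s///`. An interpolated pattern is handed to the regex engine, which reads backslash-n as a newline. An interpolated replacement is inserted as plain text and is not re-parsed for escapes. That would be consistent with `-d` appearing to work while `-s` does not, in which case the replacement string would need its escapes expanded explicitly before the substitution.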
- apotheon
CopyWrite Chad Perrin
In reply to Getopt, regexen, and newlines by apotheon