apotheon has asked for the wisdom of the Perl Monks concerning the following question:
I've been poking around through manpages, perldoc, Perlmonks Super Search, and Google, and I haven't yet found an answer to this problem: Getopt::Std and Getopt::Long don't seem to play well with escape characters inserted into regular expressions. Do these Getopt modules automatically "sanitize" escape characters? Assuming that's so (and it certainly seems to be), is there some way to route around that if I want it to stop sanitizing them?
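One quick way to test the "sanitizing" hypothesis is to give both modules the exact strings a shell passes for single-quoted arguments, without involving the shell at all. The sketch below assumes sh-style quoting, where `'\n'` arrives as a backslash followed by `n`. It sets `@ARGV` by hand and prints the length of each parsed value.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Getopt::Std;
use Getopt::Long;

# Simulate the command line   frep -d '\n\n' -s '\n' file
# Perl's single quotes, like the shell's, keep each backslash as a
# literal character, so '\n' here is two characters: "\" and "n".
@ARGV = ('-d', '\n\n', '-s', '\n');
my %std;
getopts('hd:s:', \%std) or die "getopts failed\n";

# The same arguments through Getopt::Long, using the same one-letter names.
@ARGV = ('-d', '\n\n', '-s', '\n');
my ($long_d, $long_s);
GetOptions('d=s' => \$long_d, 's=s' => \$long_s) or die "GetOptions failed\n";

# Both modules return the strings untouched: 4 chars for -d, 2 for -s.
printf "Getopt::Std : -d %s (%d chars), -s %s (%d chars)\n",
    $std{d}, length $std{d}, $std{s}, length $std{s};
printf "Getopt::Long: -d %s (%d chars), -s %s (%d chars)\n",
    $long_d, length $long_d, $long_s, length $long_s;
```

Both modules report lengths of 4 and 2, so the backslashes survive option parsing. Whatever is going wrong happens later, when the values are used.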
This came up because of a throw-away script I wrote for a coworker yesterday to replace double-newlines with single-newlines in a file. I did it the obvious way: I had it open the file by way of a filehandle, gyrated my way to dumping its contents into a scalar, and hard-coded a newline-reducing regex substitution ($foo =~ s/\n\n/\n/g).
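For reference, the original hard-coded version amounts to something like the following. It is a sketch, not the author's exact script. The helper name `collapse_double_newlines` is hypothetical, and the sketch uses a lexical filehandle with three-argument `open` instead of a bareword `FILEHANDLE`.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: replace every double newline with a single one.
sub collapse_double_newlines {
    my ($text) = @_;
    $text =~ s/\n\n/\n/g;
    return $text;
}

# Only touch a file when one is named on the command line.
if (@ARGV) {
    my $file = $ARGV[0];
    open my $fh, '<', $file or die "cannot open $file: $!";
    my $contents = do { local $/; <$fh> };   # undef $/ slurps the whole file
    close $fh;
    print collapse_double_newlines($contents);
}
```

Note that `s/\n\n/\n/g` turns a run of three newlines into two, because matching resumes after each replacement. `s/\n{2,}/\n/g` would squeeze any run down to one.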
I then decided that, like all Perl hack(er)s, I needed to have a wholly redundant text replacement utility of my very own, so I expanded upon the script's functionality by making it take CLI arguments (by way of Getopt::Std) and by adding help switch functionality to explain how it works for the next time I need it and can't remember. By the time I was done, it did everything I wanted except what I'd originally designed it to do: perform substitutions with newlines. From prototype to obsolete in less than an hour. Microsoft, eat your heart out.
Anyway, I refer you back to my first paragraph's questions, and hope for some help. Here's the relevant code:
```perl
#!/usr/bin/perl
use strict;
use warnings;
use Getopt::Std;

my %argument;
getopts('hd:s:', \%argument);

if ($argument{h}) {
    helptext();
} else {
    defined $argument{d} or die "the -d argument is required (see -h)\n";
    @ARGV or die "no input file given (see -h)\n";
    # -s is optional; an empty replacement deletes each match
    my $replacement = defined $argument{s} ? $argument{s} : '';

    open(my $fh, '<', $ARGV[0]) or die "cannot open $ARGV[0]: $!";
    my $contents = do { local $/; <$fh> };   # slurp the whole file
    close($fh);

    $contents =~ s/$argument{d}/$replacement/g;
    print $contents;
}

sub helptext {
    print <<"EOT";
=====
syntax: frep [-h] -d <string> [-s <string>] <file>

  -h      prints this help text and exits: invoking the help argument
          causes all other arguments to be discarded by this utility
  -d      takes string as input, searches for that string: replaces it
          with the string from -s, or with an empty string if no -s
          argument is specified
  -s      takes string as input, substitutes it for the string specified
          by the -d argument; if not specified, text matching the -d
          argument will simply be deleted
  <file>  specifies [path and] name of the file to take as input, on
          whose contents this utility operates

description:
  fts (aka "file text substitution") takes a file's contents as input
  and operates upon them, doing a simple find and replace operation
  globally throughout the file. The results are dumped to STDOUT, so
  the original file is untouched. If you want to replace the original
  file, redirect the output to a new file and move that over the
  original; redirecting straight back onto the input file truncates
  it before it is read.

bugs:
  Unfortunately, for reasons that are still a mystery to me, the -s
  argument does not handle newline escape characters (specifically,
  "\\n") properly. Such escape characters are "sanitized" by the
  Getopt::Std module. I have discovered that the Getopt::Long module
  seems to exhibit the same behavior. Maybe someday I'll bother to fix
  this.

credits:
  Chad L. Perrin (author)
  Perlmonks community (contributors)

license:
  This utility is released under CCD CopyWrite. See the following URL
  for details.
  http://ccd.apotheon.org
EOT
}
```
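A likely explanation, going by how perlop describes `s///`, is that the two halves of the substitution treat the same string differently. The interpolated pattern is handed to the regex engine, which reads the two characters `\n` as a newline escape, so `-d '\n'` does match newlines. The replacement half behaves like a double-quoted string, and a variable's contents are inserted verbatim rather than re-scanned for escapes. As a result, `-s '\n'` inserts a backslash and an `n`. The sketch below demonstrates both halves with hard-coded strings, then works around the problem with a hypothetical `expand_escapes` helper that translates only `\n`, `\t` and `\\`.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: turn the escape sequences \n, \t and \\ that arrive
# as literal characters (e.g. from -s '\n') into the real characters.
sub expand_escapes {
    my ($text) = @_;
    my %escape = (n => "\n", t => "\t", '\\' => '\\');
    $text =~ s/\\([nt\\])/$escape{$1}/g;
    return $text;
}

my $contents = "one\n\ntwo\n";
my $pattern  = '\n\n';   # what -d '\n\n' delivers: backslash, n, backslash, n
my $raw      = '\n';     # what -s '\n' delivers: backslash, n

# Pattern side: the regex engine reads \n in the interpolated pattern as a
# newline, so the double newline IS found.
(my $as_is = $contents) =~ s/$pattern/$raw/g;
# Replacement side: interpolation inserts the value verbatim, so the
# result contains a backslash and an "n" instead of a newline.
print $as_is eq "one\\ntwo\n" ? "as-is: literal \\n inserted\n" : "as-is: unexpected\n";

# Workaround: expand the escapes once, then substitute as before.
my $replacement = expand_escapes($raw);
(my $fixed = $contents) =~ s/$pattern/$replacement/g;
print $fixed eq "one\ntwo\n" ? "fixed: real newline inserted\n" : "fixed: unexpected\n";
```

The `/ee` modifier could also force the replacement to be evaluated. However, that runs command-line input as Perl code, so an explicit lookup table seems the safer choice for a utility like this.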
That's it. Oh, yeah, and if anyone has any suggestions, comments, criticisms, complaints, or flames relating to the code itself, even if they don't answer my actual questions here, I'd love the feedback.
NOTE: The shell is not the (only) problem here. Yes, it sanitizes unquoted escape characters, but it does not sanitize quoted ones. The script, on the other hand, does seem to sanitize quoted escape characters, which leads me back to the original problem.
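A tiny diagnostic can show exactly what the shell delivers for quoted versus unquoted escapes. It prints each argument's characters as hex codes, before any option parsing happens. The script name `argdump.pl` and the sub `visible` are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical diagnostic (call it argdump.pl): print every character of each
# command-line argument as a hex code, before any option parsing happens.
sub visible {
    my ($arg) = @_;
    return join ' ', map { sprintf('%02x', ord($_)) } split //, $arg;
}

print "$_ => ", visible($_), "\n" for @ARGV;
```

In an sh-style shell, `perl argdump.pl '\n' \n` should print `\n => 5c 6e` for the quoted argument (a backslash, then `n`) and `n => 6e` for the unquoted one.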
- apotheon
CopyWrite Chad Perrin
Re: Getopt, regexen, and newlines
by Roy Johnson (Monsignor) on Oct 12, 2005 at 18:23 UTC
by apotheon (Deacon) on Oct 12, 2005 at 18:40 UTC
by Roy Johnson (Monsignor) on Oct 12, 2005 at 18:54 UTC
by apotheon (Deacon) on Oct 12, 2005 at 19:02 UTC

Re: Getopt, regexen, and newlines
by Corion (Patriarch) on Oct 12, 2005 at 18:19 UTC
by apotheon (Deacon) on Oct 12, 2005 at 18:38 UTC
by Roy Johnson (Monsignor) on Oct 12, 2005 at 18:58 UTC

Re: Getopt, regexen, and newlines
by JediWizard (Deacon) on Oct 12, 2005 at 19:32 UTC

Re: Getopt, regexen, and newlines
by parv (Parson) on Oct 12, 2005 at 20:25 UTC

Re: Getopt, regexen, and newlines
by apotheon (Deacon) on Oct 12, 2005 at 23:02 UTC
by parv (Parson) on Oct 13, 2005 at 03:31 UTC
by apotheon (Deacon) on Dec 05, 2005 at 21:13 UTC