
I've been poking around through manpages, perldoc, Perlmonks Super Search, and Google, and I haven't yet found an answer to this problem: Getopt::Std and Getopt::Long don't seem to play well with escape characters inserted into regular expressions. Do these Getopt modules automatically "sanitize" escape characters? Assuming that's so (and it certainly seems to be), is there some way to route around that if I want it to stop sanitizing them?
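One way to test the "sanitizing" theory directly is to take the shell out of the picture and look at what getopts actually stores. The following is only a minimal sketch: it assigns @ARGV by hand to stand in for a quoted '\n' arriving from the command line (the file name somefile is a placeholder), then prints the length and character codes of the stored value.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Getopt::Std;

# Stand-in for what a shell passes along for:  -s '\n' somefile
# (in Perl single quotes, as in shell quotes, the backslash survives)
@ARGV = ('-s', '\n', 'somefile');

my %argument;
getopts('hd:s:', \%argument);

# Show exactly which characters ended up in the option's value.
my @codes = map { sprintf '%02x', ord($_) } split //, $argument{s};
print "length: ", length($argument{s}), "\n";   # prints "length: 2"
print "codes:  @codes\n";                       # prints "codes:  5c 6e"
```

A length of 2 with codes 5c 6e means the value is still a backslash followed by the letter n, so whatever goes wrong would have to happen after getopts has returned.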

This came up because of a throw-away script I wrote for a coworker yesterday to replace double-newlines with single-newlines in a file. I did it the obvious way: I had it open the file by way of a filehandle, gyrated my way to dumping its contents into a scalar, and hard-coded a newline-reducing regex substitution ($foo =~ s/\n\n/\n/g).
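For reference, the hard-coded version amounts to something like the sketch below. The sample text and the in-memory filehandle are assumptions made to keep it self-contained; the real script read a file from disk.

```perl
use strict;
use warnings;

# Stand-in for the file's contents (assumption: any text with blank lines).
my $sample = "one\n\ntwo\n\nthree\n";
open(my $fh, '<', \$sample) or die "cannot open in-memory file: $!";

# With $/ undefined locally, <$fh> returns everything in a single read.
my $foo = do { local $/; <$fh> };
close($fh);

# Written in the source, \n on both sides of s/// is parsed by perl
# itself, so both sides mean a real newline character.
$foo =~ s/\n\n/\n/g;
print $foo;
```

Here the escapes are part of the program text, which is the one difference from the version that takes them from the command line.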

I then decided that, like all Perl hack(er)s, I needed to have a wholly redundant text replacement utility of my very own, so I expanded upon the script's functionality by making it take CLI arguments (by way of Getopt::Std) and by adding help switch functionality to explain how it works for the next time I need it and can't remember. By the time I was done, it did everything I wanted except what I'd originally designed it to do: perform substitutions with newlines. From prototype to obsolete in less than an hour. Microsoft, eat your heart out.

Anyway, I refer you back to my first paragraph's questions, and hope for some help. Here's the relevant code:

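#!/usr/bin/perl

use strict;
use warnings;
use Getopt::Std;

my %argument;

# -h: help; -d <string>: text to find; -s <string>: replacement text
getopts('hd:s:', \%argument);

if ($argument{h})
{
        helptext();
}else{
        # -s is optional: with no replacement given, matches are deleted
        my $replacement = defined $argument{s} ? $argument{s} : '';

        open(my $fh, '<', $ARGV[0])
                or die "cannot open file: $!";
        # slurp the whole file into one scalar
        my $contents = do{local $/; <$fh>;};
        close($fh);

        $contents =~ s/$argument{d}/$replacement/g;
        print $contents;
}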

sub helptext
{
print <<"EOT";
=====

syntax:
fts [-h] -d <string> [-s <string>] <file>

-h
        prints this help text and exits: invoking the help
        argument causes all other arguments to be discarded
        by this utility

-d
        takes string as input, searches for that string:
        replaces with string from -s or with an empty string
        if no -s argument is specified

-s
        takes string as input, substitutes it for string
        specified by the -d argument -- if not specified,
        text matching the -d argument will simply be
        deleted

<file>
        specifies [path and] name of file to take as input,
        on whose contents this utility operates

description:  fts (aka "file text substitution") takes a
file's contents as input and operates upon them, doing a
simple find and replace operation globally throughout the
file.  The results are dumped to STDOUT, so the original
file is untouched.  If you want the original file to be
overwritten with the new contents, use a shell redirect.

bugs:  Unfortunately, for reasons that are still a mystery
to me, the -s argument does not handle newline escape
characters (specifically, "\\n") properly.  Such escape
characters are "sanitized" by the Getopt::Std module.  I
have discovered that the Getopt::Long module seems to
exhibit the same behavior.  Maybe someday I'll bother to
fix this.

credits:        Chad L. Perrin (author)
                Perlmonks community (contributors)

license:  This utility is released under CCD CopyWrite.
See the following URL for details.

        http://ccd.apotheon.org

EOT
}

That's it. Oh, yeah, and if anyone has any suggestions, comments, criticisms, complaints, or flames relating to the code itself, even if they don't answer my actual questions here, I'd love the feedback.

NOTE: The shell is not the (only) problem, here. Yes, it sanitizes unquoted escape characters, but it does not sanitize quoted escape characters. The script, on the other hand, does sanitize quoted escape characters, which leads me back to the original problem.
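The behavior described in this note can be reproduced with neither the shell nor Getopt involved. As a minimal sketch, assume the two options arrive as the literal characters backslash and n. The pattern side of s/// hands the interpolated text to the regex compiler, which understands \n. The replacement side interpolates the variable but does not re-parse its value for escapes. The unescape helper and its small escape table are hypothetical, one possible workaround among several.

```perl
use strict;
use warnings;

# What the script receives from:  -d '\n\n' -s '\n'
my $find    = '\n\n';   # four characters: \ n \ n
my $replace = '\n';     # two characters:  \ n

my $text = "one\n\ntwo\n";

# Pattern side: the regex compiler reads \n as "match a newline".
# Replacement side: the variable's value is inserted exactly as it is.
(my $naive = $text) =~ s/$find/$replace/g;
# $naive now holds 'one\ntwo' with a literal backslash and n in the middle

# Hypothetical helper: translate a few common escapes in a plain string.
my %escape = (n => "\n", t => "\t", r => "\r", '\\' => '\\');
sub unescape {
    my ($string) = @_;
    $string =~ s/\\([ntr\\])/$escape{$1}/g;
    return $string;
}

# Convert the replacement once, then substitute as before.
my $real_replace = unescape($replace);
(my $fixed = $text) =~ s/$find/$real_replace/g;
print $fixed;
```

If that reading is right, only the -s value needs this treatment, since the -d value is already interpreted as a regular expression.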

print substr("Just another Perl hacker", 0, -2);
- apotheon
CopyWrite Chad Perrin

In reply to Getopt, regexen, and newlines by apotheon
