fearless_fool has asked for the wisdom of the Perl Monks concerning the following question:

Are there standard perl functions for escaping and de-escaping a string? For example, if the five characters [\',"?] are considered "special" in some context and need escaping, and the single character [\] is the escape character, then escaping the string
yo, what's a "henway"?
would result in
yo\, what\'s a \"henway\"\?
and similarly, de-escaping the second string would result in the first.

Given perl's rich set of regex functions, I know this is simple to write, but I've also learned that nearly everything worth writing has been already written. Has it?

- ff

Replies are listed 'Best First'.
Re: string escaping / de-escaping functions
by mr_mischief (Monsignor) on Oct 25, 2008 at 00:12 UTC
    Perhaps you have a specific application that would be better served by a tool meant for that application. Text::CSV and DBI for example have ways of working with data that do not require you to worry about escaping strings in your code.

    That being said, many useful things have already been written more than once. Unfortunately, they are often written in ways that do you little good for your specific problem. The simplest and most general answer to your query seems to be the narrowly named but broadly useful Text::EscapeDelimiters, which will escape more than delimiters. There is also the corresponding Text::DelimMatch for retrieving delimited text.

    String::Escape does what you are asking and more but only in certain narrow cases. Unfortunately, those cases do not include your example specification. You can add escaping methods to it, but then you'd have to write them.

    Encode::Escape wraps up several different types of escaping into one module.

    URI::Escape works well with URLs and URIs.

    Regexp::Common has many tasty treats in its namespace, including Regexp::Common::delimited.

Re: string escaping / de-escaping functions
by GrandFather (Saint) on Oct 25, 2008 at 00:02 UTC

    There are domain specific solutions (\Q and \E for regex escaping, Encode for character encoding, HTML::Entities and so on), but a regex is the general answer to the general problem.
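A minimal sketch of that general regex answer, assuming the set of special characters and the escape character are passed in as plain strings. The name make_escaper is hypothetical and not from any module; quotemeta is only used to make the characters safe inside the regex.

```perl
use strict;
use warnings;

# Hypothetical helper: returns an (escape, unescape) pair of code refs for
# a given string of special characters and a single escape character.
sub make_escaper {
    my ($specials, $esc) = @_;

    # The escape character must itself be escaped, so it joins the set.
    # quotemeta makes every character safe inside the character class.
    my $class = '[' . quotemeta($specials . $esc) . ']';
    my $qesc  = quotemeta $esc;

    my $escape = sub {
        my ($s) = @_;
        $s =~ s/(${class})/$esc$1/g;      # prefix each special character
        return $s;
    };
    my $unescape = sub {
        my ($s) = @_;
        $s =~ s/${qesc}(${class})/$1/g;   # drop the prefix again
        return $s;
    };
    return ($escape, $unescape);
}

my ($escape, $unescape) = make_escaper(q{',"?}, '\\');
my $escaped = $escape->(q{yo, what's a "henway"?});
print "$escaped\n";
print $unescape->($escaped), "\n";
```

The unescaper only strips the escape character in front of a character from the same set, so any other backslash sequence is left untouched.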


    Perl reduces RSI - it saves typing
Re: string escaping / de-escaping functions
by ikegami (Patriarch) on Oct 25, 2008 at 00:16 UTC
    sub str2lit {
        my ($s) = @_;
        $s =~ s/([\\',"?])/\\$1/g;
        return $s;
    }

    sub lit2str {
        my ($s) = @_;
        #$s =~ s/\\([^0-9a-zA-Z])/$1/g;  # Probably better
        $s =~ s/\\([\\',"?])/$1/g;
        return $s;
    }
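A quick round-trip check of these two substitutions against the example from the question may help. This is only a sketch; the subs are repeated so the script runs on its own.

```perl
use strict;
use warnings;

# Prefix each special character (including the backslash itself)
# with a backslash.
sub str2lit {
    my ($s) = @_;
    $s =~ s/([\\',"?])/\\$1/g;
    return $s;
}

# Remove one backslash in front of each special character.
sub lit2str {
    my ($s) = @_;
    $s =~ s/\\([\\',"?])/$1/g;
    return $s;
}

my $plain   = q{yo, what's a "henway"?};
my $escaped = str2lit($plain);

print "$escaped\n";             # yo\, what\'s a \"henway\"\?
print lit2str($escaped), "\n";  # yo, what's a "henway"?
```

Because the backslash is in the escaped set, a string that already contains backslashes also survives the round trip.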
Re: string escaping / de-escaping functions
by juster (Friar) on Oct 25, 2008 at 06:20 UTC

    There is also the quotemeta builtin function, which quotes any non-word character with a \. quotemeta will quote the space character but that doesn't seem to affect printing. An esoteric builtin I learned and only ever saw mentioned... here on perlmonks!

    Example: perl -e "print quotemeta 'yo, what\'s a "henway"?'"
    Gives: yo\,\ what\'s\ a\ henway\?
      quotemeta will quote the space character but that doesn't seem to affect printing
      What do you mean by that?
        Sorry, you're right. What I meant to say is it doesn't seem to affect matching regular expressions. I'm not sure why I said printing. Then again I should assume quotemeta is safe for regular expressions if it is used by \Q and \E like the docs say! It's late and time for another Pabst!
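For comparison, here is a small sketch of the same quotemeta call run from a script file rather than a one-liner. In the one-liner above, the shell presumably consumed the inner double quotes before Perl saw them, which would explain why they are missing from the output. In a script they reach quotemeta and get a backslash like every other non-word character.

```perl
use strict;
use warnings;

my $plain = q{yo, what's a "henway"?};

# quotemeta backslashes every ASCII character that is not a letter,
# digit or underscore, so spaces and double quotes are escaped too.
my $quoted = quotemeta $plain;
print "$quoted\n";   # yo\,\ what\'s\ a\ \"henway\"\?

# \Q...\E applies the same quoting inside a pattern, so the text is
# matched literally, spaces and "?" included.
print "literal match\n" if $plain =~ /^\Q$plain\E$/;
```

Note that this escapes more than the five characters in the original question, so it only fits if escaping spaces and other punctuation is acceptable.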
Re: string escaping / de-escaping functions
by repellent (Priest) on Oct 25, 2008 at 04:44 UTC
Re: string escaping / de-escaping functions
by ww (Archbishop) on Oct 25, 2008 at 01:19 UTC

    I trust yours is a hypothetical question.

    Each of the members of your set of characters is special at times, but, AFAIK, they are not all special in the same context. Can you clarify the case (context) in which (you think) this might be true?

    One might infer from the first sentence in your question that you believe they're special in a string. That's not so in any circumstance that occurs to me (even though ' and " are used when assigning strings to variables, and thus require escaping if either occurs inside a string quoted with itself).