in reply to quoting for system() and friends

I'd encourage you to look at the system documentation and pass a list rather than try to quote things.
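
For reference, here is a minimal sketch of the list form. The helper name print_arg and the sample string are made up for illustration; the only assumption about the system is that a printf program is on the PATH.

```perl
use strict;
use warnings;

# Hypothetical helper: hand one untrusted argument to printf without a shell.
sub print_arg {
    my ($arg) = @_;

    # With more than one argument, system() execs the program directly,
    # so $arg arrives as a single argument and nothing in it is parsed.
    my $status = system( 'printf', '%s\n', $arg );
    return $status == 0;
}

print_arg( q{"quotes", $dollars, `backticks` and \backslashes} )
    or die "printf failed: $?";
```

Because no shell ever sees the string, there is nothing to quote in the first place.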

That having been said, I've used this at times:

sub shell_escape {
    my ( $string ) = @_;
    $string =~ s/\\/\\\\/g;
    $string =~ s/\"/\\\"/g;
    $string =~ s/\$/\\\$/g;
    $string =~ s/\`/\\\`/g;
    return $string;
}

I use this to quote a string that I'm going to pass to the shell in double quotes.

my $suspect = shift;
my $quoted_suspect = shell_escape( $suspect );
system( qq{echo "$quoted_suspect"} );

As I recall, I got the list of characters to quote from somewhere in the bash man page, but I don't remember where. I'm comfortable using it to pass my own arguments, but I'm not sure I'd trust it to correctly escape a string crafted by an attacker. In that case, use it at your own risk.
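
One way to gain some confidence is a round-trip check: escape a string, let sh expand it inside double quotes, and compare what comes back. This is only a sketch. The helper round_trip and the sample string are invented here, and printf '%s' stands in for echo because some echo implementations interpret backslashes themselves. A passing check on one sample says nothing about hostile input.

```perl
use strict;
use warnings;

sub shell_escape {
    my ($string) = @_;
    $string =~ s/\\/\\\\/g;    # backslashes first, so later escapes are not doubled
    $string =~ s/"/\\"/g;
    $string =~ s/\$/\\\$/g;
    $string =~ s/`/\\`/g;
    return $string;
}

# Hypothetical helper: pass $string through sh in double quotes, capture the result.
sub round_trip {
    my ($string) = @_;
    my $quoted = shell_escape($string);
    return qx{printf '%s' "$quoted"};
}

my $sample = q{say "hi" to $HOME, `date` and C:\temp};
print round_trip($sample) eq $sample ? "intact\n" : "mangled\n";
```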

Re^2: quoting for system() and friends
by ikegami (Patriarch) on Mar 06, 2007 at 03:38 UTC
    You might not be able to inject anything using it, but a NUL can confuse things:
    $ perl -e '$qs=chr(0); system( qq{echo "$qs"} );'
    Syntax error: Unterminated quoted string
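
A caller could simply refuse such strings before building the command. The guard below is a sketch and not part of the original routine; the name assert_no_nul is made up.

```perl
use strict;
use warnings;

# Hypothetical guard: reject strings that contain a NUL byte.
sub assert_no_nul {
    my ($string) = @_;
    die "string contains a NUL byte\n" if $string =~ /\0/;
    return $string;
}

# eval traps the die so the script can report the rejection.
my $ok = eval { assert_no_nul( "abc" . chr(0) ); 1 };
print $ok ? "accepted\n" : "rejected: $@";
```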

    I wonder if there's a UNICODE character other than " and \ whose UTF-8 encoding contains the byte 34 (") or 92 (\)... ( No, doesn't appear to be. I bet that if I looked into how UNICODE points are transformed into UTF-8, I'd find the presence of those bytes impossible. The 8th bit seems to always be set. )

      I bet that if I looked into how UNICODE points are transformed into UTF-8, I'd find the presence of those bytes impossible.

      Definitely a safe bet. In UTF8, it's either ASCII, or it's part of a wide (multi-byte) character. All bytes of every UTF8 wide character always have their 8th bit set (i.e. are not ASCII). There's a section of the perlunicode manual that explains UTF8 encoding, and here is the very helpful chart that summarizes (slightly modified):

      Scalar Value                  UTF-8
      (UTF-16 range)                1st byte   2nd byte   3rd byte   4th byte

      0000.0000,0xxx.xxxx           0xxx.xxxx
      (U0000 - U007F)               (00 - 7F)

      0000.0yyy,yyxx.xxxx           110y.yyyy  10xx.xxxx
      (U0080 - U07FF)               (C2 - DF)  (80 - BF)

      zzzz.yyyy,yyxx.xxxx           1110.zzzz  10yy.yyyy  10xx.xxxx
      (U0800 - UFFFF)               (E0 - EF)  (A0 - BF)  (80 - BF)

      u.uuuu,zzzz.yyyy,yyxx.xxxx    1111.0uuu  10uu.zzzz  10yy.yyyy  10xx.xxxx
      (U010000 - U10FFFF)           (F0 - F7)  (90 - BF)  (80 - BF)  (80 - BF)

      1101.10ww,wwzz.zzyy *
      1101.11yy,yyxx.xxxx

      * uuuuu = wwww + 1 (i.e. uuuuu - 1 = wwww, given 10000(b) >= uuuuu >= 1)

      (The rows that match /^[10xyz., ]+$/ are showing the bit patterns that demonstrate how the 16-bit character code point value is distributed over one or more bytes in UTF8; the rows containing hex numbers show the ranges implied by the bit patterns.)

      Note that Unicode defines characters with code points beyond the 16-bit range, and these are cleanly stored as 4-byte characters in UTF-8; they're a bit messy in UTF-16 (involving 16-bit code points in the evil "surrogate range").
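
The claim can also be checked by brute force. The sketch below encodes every non-ASCII code point except the surrogates and counts those whose UTF-8 form contains a byte below 0x80. The helper name utf8_bytes is made up; utf8::encode is Perl's built-in in-place encoder.

```perl
use strict;
use warnings;
no warnings 'utf8';    # the loop also visits noncharacters such as U+FFFE

# Return the UTF-8 byte string for one code point.
sub utf8_bytes {
    my ($code_point) = @_;
    my $string = chr($code_point);
    utf8::encode($string);    # in place: characters become UTF-8 bytes
    return $string;
}

# Count non-ASCII code points whose encoding contains any ASCII byte.
my $offenders = 0;
for my $range ( [ 0x80, 0xD7FF ], [ 0xE000, 0x10FFFF ] ) {
    for my $code_point ( $range->[0] .. $range->[1] ) {
        $offenders++ if utf8_bytes($code_point) =~ /[\x00-\x7F]/;
    }
}
print "code points with an ASCII byte in their encoding: $offenders\n";
```

If the chart is right, the count is zero, so bytes 34 (") and 92 (\) can only ever appear as the characters themselves.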

Re^2: quoting for system() and friends
by ikegami (Patriarch) on Mar 06, 2007 at 03:52 UTC

    As I recall, I got the list of characters to quote from the bash man page somewhere

    On systems with sh, sh (not bash) is used (even if the user's default shell is bash). You need to consult the sh man page. On FreeBSD, the man page for sh says:

    Enclosing characters within double quotes preserves the literal meaning of all characters except dollarsign (`$'), backquote (``'), and backslash (`\'). The backslash inside double quotes is historically weird. It remains literal unless it precedes the following characters, which it serves to quote: $  `  "  \  \n
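
Read that way, the four characters the routine escapes are exactly the ones sh still treats specially inside double quotes, so the whole job fits in one substitution. This is a sketch under that reading of the man page. The name dq_escape is made up, and a literal newline is deliberately left alone, since inside double quotes it is only special after a backslash.

```perl
use strict;
use warnings;

# Sketch: backslash-escape \ " $ and ` for use inside sh double quotes.
sub dq_escape {
    my ($string) = @_;
    $string =~ s/([\\"\$`])/\\$1/g;
    return $string;
}

print dq_escape( q{cost: $5 "approx" `now` a\b} ), "\n";
```

A single pass also removes any worry about the order of the substitutions, because each character is examined exactly once.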