|Come for the quick hacks, stay for the epiphanies.|
Problems with String::ShellQuoteby afoken (Chancellor)
|on Aug 18, 2022 at 17:53 UTC||Need Help??|
I have bashed String::ShellQuote several times:
Most times, it was because the module promises to solve a problem that simply disappears completely if you avoid the shell. See The problem of "the" default shell.
The problem boils down to generate a shell script from perl that will be run by different a user, perhaps on a different computer:
My Perl utility generates a bash script that consists of mkdir/rsync/cp commands. This bash script is later used by users (this means that I don't want to actually run those commands when my utility runs, rather just to generate the script).
You seem to allege some problem with shell_quote, but none of the linked post identify one. The all seemed centered around the idea of avoiding the shell is better. While true, that's not a problem with shell_quote.
So, let's have a look at the source code of String::ShellQuote version 1.04, dated 2010-06-11.
The module clearly states in "Bugs" that ...
Only Bourne shell quoting is supported.
Bourne is a large family of shells, but not every shell is a bourne shell. Also, not every default shell is a bourne shell. See https://www.in-ulm.de/~mascheck/various/shells/. Quite obviously, neither command.com from DOS and Windows nor cmd.exe from Windows are even vaguely similar to a bourne shell. The Almquist shell variants are very similar to bourne, but not exactly: https://www.in-ulm.de/~mascheck/various/ash/ The korn shells obviously aren't bourne shells, either.
So, as stated by the author, you should not expect the result values of the various functions to be compatible with anything but bourne shells.
With that out of the way, let's assume some bourne shell.
A 7th edition Bourne shell surely is a bourne shell, right?
There is a script that tries to find the version of your bourne compatible shell: https://www.in-ulm.de/~mascheck/various/whatshell/whatshell.sh.html. Did you notice something? There is also a commented version of that script at https://www.in-ulm.de/~mascheck/various/whatshell/whatshell.sh.comments.html. The very first explaining comment is this:
So, shell_comment_quote() should better use the null command followed by a single-quoted string so that the output works with the bourne shell.
This is the documentation:
shell_comment_quote quotes the string so that it can safely be included in a shell-style comment (the current algorithm is that a sharp character is placed after any newlines in the string).
And this is the code:
It does what is documented, but not every bourne shell will accept the output as comment. Oops #1.
There are two similar functions wrapping the quoting backend function:
The backend function returns a reference to an error array and the quoted string. shell_quote() removes repeated error messages, and finally croak()s. The only reason for this overhead I can think of is to get a list of all errors at once instead of getting just the first error. shell_quote_best_effort() just ignores all errors and returns whatever survived the backend function. If errors occured, that may be plain wrong. At least, this behaviour is documented:
This is like shell_quote, excpet [sic!] if the string can't be safely quoted it does the best it can and returns the result, instead of dying.
Now, what errors may be returned by the backend function?
Yes, that's all. There is only one possible error. It does not like ASCII NUL, because ASCII NUL can not be passed as argument to programs. And because it does not like them, they are simply removed.
Whenever shell_quote() throws an error, at least one of its arguments contained at least one NUL character. shell_quote_best_effort(), in the same situation, just silently damages your data. Oops #2.
In all other cases, shell_quote_best_effort() behaves exactly like shell_quote().
Now, let's look at the quoting:
shell_quote(), which calls _shell_quote_backend(), is documented as following:
shell_quote quotes strings so they can be passed through the shell. Each string is quoted so that the shell will pass it along as a single argument and without further interpretation. If no strings are given an empty string is returned.
This is the code:
Right at the start of the foreach loop, an undefined parameter is treated like an empty string and simply returns ''. Note that next jumps to the continue block at the end of the foreach loop. Personally, I would not accept an undefined value, because probably something went wrong in the caller if we get undefined parameters.
Following that, NUL characters are killed, and data is damaged at the same time. See above. Personally, I would throw an error right here, NUL characters are a sure sign that something went wrong in the caller, and it makes no sense to continue.
The next step following the first comment in the forech loop is explained in that comment. A feature rarely known to beginners is that you can set environment variables for just the invoked program by prefixing the program with a list of key-value pairs. FOO=1 BAR=2 baz answer=42 invokes baz with the environment variables FOO and BAR set to 1 and 2, and a single argument answer=42. If you want to invoke a program named FOO=1 instead, and pass it the arguments BAR=2, baz, and answer=42, you need to quote at least the first equal sign.
The flag variables in the first if-then-else: $escape is reset for each parameter, $saw_non_equal is set as soon as a parameter does not contain an equal sign, and stays set. If an equal sign is found, and all previous parameters (if any) also contained equal signs, $escape is set, which forces quoting. This is not strictly needed: If the first parameter contains an equal sign and is quoted, it is taken as program name, and everything following will be read as arguments. So it would be sufficient to check the first parameter for an equal sign. On the other hand, it also does not hurt to quote every string that contains an equal sign, and it would make the code much simpler.
The whitelist matching if: If the parameter contains a character that is (quoting the output of YAPE::Regex::Explain) any character except: word characters (a-z, A-Z, 0-9, _), '!', '%', '+', ',', '\-', '.', '/', ':', '=', '@', '^', the $escape flag is set. The intention seems to be to avoid quoting if not strictly needed. I'm not sure if all of those characters in the whilelist are harmless. At least in bash (which is a bourne shell), at least the '!' does have a special meaning in the first position:
Note: the last example fails to find the "foo" command, not the "!" command. So "!" should better not be in that whitelist. Oops #3.
The last if in the foreach loop: You want to escape if the $escape flag is set. Sure. But you also want to escape if the $saw_non_equal flag is not set, i.e. all previous parameters, if any, contained an equal sign, and at the same time, the current parameter also contains an equal sign. Do you remember this condition? A few lines above, the $escape flag was already set, depending on exactly this condition. This second condition is completely redundant. Belts and braces, or lost in code?
The escaping: Singe quotes are replaced with the sequence '\'', which will end a single-quoted string, then add a single quote (quoted by the backslash), and finally begins a new single-quoted string. Ignore the next, long subsitution for now. $_ = "'$_'"; puts the partly-escaped string in a pair of single quotes. The next two substitutions s/^''//; and s/''$//; remove a leading resp. trailing empty single-quoted string. This may happen if the original parameter begins resp. ends with a single quote.
The long substitution replaces a sequence of at least two escaped single quotes ('\'') by '", followed by a bare single quote for each orignal single quote, followed by "'. This works almost like '\'', ending a single quoted string, then adding a double quoted string of single quotes, and finally starting a new single quoted string. For an original parameter of two single quotes, this finally results in "''" instead of \'\', with every further single quote, the double quoted string will be shorter that the bashslashed string ("'''" instead of \'\'\').
Joining the quoted strings: The foreach loop replaces the elements of @in with the quoted string in $_ ($_ is aliased to each element of @in). The continue block appends each quoted string and a space to $ret. Finally, chop $ret removes the last space. Is a simple join(' ',@in) too obvious?
Combining Oops #3 and the suppressed quoting of equal signs:
In ! FOO=BAR baz, the bash treats baz as the executable, as indicated by the error message, and FOO=BAR as extra environment for the executable.
In shell_quote("!","FOO=BAR","baz"), ! should be the executable, simply because it is the first argument. Oops #3 prevents that it is quoted. Because the first parameter to shell_quote() does not contain an equal sign, escaping of equal signs is disabled for the remaining parameters. Oops #4.
String::ShellQuote does not even work properly for bourne shells.
Oops #1: Assumes every bourne shell accepts # for comments. Most of them do, but the ancient V7 bourne shell does not. Oh well. Document it as limitation and ship it.
Oops #2: Silent damaging of data. IMHO not acceptable.
Oops #3: Not quoting a character that will be interpreted by at least one a bourne shell (bash) if not quoted. IMHO not acceptable.
Oops #4: Oops #3 may cause more missing quoting due to overly complex escaping descision. IMHO not acceptable.
Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)