Beefy Boxes and Bandwidth Generously Provided by pair Networks
Come for the quick hacks, stay for the epiphanies.
 
PerlMonks  

Problems with String::ShellQuote

by afoken (Chancellor)
on Aug 18, 2022 at 17:53 UTC ( #11146233=perlmeditation: print w/replies, xml ) Need Help??

I have bashed String::ShellQuote several times:

Most times, it was because the module promises to solve a problem that simply disappears completely if you avoid the shell. See The problem of "the" default shell.

Now, ovedpo15 came up with a problem that looks like a good reason to have a module like String::ShellQuote, and choroba proposed String::ShellQuote.

The problem boils down to generate a shell script from perl that will be run by different a user, perhaps on a different computer:

My Perl utility generates a bash script that consists of mkdir/rsync/cp commands. This bash script is later used by users (this means that I don't want to actually run those commands when my utility runs, rather just to generate the script).

And, in an answer to a previous bashing, ikegami stated:

You seem to allege some problem with shell_quote, but none of the linked post identify one. The all seemed centered around the idea of avoiding the shell is better. While true, that's not a problem with shell_quote.

So, let's have a look at the source code of String::ShellQuote version 1.04, dated 2010-06-11.


The module clearly states in "Bugs" that ...

Only Bourne shell quoting is supported.

Bourne is a large family of shells, but not every shell is a bourne shell. Also, not every default shell is a bourne shell. See https://www.in-ulm.de/~mascheck/various/shells/. Quite obviously, neither command.com from DOS and Windows nor cmd.exe from Windows are even vaguely similar to a bourne shell. The Almquist shell variants are very similar to bourne, but not exactly: https://www.in-ulm.de/~mascheck/various/ash/ The korn shells obviously aren't bourne shells, either.

So, as stated by the author, you should not expect the result values of the various functions to be compatible with anything but bourne shells.


With that out of the way, let's assume some bourne shell.

A 7th edition Bourne shell surely is a bourne shell, right?

There is a script that tries to find the version of your bourne compatible shell: https://www.in-ulm.de/~mascheck/various/whatshell/whatshell.sh.html. Did you notice something? There is also a commented version of that script at https://www.in-ulm.de/~mascheck/various/whatshell/whatshell.sh.comments.html. The very first explaining comment is this:

: '7th edition Bourne shell aka the V7 shell did not know # as com +ment sign, yet.' : 'Workaround: the argument to the : null command can be considere +d a comment,' : 'protect it, because the shell would have to parse it otherwise. +'

So, shell_comment_quote() should better use the null command followed by a single-quoted string so that the output works with the bourne shell.

This is the documentation:

shell_comment_quote quotes the string so that it can safely be included in a shell-style comment (the current algorithm is that a sharp character is placed after any newlines in the string).

And this is the code:

sub shell_comment_quote { return '' unless @_; unless (@_ == 1) { croak "Too many arguments to shell_comment_quote " . "(got " . @_ . " expected 1)"; } local $_ = shift; s/\n/\n#/g; return $_; }

It does what is documented, but not every bourne shell will accept the output as comment. Oops #1.


There are two similar functions wrapping the quoting backend function:

sub shell_quote { my ($rerr, $s) = _shell_quote_backend @_; if (@$rerr) { my %seen; @$rerr = grep { !$seen{$_}++ } @$rerr; my $s = join '', map { "shell_quote(): $_\n" } @$rerr; chomp $s; croak $s; } return $s; }

and

sub shell_quote_best_effort { my ($rerr, $s) = _shell_quote_backend @_; return $s; }

The backend function returns a reference to an error array and the quoted string. shell_quote() removes repeated error messages, and finally croak()s. The only reason for this overhead I can think of is to get a list of all errors at once instead of getting just the first error. shell_quote_best_effort() just ignores all errors and returns whatever survived the backend function. If errors occured, that may be plain wrong. At least, this behaviour is documented:

This is like shell_quote, excpet [sic!] if the string can't be safely quoted it does the best it can and returns the result, instead of dying.

Now, what errors may be returned by the backend function?

sub _shell_quote_backend { my @in = @_; my @err = (); # ... return \@err, '' unless @in; # ... if (s/\x00//g) { push @err, "No way to quote string containing null (\\000) + bytes"; } # ... return \@err, $ret; }

Yes, that's all. There is only one possible error. It does not like ASCII NUL, because ASCII NUL can not be passed as argument to programs. And because it does not like them, they are simply removed.

Whenever shell_quote() throws an error, at least one of its arguments contained at least one NUL character. shell_quote_best_effort(), in the same situation, just silently damages your data. Oops #2.

In all other cases, shell_quote_best_effort() behaves exactly like shell_quote().


Now, let's look at the quoting:

shell_quote(), which calls _shell_quote_backend(), is documented as following:

shell_quote quotes strings so they can be passed through the shell. Each string is quoted so that the shell will pass it along as a single argument and without further interpretation. If no strings are given an empty string is returned.

This is the code:

sub _shell_quote_backend { my @in = @_; my @err = (); if (0) { require RS::Handy; print RS::Handy::data_dump(\@in); } return \@err, '' unless @in; my $ret = ''; my $saw_non_equal = 0; foreach (@in) { if (!defined $_ or $_ eq '') { $_ = "''"; next; } if (s/\x00//g) { push @err, "No way to quote string containing null (\\000) + bytes"; } my $escape = 0; # = needs quoting when it's the first element (or part of a # series of such elements), as in command position it's a # program-local environment setting if (/=/) { if (!$saw_non_equal) { $escape = 1; } } else { $saw_non_equal = 1; } if (m|[^\w!%+,\-./:=@^]|) { $escape = 1; } if ($escape || (!$saw_non_equal && /=/)) { # ' -> '\'' s/'/'\\''/g; # make multiple ' in a row look simpler # '\'''\'''\'' -> '"'''"' s|((?:'\\''){2,})|q{'"} . (q{'} x (length($1) / 4)) . q{"' +}|ge; $_ = "'$_'"; s/^''//; s/''$//; } } continue { $ret .= "$_ "; } chop $ret; return \@err, $ret; }

Right at the start of the foreach loop, an undefined parameter is treated like an empty string and simply returns ''. Note that next jumps to the continue block at the end of the foreach loop. Personally, I would not accept an undefined value, because probably something went wrong in the caller if we get undefined parameters.

Following that, NUL characters are killed, and data is damaged at the same time. See above. Personally, I would throw an error right here, NUL characters are a sure sign that something went wrong in the caller, and it makes no sense to continue.

The next step following the first comment in the forech loop is explained in that comment. A feature rarely known to beginners is that you can set environment variables for just the invoked program by prefixing the program with a list of key-value pairs. FOO=1 BAR=2 baz answer=42 invokes baz with the environment variables FOO and BAR set to 1 and 2, and a single argument answer=42. If you want to invoke a program named FOO=1 instead, and pass it the arguments BAR=2, baz, and answer=42, you need to quote at least the first equal sign.

The flag variables in the first if-then-else: $escape is reset for each parameter, $saw_non_equal is set as soon as a parameter does not contain an equal sign, and stays set. If an equal sign is found, and all previous parameters (if any) also contained equal signs, $escape is set, which forces quoting. This is not strictly needed: If the first parameter contains an equal sign and is quoted, it is taken as program name, and everything following will be read as arguments. So it would be sufficient to check the first parameter for an equal sign. On the other hand, it also does not hurt to quote every string that contains an equal sign, and it would make the code much simpler.

The whitelist matching if: If the parameter contains a character that is (quoting the output of YAPE::Regex::Explain) any character except: word characters (a-z, A-Z, 0-9, _), '!', '%', '+', ',', '\-', '.', '/', ':', '=', '@', '^', the $escape flag is set. The intention seems to be to avoid quoting if not strictly needed. I'm not sure if all of those characters in the whilelist are harmless. At least in bash (which is a bourne shell), at least the '!' does have a special meaning in the first position:

>bash --version GNU bash, version 4.3.48(1)-release (x86_64-slackware-linux-gnu) Copyright (C) 2013 Free Software Foundation, Inc. License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gp +l.html> This is free software; you are free to change and redistribute it. There is NO WARRANTY, to the extent permitted by law. >echo with some arguments ! with some arguments ! >foo with some arguments ! -bash: foo: command not found >'!' foo -bash: !: command not found >! foo -bash: foo: command not found >

Note: the last example fails to find the "foo" command, not the "!" command. So "!" should better not be in that whitelist. Oops #3.

The last if in the foreach loop: You want to escape if the $escape flag is set. Sure. But you also want to escape if the $saw_non_equal flag is not set, i.e. all previous parameters, if any, contained an equal sign, and at the same time, the current parameter also contains an equal sign. Do you remember this condition? A few lines above, the $escape flag was already set, depending on exactly this condition. This second condition is completely redundant. Belts and braces, or lost in code?


The escaping: Singe quotes are replaced with the sequence '\'', which will end a single-quoted string, then add a single quote (quoted by the backslash), and finally begins a new single-quoted string. Ignore the next, long subsitution for now. $_ = "'$_'"; puts the partly-escaped string in a pair of single quotes. The next two substitutions s/^''//; and s/''$//; remove a leading resp. trailing empty single-quoted string. This may happen if the original parameter begins resp. ends with a single quote.

The long substitution replaces a sequence of at least two escaped single quotes ('\'') by '", followed by a bare single quote for each orignal single quote, followed by "'. This works almost like '\'', ending a single quoted string, then adding a double quoted string of single quotes, and finally starting a new single quoted string. For an original parameter of two single quotes, this finally results in "''" instead of \'\', with every further single quote, the double quoted string will be shorter that the bashslashed string ("'''" instead of \'\'\').


Joining the quoted strings: The foreach loop replaces the elements of @in with the quoted string in $_ ($_ is aliased to each element of @in). The continue block appends each quoted string and a space to $ret. Finally, chop $ret removes the last space. Is a simple join(' ',@in) too obvious?


Combining Oops #3 and the suppressed quoting of equal signs:

>bash --version GNU bash, version 4.3.48(1)-release (x86_64-slackware-linux-gnu) Copyright (C) 2013 Free Software Foundation, Inc. License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gp +l.html> This is free software; you are free to change and redistribute it. There is NO WARRANTY, to the extent permitted by law. >perl -MString::ShellQuote=shell_quote -E 'say shell_quote("!","FOO=BA +R","baz")' ! FOO=BAR baz >! FOO=BAR baz -bash: baz: command not found >'!' FOO=BAR baz -bash: !: command not found >! 'FOO=BAR' baz -bash: FOO=BAR: command not found >

In ! FOO=BAR baz, the bash treats baz as the executable, as indicated by the error message, and FOO=BAR as extra environment for the executable.

In shell_quote("!","FOO=BAR","baz"), ! should be the executable, simply because it is the first argument. Oops #3 prevents that it is quoted. Because the first parameter to shell_quote() does not contain an equal sign, escaping of equal signs is disabled for the remaining parameters. Oops #4.


Summay:

String::ShellQuote does not even work properly for bourne shells.

Oops #1: Assumes every bourne shell accepts # for comments. Most of them do, but the ancient V7 bourne shell does not. Oh well. Document it as limitation and ship it.

Oops #2: Silent damaging of data. IMHO not acceptable.

Oops #3: Not quoting a character that will be interpreted by at least one a bourne shell (bash) if not quoted. IMHO not acceptable.

Oops #4: Oops #3 may cause more missing quoting due to overly complex escaping descision. IMHO not acceptable.

Alexander

--
Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)

Replies are listed 'Best First'.
Re: Problems with String::ShellQuote
by salva (Canon) on Aug 26, 2022 at 12:28 UTC
    A set of quoting modules for different shells (Bourne, csh, MS, etc.) is included as part of my module Net::OpenSSH, and it is a can of worms.

    As part of the test suite, random strings are generated, quoted and parser by the set of supported shells found by the module in every computer where the tests are run. Over the years I have received a good number of bug reports related to that, most of the times caused by bugs or by unexpected behaviors in the shells, specially for commercial vendors maintaining their code in isolation (for instance, IIRC, AIX ksh).

    So, String::ShellQuote may have its issues, and retrospectively some of them probably just caused by bad design decisions, but it should also be taken into consideration that the complexity of the problem it tries to solve is not negligible.

Re: Problems with String::ShellQuote
by afoken (Chancellor) on Aug 20, 2022 at 14:56 UTC

    Let's try to be a little bit positive. How could String::ShellQuote be fixed?

    (Or: how can I do something usefull while waiting for the four-hour offline RAID check on one of our servers?)


    tybalt89++ has posted a good starting point in Re: Escape special chars in a path:

    bash quoting is simple: put single quotes around the string after replacing every ' in the string with '"'"'

    sub bashquote { "'" . shift =~ s/'/'"'"'/gr . "'" }

    Sure, sometimes the wrapping single quotes are not needed, but if they are just around directory names and file names, they don't hurt.

    This has to be done for every single argument to be passed to the bash (or, in fact, any bourne shell).

    s///gr does not work in older perls, but that's easy to fix. Something like this:

    # untested sub bashquote { local $_ = shift; s/'/'"'"'/g; return "'$_'"; }

    String::ShellQuote has an important point: There is no way to pass NUL characters as argument, simply because NUL is used as string terminator. So you want to report that problem. Also, there is no way to pass Perl's undef, so also report that:

    # untested sub bashquote { local $_ = shift; defined or die "Argument must not be undef"; /\x00/ and die "String must not contain NUL characters"; s/'/'"'"'/g; return "'$_'"; }

    String::ShellQuote uses a slightly shorter way to pass a single quote, \' instead of "'". It should not matter much, but it keeps the returned string shorter. That means we might avoid a limit for the length of a command:

    # untested sub bashquote { local $_ = shift; defined or die "Argument must not be undef"; /\x00/ and die "String must not contain NUL characters"; s/'/'\''/g; return "'$_'"; }

    String::ShellQuote removes redundant leading and trailing empty single quoted strings from the returned value. Again, it should not matter much, but it shortens the returned string. But we must handle the empty string in a special case, or else it would be returned as unquoted empty string:

    # untested sub bashquote { local $_ = shift; defined or die "Argument must not be undef"; /\x00/ and die "String must not contain NUL characters"; length or return "''"; s/'/'\''/g; $_ = "'$_'"; s/^''//; s/''$//; return $_; }

    String::ShellQuote is quite clever about shortening multiple successive single quotes, by switching from \' to double quotes starting at two successive single quotes:

    # untested sub bashquote { local $_ = shift; defined or die "Argument must not be undef"; /\x00/ and die "String must not contain NUL characters"; length or return "''"; s/'/'\''/g; s|((?:'\\''){2,})|q{'"} . (q{'} x (length($1) / 4)) . q{"'}|ge; $_="'$_'"; s/^''//; s/''$//; return $_; }

    There is one possible extra step: Avoid quoting if the argument only conatins "safe" characters. ASCII letters and ASCII digits should be safe. Underscore should be save. <Update>Forward slash should be safe.</Update> A lonely "." as first part of a command in bash is short for "source", so that's not safe. "!" is not safe, as shown. And I won't bother researching all of the other interpunction characters for edge cases. So let's add that shortcut:

    # untested sub bashquote { local $_ = shift; defined or die "Argument must not be undef"; /\x00/ and die "String must not contain NUL characters"; length or return "''"; # before adding "/" to safe characters: /^[A-Za-z0-9_]+$/ and retu +rn $_; m|^[A-Za-z0-9/_]+$| and return $_; s/'/'\''/g; s|((?:'\\''){2,})|q{'"} . (q{'} x (length($1) / 4)) . q{"'}|ge; $_="'$_'"; s/^''//; s/''$//; return $_; }

    Update: Added / as safe character.

    This should give us a quoting function for a single string that returns a string that is safe to use anywhere in a bash command, and often as short as possible. If in doubt, the returned string may contain redundant quotes. This should work with any 8-bit-clean bourne shell.

    That's essentially the inner part of the foreach loop in String::ShellQuote. Now lets build the list return that returns a properly quoted string that a bourne shell treat as a list of strings. How hard can that be?

    # untested sub shell_quote { return join(' ',map { bashquote($_) } @_); }

    Update: added a missing ) and the # untested comment

    No position-dependent special cases for equal signs. No special cases for lonely periods or exlamation marks. They are always quoted, period.

    What about shell_quote_best_effort()?

    There is simply no "best effort" way. Either the result is properly quoted, or data will be damaged. shell_quote_best_effort() should be deprecated and finally be removed. Until then, it could be an alias for shell_quote().

    And finally shell_comment_quote(). It works, I would have written it slightly different, but there is no change needed in the code. The documentation should mention the problem of ancient bourne shells that do not implement comments.

    Alexander

    --
    Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)
Re: Problems with String::ShellQuote
by afoken (Chancellor) on Aug 18, 2022 at 18:39 UTC

    Remember that statement in String::ShellQuote?

    shell_quote quotes strings so they can be passed through the shell. Each string is quoted so that the shell will pass it along as a single argument and without further interpretation.

    Reading https://www.in-ulm.de/~mascheck/bourne/ shows another problem when trying to quote for bourne shells in a way that data is passed unmodified:

    The 7th edition Unix shell (the one that does not know #) is not 8-bit clean, it uses the 8th bit internally for quoting, and sometimes forgets to clean that bit.

    Bash before 1.13 also used the 8th bit internally for quoting.

    Alexander

    --
    Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)
      Bash before 1.13 also used the 8th bit internally for quoting.

      I couldn't find a release date for Bash 1.13 but 1.14 was released on the 2nd of June 1994, according to this release notice - so that's a few months before Perl 5.000. If anyone is still running a version of Bash prior to 1.13 they'll have substantially bigger problems than this.


      🦛

      In fact, wikipedia says Features introduced after 1979:
      • # as comment character System III shell (1981)
      • 8-bit clean SVR3 shell (1986)

      so it rather weakens your argument to lead with a complaint about this. I doubt anyone cares about shipping software compatible with anything before POSIX.

      And of course, the author of ShellQuote probably meant "bash" rather than "bourne", just guessing.

        I doubt anyone cares about shipping software compatible with anything before POSIX.

        As I wrote:

        Oops #1: Assumes every bourne shell accepts # for comments. Most of them do, but the ancient V7 bourne shell does not. Oh well. Document it as limitation and ship it.

        Same for non-8-bit-clean shells.

        Of course, Oops #1 and Oops #2 were the low-hanging fruits, and the 8-bit problem of old shells isin the same category.

        And of course, the author of ShellQuote probably meant "bash" rather than "bourne", just guessing.

        Even if he meant bash instead of bourne, String::ShellQuote still damages data (Oops #2) and still does not quote correctly (Oops #3 and #4).

        And if your guess is correct, it is at least a documentation error. Every homo sapiens is a mammal, but not every mammal is a homo sapiens. Every bash is a bourne shell, but not every bourne shell is a bash.

        Alexander

        --
        Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)
Re: Problems with String::ShellQuote
by ikegami (Patriarch) on Jun 12, 2023 at 16:37 UTC

    There's absolutely no reason to avoid using shell_quote listed in the parent post.


    The "issues" identified:

    • No mention of lack of backwards compatibility with shells from 1979.

      A 7th edition Bourne shell surely is a bourne shell, right?

      It's perfectly legit to avoid mentioning it doesn't work with shells from 1979. For reference, Perl only came out in 1987, Perl modules were only introduced in 1994 (I think), and String::ShellQuote only came out in 1997. Mentioning lack of support for a 44 year old shell would only add noise and decrease the quality of the documentation.

      NOT AN ISSUE.

    • Mishandling undefined values

      Personally, I would not accept an undefined value, because probably something went wrong in the caller if we get undefined parameters.

      Fair. It doesn't even warn. This isn't a bug, but this module should be a bit more thorough given the security implications.

      WANTED IMPROVEMENT. There are backwards compatibility issues with making this improvement.

    • shell_quote_best_effort doesn't make much sense

      shell_quote_best_effort() just ignores all errors and returns whatever survived the backend function. If errors occured, that may be plain wrong. At least, this behaviour is documented

      Yeah, but this has nothing to do with shell_quote.

      NOT AN ISSUE. Don't use a sub that's documented to produce garbage.

    • ! shouldn't be whitelisted

      "!" should better not be in that whitelist.

      I agree, though the chance of it being harmful is effectively nil since ! is only significant in interactive shells.

      NEEDS TO BE FIXED, and it's trivial to fix, but it's never going to matter.


    I shall file tickets for the two issues shortly.

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: perlmeditation [id://11146233]
Front-paged by Corion
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others making s'mores by the fire in the courtyard of the Monastery: (4)
As of 2023-12-10 11:00 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?
    What's your preferred 'use VERSION' for new CPAN modules in 2023?











    Results (39 votes). Check out past polls.

    Notices?