PerlMonks  

Re: Improve pipe open? (redirect hook)

by oiskuu (Hermit)
on Apr 01, 2017 at 21:07 UTC


in reply to Improve pipe open?

Perl could certainly use a hook triggering before exec(). Performing custom setup between fork() and exec() is essential for correct operation in many scenarios. For example: thread-safe calling of an external command, or Re^6: Capture::Tiny alternative.

So the IPC modules might use constructs like

{ local $SIG{__EXEC__} = \&_do_redirect; system ... }
But this would no doubt have other creative uses as well. With some syntactic sugar to make things neat:
{ use redirect qw( 3>&1 1>&2 2>&3 3>&- ); open my $fh, ...; `another cmd`; }
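
A handler for such a hook might, for example, merge the child's STDERR into its STDOUT before the exec. A minimal sketch, assuming the hypothetical __EXEC__ hook fires in the child between fork() and exec():

# _do_redirect is hypothetical; no __EXEC__ hook exists in current perls.
# Assumed to run in the child process after fork(), just before exec().
sub _do_redirect {
    open STDERR, '>&', \*STDOUT or die "can't dup STDOUT: $!";
}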

PS: I'm not entirely sure whether the hook ought to be post-fork or pre-exec.

Replies are listed 'Best First'.
Re^2: Improve pipe open? (redirect hook)
by afoken (Chancellor) on Apr 02, 2017 at 11:42 UTC

    I don't see a need for a hook between fork and exec hidden in system and open, and abusing %SIG for that hook makes it even worse. That hook introduces action-at-a-distance.

    If you intend to make some stuff happen between fork and exec, write it explicitly:

    sub redirected_system {
        my @args = @_;
        my $pid = fork() // die "Can't fork: $!";
        if ($pid) {
            # (parent)
            waitpid($pid, 0);
            # plus whatever is needed to collect data from the child
        } else {
            # (child)
            # modify file handles as needed (redirection)
            # change UID, GID if needed
            # chdir if needed
            # chroot if needed
            exec { $args[0] } @args or die "exec failed: $!";
        }
    }

    This is clean, readable, and has no action-at-a-distance. Of course, it is possible to use a generic function to allow changes to the child process without resorting to global variables:

    sub hooked_system (&@) {
        my ($hook, @args) = @_;
        my $pid = fork() // die "Can't fork: $!";
        if ($pid) {
            # (parent)
            waitpid($pid, 0);
            # plus whatever is needed to collect data from the child
        } else {
            # (child)
            $hook->();
            exec { $args[0] } @args or die "exec failed: $!";
        }
    }
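
    Thanks to the (&@) prototype, the hook can then be passed as a block. A usage sketch (the redirection is just an example of what the hook might do):

    # run 'ls -l' with the child's STDERR merged into its STDOUT
    hooked_system {
        open STDERR, '>&', \*STDOUT or die "can't dup STDOUT: $!";
    } 'ls', '-l';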

    I also don't see how a hook would solve the problem of perl invoking the default shell in case of a three-argument pipe open or a single-argument system. And please don't tell me that the hook code should start guessing how the single string should be split into a list of arguments. This is exactly what perl already does: it guesses, and if the string looks too complex to guess (see below), it delegates that to a random default shell. This is the cause of the trouble, not its solution.

    qx, ``, system and pipe open are already wrappers for fork and exec. Adding more and more parameters will give us nightmares like CreateProcessAsUser(), effectively passing more than 30 parameters plus command line arguments to a nightmare function that will finally start a child process. See also Re^3: Set env variables as part of a command.


    Perl guessing

    exec states:

    If there is only one element in LIST, the argument is checked for shell metacharacters, and if there are any, the entire argument is passed to the system's command shell for parsing (this is /bin/sh -c on Unix platforms, but varies on other platforms). If there are no shell metacharacters in the argument, it is split into words and passed directly to execvp, which is more efficient.

    So, what exactly are shell metacharacters? I don't know. My guess is that a lot of the non-alphanumeric, printable ASCII characters count as shell metacharacters. They may depend on the operating system, too. And they may have changed over time. It seems that Perl_do_exec3() in doio.c of the perl 5.24.1 sources contains at least a little bit of the guessing logic. And it seems that the word "exec" at the start of the string also forces perl to invoke the default shell, not only non-alphanumeric characters. To make things even worse, some of the logic depends on the preprocessor symbol CSH. My guess is that this happens only if the default shell is a csh variant.
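
    The effect of the guessing is easy to demonstrate. On a unixoid system with /bin/sh as the default shell, the dollar sign counts as a metacharacter, so the first call below goes through the shell while the second does not:

    system('echo $HOME');      # metacharacter found -> /bin/sh -c 'echo $HOME' prints your home directory
    system('echo', '$HOME');   # list form, no shell -> prints the literal string $HOME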

    BTW: A trailing 2>&1 seems to be handled by perl as a special case, without resorting to the default shell.

    Alexander

    --
    Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)

      Action-at-a-distance was precisely the intention in this case. One might then trivially enhance a standard capture with an additional effect such as dropping privileges. Callbacks like that allow for a (more) generalized routine instead of a bunch of specialized modules.

      Anyway, I was contemplating the numerous problems with piping/capturing I've witnessed on PM and elsewhere. Can you give an example where the list form open has caused mayhem because of the one-element list?

      As far as qx{}; is concerned, I do not really see any problem. The string inside qx is not perl code, it is shell syntax. One could perhaps make a point about always requesting a shell, even when perl thinks this is redundant, like qx{}F; maybe. The opposite, to force an op to not do what it's intended to do, makes no sense.

      Edit. BTW: out of curiosity, do you sometimes use the <> operator in your code or do you always go for the safe diamond? I'd have simply plugged *that* hole, methinks...

      Edit2. Clarification in regards to "shell syntax". The qx is for interfacing with system shell, the meaning is thus "system shell syntax, whatever that may be". Never mind about the qx though, I just found your safe_qx() oddly named, that's all.

        Anyway, I was contemplating the numerous problems with piping/capturing I've witnessed on PM and elsewhere. Can you give an example where the list form open has caused mayhem, because of the one-element list?

        You can run into trouble everywhere perl runs into something like exec @list or system @list, where @list may contain only one element. You currently have to write exec { $list[0] } @list or system { $list[0] } @list. If you don't, and assume that system or exec will fail if the program in $list[0] does not exist, you may have a security problem. @list=('rm -rf /') is no problem with system { $list[0] } @list (it will fail with a "file not found" error), but will cause a lot of trouble with system @list, because perl will invoke rm. It's a trap, but it is documented and should be known.
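
        A minimal demonstration of the trap (do not actually run this):

        my @list = ('rm -rf /');
        system { $list[0] } @list;   # fails: there is no program named 'rm -rf /'
        system @list;                # one element is split into words: perl happily runs rm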

        With open my $handle,'-|',@list, you will always run into that trap, because the indirect object ({ $list[0] }) that disables all code leading to the default shell can't be used with open. That's why I propose to add a flag to open so that there is a different way to disable code leading to the default shell.


        As far as qx{}; is concerned,

        qx/`` is not the point. qx is generally unportable and depends on the OS version due to the default shell behavior, with a few exceptions where all default shells behave the same or perl does not invoke the default shell.

        Unsafe pipe open (with up to three arguments for open) has the same problem. Safe pipe opens from perlipc with exec { $list[0] } @list in the child process completely avoids the default shell. Pipe open with at least four arguments for open (as implemented since perl 5.8.0) also avoids the default shell.
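
        For reference, the safe pipe open pattern from perlipc looks roughly like this (reading from a child, @list as above):

        my $pid = open(my $from_kid, '-|');    # fork, with a pipe from the child's STDOUT
        defined $pid or die "can't fork: $!";
        if ($pid) {                            # parent: read the child's output
            while (<$from_kid>) { print }
            close $from_kid;
        } else {                               # child: exec directly, never via a shell
            exec { $list[0] } @list or die "exec failed: $!";
        }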


        I do not really see any problem. The string inside qx is not perl code, it is shell syntax. One could perhaps make a point about always requesting a shell, even when perl thinks this is redundant, like qx{}F; maybe. The opposite, to force an op to not do what it's intended to do, makes no sense.

        The string inside qx should be what you call "shell syntax", yes. So please define "shell syntax". Start with quoting rules that work for all shells. Have a look at https://www.in-ulm.de/~mascheck/various/bourne_args/ and https://www.in-ulm.de/~mascheck/various/ifs/ to get a feeling for the fun you will have.

        Let me list some default shells:

        • bourne shell (sh)
          • System III
          • SVR2
          • SVR3
          • SVR4
          • BSD
        • C shell (csh)
          • csh
          • tcsh
        • Korn shell (ksh)
          • pdksh
          • ksh93
          • mksh on Android
        • Bourne Again Shell (bash)
          • v1
          • v2
          • v3
          • v4
        • Almquist Shell (ash)
          • Original
          • Debian's fork of Almquist Shell (dash)
          • NetBSD fork
          • Busybox's fork
        • zsh
        • Plan9 shell (rc)
        • Windows' command.com
        • Windows' cmd.exe
        • command.com on OS/2

        All of these shells come in different versions, and they all have different behaviour when parsing strings into commands and arguments. See https://www.in-ulm.de/~mascheck/various/ for just a few of the many problems with shell behaviour on unixoid systems. And guess what happens when you feed a string intended for some unix shell to command.com or vice versa.

        That's the first problem with "the" shell: there is no single shell that behaves the same on every operating system. The default shell is not even consistent across different versions of the same OS. See https://www.in-ulm.de/~mascheck/various/shells/ for a quite long list of default shells.

        The second problem is that perl guesses what may happen when "the" shell parses the string and sometimes tries to avoid "the" shell. Have a look at Perl_do_exec3() (see Re^2: Improve pipe open? (redirect hook)) to see the details. And no, it's not only the mysterious "shell metacharacters" that trigger using the shell.


        Action-at-a-distance was precisely the intention in this case. One might then trivially enhance a standard capture with a certain additional effect like dropping of privileges. Callbacks like that allow for a (more) generalized routine instead of a bunch of specialized modules.

        Action-at-a-distance is an excellent way to create unmaintainable, write-only code.

        Imagine a small, but not tiny, old project that makes use of qx. Code is spread over several modules, and it runs fine on current perl. I would say this is a quite common scenario. Now imagine that project needs a new feature. A coworker wraps it in a new module like this and commits to CVS, SVN, git or whatever.

        package New::Module;
        use strict;
        use warnings;
        # 250 lines later:
        sub prepare_foo {
            # ...
            $SIG{'__EXEC__'} = 'special_foo';
            # ...
        }
        # 100 lines later:
        sub do_foo {
            # ...
            my @text = `foo \$BAR`;
            # ...
        }
        # 200 lines later:
        sub finish_foo {
            # ...
            $SIG{'__EXEC__'} = '';
            # ...
        }
        # 80 lines later:
        sub special_foo {
            # ...
            open STDOUT, '>&STDERR';
            # ...
        }
        # and 500 more lines
        1;

        Now, customers report tons of bugs. While debugging, you find out that every single qx/`` in every single module suddenly behaves mad. Guess why.


        Edit. BTW: out of curiosity, do you sometimes use the <> operator in your code or do you always go for the safe diamond? I'd have simply plugged *that* hole, methinks...

        I haven't used either in years. I found a single old and unused script in a dark corner of my all-knowing, all-seeing SVN repository that uses <>.
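
        (The safe diamond is the double diamond <<>> introduced with perl 5.22; it opens each @ARGV entry with three-argument open, so the magic pipe open cannot happen:)

        while (<>)   { print }   # magic open: an @ARGV entry like 'echo pwned |' is run as a command
        while (<<>>) { print }   # perl 5.22+: every @ARGV entry is treated as a plain file name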

        Alexander

        --
        Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)
