quester has asked for the wisdom of the Perl Monks concerning the following question:

I have a script which uses the Perl Expect package to interact with various systems and generates a log file. Under certain conditions the systems will print passwords into the log as the script runs. I would like to avoid having the passwords displayed to the user or printed in the log. I also don't want the passwords to show up in other places, such as a ps -elf.

I don't want to modify the code inside the Perl Expect package, and changing the systems the script talks to is out of the question. If possible, I would like the output to be unbuffered, since the systems it talks to tend to get wedged sometimes and the display is very useful to track down problems, especially if you can see the last line.

I have tried various solutions, but I think there is something simple that I am missing about filtering the output of existing scripts.

My current solution looks like this. It doesn't address the last requirement; the output is line buffered, despite setting autoflush.

use warnings;
use strict;

my $password = "hide_me";
my $log_file = "test.log";

open STDOUT, "| perl | tee -a $log_file";
select STDOUT; $| = 1;

print <<_END_FILTER_SCRIPT_;
use warnings;
use strict;
select DATA; \$|=1;
select STDOUT; \$|=1;
while (<DATA>) {
    s/\Q$password\E/removed/ig;
    print;
}
__DATA__
_END_FILTER_SCRIPT_

print "The password is $password\n";
foreach (qw{Output is unbuffered if these words are printed one at a time.}) {
    print $_, " ";
    sleep 1;
}
print "\n";
sleep 1;
print "Output is line buffered if this appears one second after the last line.\n";
close STDOUT;
The output looks like this. The second line is delayed by twelve seconds because autoflush is not working; the words should come out one per second instead.

The password is removed
Output is unbuffered if these words are printed one at a time.
Output is line buffered if this appears one second after the last line.

This is not supposed to be obfuscated, but I should probably point out that the script printed by the first print statement is going to run in the perl started by open. All of the output to be filtered is in the __DATA__ section of that inner script.


Can anyone point out a better way of doing this (without changing library code like Perl Expect)? "Better" can be any combination of unbuffered, easier to understand, easier to maintain, or already packaged in a library that I missed.

Replies are listed 'Best First'.
Re: Filtering passwords from the output of a script
by graff (Chancellor) on Nov 29, 2006 at 05:17 UTC
    This is not supposed to be obfuscated...

    I understand, and the OP code (once I grokked it) actually struck me as more clever than obfuscated. Nice, even.

    But if you really want completely unbuffered output to the log file and anywhere else, you have to use syswrite and sysread, and you have to use those exclusively. The standard i/o methods (i.e. print and <FILEHANDLE>) are intrinsically line-oriented --

    Well, actually, print and the diamond operator are record oriented, where the record delimiter is defined by the globals $/ and $\. So you could try playing with those -- but that won't be any less obfuscative than just using sysread and syswrite.
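
    As a small illustration of the record-oriented behaviour described above, here is a sketch that sets $/ to a single space so the diamond operator returns one word per read. The helper name read_records and the in-memory filehandle are assumptions made for the example, not part of the OP's script.

```perl
use strict;
use warnings;

# Read every record from an in-memory filehandle using the given
# input record separator. read_records() is a hypothetical helper.
sub read_records {
    my ($text, $separator) = @_;
    local $/ = $separator;          # record delimiter used by <$fh>
    open my $fh, '<', \$text or die "Cannot open in-memory handle: $!";
    my @records = <$fh>;            # one element per record
    close $fh;
    return @records;
}

# With a space as the separator, each record ends at a space
# (the final record ends at end of input).
my @words = read_records("one two three\n", " ");
print join("|", @words);
```

    Note that a reader using a space as the separator still waits for the next space (or end of input) before returning, so this only shortens the buffering unit from a line to a word.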

    I had to play with it a bit, but the following version of the OP code does what I think you want in terms of making sure that output passes through the pipe and into the log file with the shortest possible buffering delay.

    (Note that you cannot avoid doing word-level buffering, because without that, you wouldn't be able to use s/// to remove the password string. Also, a lot of stuff in the "internal" script needs to be escaped in order to work right -- I was actually puzzled that "\s" had to be "\\s" in the "if" clause, but doing "\\Q" and "\\E" was apparently unnecessary.)
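
    To make the word-level buffering idea concrete outside of the piped script, here is a minimal sketch of the same technique as a closure. The name make_word_filter and the default replacement text are assumptions for illustration; the logic (hold text until whitespace arrives, then substitute and release) follows the description above.

```perl
use strict;
use warnings;

# Build a filter that buffers input until whitespace arrives, then
# censors and releases the buffered text. make_word_filter() is a
# hypothetical name; the replacement defaults to "removed".
sub make_word_filter {
    my ($password, $replacement) = @_;
    $replacement = 'removed' unless defined $replacement;
    my $pending = '';
    return sub {
        my ($chunk) = @_;
        $pending .= $chunk;
        # Release only up to (and including) the last whitespace character.
        return '' unless $pending =~ /\A(.*\s)/s;
        my $ready = $1;
        substr($pending, 0, length($ready), '');
        $ready =~ s/\Q$password\E/$replacement/ig;
        return $ready;
    };
}

# The password arrives split across three writes.
my $filter = make_word_filter('hide_me');
print $filter->($_) for 'The password is hi', 'de_', "me\n";
```

    Anything after the last whitespace stays in the buffer, so a real filter would also need to write out the remainder at end of input.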

    use warnings;
    use strict;

    my $password = "hide_me";
    my $log_file = "test.log";

    open STDOUT, "| perl | tee -a $log_file";

    my $script = <<_END_FILTER_SCRIPT_;
    use warnings;
    use strict;
    my \$chr;
    \$_ = '';
    while (sysread DATA, \$chr, 1) {
        \$_ .= \$chr;
        if ( /\\s\$/ ) {
            s/\Q$password\E/removed/ig;
            syswrite STDOUT, \$_;
            \$_ = '';
        }
    }
    close STDOUT;
    __DATA__
    _END_FILTER_SCRIPT_

    syswrite STDOUT, $script;
    sleep 1;  ## had to add this delay so next syswrite would always work
    syswrite STDOUT, "The password is $password\n";
    foreach (qw{Output is unbuffered if these words are printed one at a time.}) {
        syswrite STDOUT, $_." ";
        sleep 1;
    }
    syswrite STDOUT, "\n";
    sleep 1;
    syswrite STDOUT, "Output is line buffered if the last line came out all at once.\n";
    close STDOUT;

    I agree that having to stuff all those backslashes into the internal script is really ugly, so maybe you'll want to externalize that part of the code into a separate script file, as suggested in the first reply. (But that means storing your password string in yet another file, so as not to expose it in the system's process table.)
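
    If the filter is moved to a separate script, one way to keep the password off the command line is to read it from a file. This is only a sketch: read_password_file and the path in the comment are hypothetical, and it assumes the file is readable only by the script's owner (for example, mode 0600).

```perl
use strict;
use warnings;

# Read the password from the first line of a file so it never appears
# on a command line. read_password_file() is a hypothetical helper.
sub read_password_file {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!\n";
    my $password = <$fh>;
    close $fh;
    defined $password or die "$path is empty\n";
    chomp $password;
    length($password) or die "$path has an empty first line\n";
    return $password;
}

# Example call (the path is an assumption):
#   my $password = read_password_file("$ENV{HOME}/.censor_passwd");
```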

    (update: removed unnecessary "binmode" calls)

      graff and ikegami,

      Thanks++ to both of you. After getting a lot of inspiration from all your posts I tracked the problem down to the diamond operator, "<DATA>". A working, although possibly nonportable, version is at the bottom of this post.

      In my current environment (Linux 2.6, Perl 5.8) it isn't necessary to use syswrite; ordinary print statements work fine after setting autoflush ($| = 1). This is very fortunate because I would like to stay away from modifying the guts of Perl Expect.

      The sleep statement that graff put in actually works like this: Perl is already trying to do buffered input on DATA in the inner script, and it tries to grab some of the input from the pipe. It hangs on to it when sysread is called. (That baffled me for a while, but putting a "print <DATA>" at the end of the script printed the missing first lines of input.) When I changed the sysread to read it worked reliably... on my current platform. A simple getc also works... again, on my current platform.

      If anyone can explain the real difference between Perl's sysread and read I'm all ears. What I found refers me to the documentation on the Unix functions read and fread, which look almost the same. Is using read or getc in this script likely to break if it's ported to a different operating system?

      If I do need to use sysread to avoid portability issues, I'm very much loath to put a sleep into production code. I tried select undef, undef, undef, 0.0001; on my laptop and it was long enough sometimes and not others. Ten microseconds was never enough and one millisecond was always enough... tonight, with nothing else running. My experience has been that the length of the sleep needed will vary by enormous factors depending on the details of the system load. I suppose that if it were really necessary, a file could be used as an "I am ready now" flag between the two processes.

      Thanks again for the suggestions about too many backslashes (I changed from a here document to single quotes) and the trick of waiting for whitespace to check for a password that arrives in pieces - I had missed that one.

      Sorry if I'm getting long winded. The big question: is getc a good way of doing this or is it nonportable?

      The current version of the code (with getc) looks like this.

      use warnings;
      use strict;

      my $password = "hide_me";
      my $log_file = "test.log";

      open STDOUT, "| perl | tee -a $log_file";
      select STDOUT; $| = 1;

      print '
      use warnings;
      use strict;
      select STDOUT; $| = 1;
      my $chr;
      $_ = "";
      while ($chr = getc DATA) {
          $_ .= $chr;
          next unless $chr =~ /\s/;
          s/' . (quotemeta $password) . '/removed/ig;
          print;
          $_ = "";
      }
      print;
      __DATA__
      ';

      print "The password is $password\n";
      foreach (qw{Output is unbuffered if these words are printed one at a time.}) {
          print $_, " ";
          select undef, undef, undef, 0.3;
      }
      print "\n";
      foreach (split //, "Even one character at a time: $password should be filtered.\n") {
          print;
          select undef, undef, undef, 0.1;
      }
      close STDOUT;

      The output looks like this. The second and third lines are both printed a word at a time.

      The password is removed
      Output is unbuffered if these words are printed one at a time.
      Even one character at a time: removed should be filtered.

        The read and getc Perl functions correspond to the fread and fgetc C-lib functions. The C-lib I/O functions are buffered. When you ask for X bytes, it might actually read X+Y bytes internally. Subsequent reads will read from the Y bytes first.

        This is very good in your case, because you keep asking for one byte. Most of the time, you'll just be reading from the buffer, which is faster than doing a real read.

        How do the C-lib functions get their data? Through system calls. sysread is Perl's interface to the read system call. The system I/O functions are not buffered.

        A key difference between read and sysread is that read(FH, $buf, $bytes) will wait for $bytes bytes to be available, whereas sysread(FH, $buf, $bytes) will return as soon as bytes become available. $bytes is simply a maximum for sysread. I took advantage of this to read in more than one byte at a time.

        C-lib and system functions should not both be used on the same file handle.
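
        The difference can be seen in a small self-contained sketch using a pipe inside one process. The helper names demo_short_sysread and demo_mixed_reads are made up for this illustration, and the second result assumes a typical Unix build of Perl with the default buffered I/O layers.

```perl
use strict;
use warnings;

# sysread returns as soon as some bytes are available, even when
# fewer than requested and while the writer is still open.
sub demo_short_sysread {
    pipe(my $reader, my $writer) or die "pipe failed: $!";
    syswrite($writer, "abc") or die "write failed: $!";
    my $buf = '';
    my $count = sysread($reader, $buf, 4096);   # asks for up to 4096 bytes
    close $writer;
    close $reader;
    return ($count, $buf);
}

# Mixing buffered and unbuffered reads on one handle: the buffered getc
# pulls the waiting bytes into Perl's own buffer, so the sysread that
# follows goes to the OS and finds nothing left.
sub demo_mixed_reads {
    pipe(my $reader, my $writer) or die "pipe failed: $!";
    syswrite($writer, "abcdef") or die "write failed: $!";
    close $writer;                               # nothing more will arrive

    my $first = getc $reader;                    # buffered read
    my $rest  = '';
    my $count = sysread($reader, $rest, 4096);   # unbuffered read
    close $reader;
    return ($first, $count, $rest);
}

my ($short_count, $short_buf) = demo_short_sysread();
print "sysread got $short_count bytes: $short_buf\n";

my ($first_char, $later_count) = demo_mixed_reads();
print "getc saw '$first_char'; sysread then returned $later_count bytes\n";
```

        This is the same effect as the missing first lines described earlier in the thread: the bytes were not lost, they were sitting in the buffered layer where sysread does not look.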


        Your usage of getc is incorrect. It returns undef at the end of input, so the loop should test with defined; a plain truth test also stops early on a "0" character.

        The select STDOUT; is useless.

        print '
        use warnings;
        use strict;
        $| = 1;
        $_ = "";
        while (defined(my $chr = getc DATA)) {
            $_ .= $chr;
            next unless $chr =~ /\s/;
            s/' . (quotemeta $password) . '/removed/ig;
            print;
            $_ = "";
        }
        print;
        __DATA__
        ';

        (Update: Nevermind, next unless /\s/; is buggy.

        Probably faster:

        print '
        use warnings;
        use strict;
        $| = 1;
        $_ = "";
        for (;;) {
            sysread(DATA, $_, 4096, length()) or last;
            next unless /\s/;
            s/' . (quotemeta $password) . '/removed/ig;
            print;
            $_ = "";
        }
        print;
        __DATA__
        ';

        )

      Three issues:
      • It is word-buffered. It will only output when whitespace is received.
      • It won't always work if the password has a space in it.
      • I imagine that reading a byte at a time (when more than a byte is available) is much slower than reading a chunk of bytes.

      See Re^3: Filtering passwords from the output of a script.

      By the way, binmode might be required. I remember having problems when attempting to read characters (as opposed to bytes) using sysread, but the details escape me.

        Good points, although /\s/ actually also matches newline and carriage return. As a sheer stroke of fortune, in my current application a restriction against using spaces or non-ASCII characters in passwords is appropriate. Also, I don't have to worry too much about efficiency; my logs typically arrive at only a few hundred bytes per second. Thanks again!
Re: Filtering passwords from the output of a script
by ikegami (Patriarch) on Nov 29, 2006 at 04:50 UTC
    I don't understand why you run two copies of Perl. What about just
    ... | perl -pe 'BEGIN { $|=1; } s/\Qhide_me/removed/ig' | tee -a test.log

    or

    ... | censor_passwd | tee -a test.log

    #!/usr/bin/perl
    # censor_passwd
    use warnings;
    use strict;

    my $password = "hide_me";

    $| = 1;
    while (<STDIN>) {
        s/\Q$password/removed/ig;
        print;
    }

    Update: By the way, this tool can be used to help guess the password. For example, feeding "abcdef" to this script checks the following passwords all at once:

    • a
    • ab
    • abc
    • abcd
    • abcde
    • abcdef
    • b
    • bc
    • bcd
    • bcde
    • bcdef
    • c
    • cd
    • cde
    • cdef
    • d
    • de
    • def
    • e
    • ef
    • f
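
    The list above is simply every contiguous substring of the input: if "removed" shows up anywhere in the filtered output, one of those substrings was the password. A short sketch that enumerates them (the helper name substrings is an assumption for the example):

```perl
use strict;
use warnings;

# List every contiguous substring of a string, ordered by starting
# position and then by length. substrings() is a hypothetical helper.
sub substrings {
    my ($text) = @_;
    my @found;
    for my $start (0 .. length($text) - 1) {
        for my $len (1 .. length($text) - $start) {
            push @found, substr($text, $start, $len);
        }
    }
    return @found;
}

# A string of n characters has n(n+1)/2 substrings: 21 for "abcdef".
my @candidates = substrings('abcdef');
print scalar(@candidates), " candidate passwords tested by one input\n";
```
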
      Your first alternative would make the password string available to all logged-in users via the "ps" command.

      Your second alternative will still be stuck with line buffering -- even with $| set to non-zero everywhere on every file handle, the standard print function and diamond read operators will operate only in line-buffered mode ($| merely makes sure that consecutive lines are not buffered into chunks of 4KB or 8KB or whatever the pipeline buffer size happens to be).

        Your first alternative would make the password string available to all logged-in users via the "ps" command.

        Yeah, that's why I put the second alternative. I should have mentioned it.

        Your second alternative will still be stuck with line buffering

        Oh! right! Here's a version that doesn't wait for a whole line to be read before printing.

        #!/usr/bin/perl
        # censor_passwd
        use warnings;
        use strict;

        use List::Util qw( reduce );

        my $passwd      = 'hide_me';
        my $replacement = 'removed';

        # Used to check if the start of the password
        # is at the end of the read data. The returned
        # regexp will match 0 characters if not.
        # For efficiency, it expects the data to be
        # "reverse"d.
        my $partial_re = do {
            my $r = reduce { "(?:$a$b)?" } '', reverse map quotemeta, map /(.)/g, $passwd;
            qr/$r/
        };

        binmode(STDIN);
        binmode(STDOUT);
        $| = 1;

        my $buf = '';
        for (;;) {
            my $rv = sysread(STDIN, $buf, 4096, length($buf));
            defined $rv or die("Unable to read from STDIN: $!\n");
            $rv or last;

            for (;;) {
                my $pos = index($buf, $passwd);
                last if $pos < 0;
                substr($buf, $pos, length($passwd), $replacement);
                print(substr($buf, 0, $pos+length($replacement), ''));
            }

            reverse($buf) =~ /^$partial_re/;
            print(substr($buf, 0, length($buf)-$+[0], ''));
        }
        print($buf);

        Points of interest:

        • It performs minimal input buffering. It outputs everything as soon as it receives it, without waiting for lines or words to be completed. It only buffers when the data that was read ends with the start of the password.

        • It is very efficient. It reads a block of data at a time, if more than a byte of data is available. It avoids using /...\z/ and even slower /^.*?.../ by using reverse. It avoids using captures. A major loop was replaced with a precompiled regexp.

        ( Caveat: What if "hide" and "_me" are received separately? Code updated to address this issue. )
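
        The hold-back idea can also be sketched without the reversed regexp, using a plain loop to find how much of the buffer's tail could be the start of the password. This is a simplified illustration, not the code above: the names partial_match_length and make_stream_filter are made up, the matching is case-sensitive, and passing undef to flush at end of input is a convention chosen for this example.

```perl
use strict;
use warnings;

# Length of the longest tail of $buf that is also a proper prefix
# of $passwd (0 if the tail cannot be the start of the password).
sub partial_match_length {
    my ($buf, $passwd) = @_;
    my $max = length($passwd) - 1;
    $max = length($buf) if length($buf) < $max;
    for (my $len = $max; $len > 0; $len--) {
        return $len if substr($buf, -$len) eq substr($passwd, 0, $len);
    }
    return 0;
}

# Returns a closure: feed it chunks, get back text that is safe to print.
# Call it with undef at end of input to flush whatever is still held.
sub make_stream_filter {
    my ($passwd, $replacement) = @_;
    die "password must not be empty\n" unless length($passwd);
    my $buf = '';
    return sub {
        my ($chunk) = @_;
        if (!defined $chunk) {
            my $rest = $buf;
            $buf = '';
            return $rest;
        }
        $buf .= $chunk;
        my $out = '';
        # Replace complete occurrences and release everything up to each one.
        while ((my $pos = index($buf, $passwd)) >= 0) {
            $out .= substr($buf, 0, $pos, '') . $replacement;
            substr($buf, 0, length($passwd), '');
        }
        # Hold back only a trailing fragment that could start the password.
        my $hold = partial_match_length($buf, $passwd);
        $out .= substr($buf, 0, length($buf) - $hold, '');
        return $out;
    };
}

my $filter = make_stream_filter('hide_me', 'removed');
print $filter->($_) for 'The password is hide', "_me\n";
print $filter->(undef);
```

        The loop is slower than a precompiled regexp for long passwords, but it makes the buffering rule explicit: output is delayed only while the end of the data looks like the beginning of the password.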