
graff and ikegami,

Thanks++ to both of you. After getting a lot of inspiration from all your posts I tracked the problem down to the diamond operator, "<DATA>". A working, although possibly nonportable, version is at the bottom of this post.

In my current environment (Linux 2.6, Perl 5.8) it isn't necessary to use syswrite; ordinary print statements work fine after setting autoflush ($| = 1). This is very fortunate because I would like to stay away from modifying the guts of Perl Expect.
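A minimal sketch of the autoflush effect described above, assuming a plain pipe stands in for the pipeline to the logging process. The helper names readable_now and arrives_immediately are made up for illustration, and IO::Handle's autoflush method is used as the per-handle equivalent of selecting the handle and setting $| = 1.

```perl
use strict;
use warnings;
use IO::Handle;

# Hypothetical helper: true if $fh has data waiting (polls for up to 0.1 s).
sub readable_now {
    my ($fh) = @_;
    my $rin = '';
    vec($rin, fileno($fh), 1) = 1;
    my $rout = $rin;
    my $found = select($rout, undef, undef, 0.1);
    return $found > 0 ? 1 : 0;
}

# Hypothetical helper: print to a pipe with or without autoflush and
# report whether the reader can see the data before the writer closes.
sub arrives_immediately {
    my ($autoflush) = @_;
    pipe(my $r, my $w) or die "pipe: $!";
    $w->autoflush(1) if $autoflush;   # same effect as select($w); $| = 1;
    print {$w} "hello";               # no newline, no explicit flush
    my $seen = readable_now($r);
    close $w;                         # any buffered data is flushed here
    close $r;
    return $seen;
}
```

With autoflush on, the print reaches the pipe at once; without it, the text sits in the writer's buffer until the handle is flushed or closed.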

The sleep statement that graff put in works for this reason: Perl does buffered input on the inner script, so it can grab some of the piped input that follows __DATA__ along with the script text, and it hangs on to that input in its buffer when sysread is called. (That baffled me for a while, but putting a "print <DATA>" at the end of the script printed the missing first lines of input.) The sleep delays the parent's output until after the inner script has been read. When I changed the sysread to read, it worked reliably... on my current platform. A simple getc also works... again, on my current platform.

If anyone can explain the real difference between Perl's sysread and read I'm all ears. What I found refers me to the documentation on the Unix functions read and fread, which look almost the same. Is using read or getc in this script likely to break if it's ported to a different operating system?

If I do need to use sysread to avoid portability issues, I'm very much loath to put a sleep into production code. I tried select undef, undef, undef, 0.0001; on my laptop and it was long enough sometimes and not others. Ten microseconds was never enough and one millisecond was always enough... tonight, with nothing else running. My experience has been that the length of the sleep needed will vary by enormous factors depending on the details of the system load. I suppose that if it were really necessary, a file could be used as an "I am ready now" flag between the two processes.
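For what it's worth, the "I am ready now" idea can be sketched without a sleep. This is only an illustration under assumptions: it swaps the flag file for a second pipe, it uses fork directly instead of the "| perl | tee" pipeline, and the forked child plays the producer role that the parent script has in this thread. The name handshake_demo is hypothetical.

```perl
use strict;
use warnings;
use POSIX ();

# Hypothetical helper: the producer sends $message only after the
# consumer has announced that it is ready to read.
sub handshake_demo {
    my ($message) = @_;
    pipe(my $data_r,  my $data_w)  or die "pipe: $!";
    pipe(my $ready_r, my $ready_w) or die "pipe: $!";

    my $pid = fork();
    die "fork: $!" unless defined $pid;

    if ($pid == 0) {                     # producer process
        close $data_r;
        close $ready_w;
        sysread($ready_r, my $go, 1);    # blocks until the consumer signals
        syswrite($data_w, $message);
        POSIX::_exit(0);                 # leave without running END blocks
    }

    close $data_w;                       # consumer process
    close $ready_r;
    syswrite($ready_w, "1");             # the "I am ready now" flag
    my $got = '';
    while (sysread($data_r, my $chunk, 4096)) {
        $got .= $chunk;
    }
    waitpid($pid, 0);
    return $got;
}
```

The producer blocks on the one-byte read for as long as it takes, so nothing depends on guessing a delay that survives system load.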

Thanks again for the suggestions about too many backslashes (I changed from a here document to single quotes) and the trick of waiting for whitespace to check for a password that arrives in pieces - I had missed that one.

Sorry if I'm getting long-winded. The big question: is getc a good way of doing this or is it nonportable?

The current version of the code (with getc) looks like this.

    use warnings;
    use strict;

    my $password = "hide_me";
    my $log_file = "test.log";

    open STDOUT, "| perl | tee -a $log_file";
    select STDOUT; $| = 1;

    print '
        use warnings;
        use strict;
        select STDOUT; $| = 1;
        my $chr;
        $_ = "";
        while ($chr = getc DATA) {
            $_ .= $chr;
            next unless $chr =~ /\s/;
            s/' . (quotemeta $password) . '/removed/ig;
            print;
            $_ = "";
        }
        print;
    __DATA__
    ';

    print "The password is $password\n";
    foreach (qw{Output is unbuffered if these words are printed one at a time.}) {
        print $_, " ";
        select undef, undef, undef, 0.3;
    }
    print "\n";
    foreach (split //, "Even one character at a time: $password should be filtered.\n") {
        print;
        select undef, undef, undef, 0.1;
    }
    close STDOUT;

The output looks like this. The second and third lines are both printed a word at a time.

    The password is removed
    Output is unbuffered if these words are printed one at a time.
    Even one character at a time: removed should be filtered.

Re^3: Filtering passwords from the output of a script
by ikegami (Patriarch) on Nov 29, 2006 at 08:28 UTC

    The read and getc Perl functions correspond to the fread and fgetc C-lib functions. The C-lib I/O functions are buffered. When you ask for X bytes, it might actually read X+Y bytes internally. Subsequent reads will read from the Y bytes first.

    This is very good in your case, because you keep asking for one byte. Most of the time, you'll just be reading from the buffer, which is faster than doing a real read.

    How do the C-lib functions get their data? Through system calls. sysread is Perl's interface to the read system call. The system I/O functions are not buffered.

    A key difference between read and sysread is that read(FH, $buf, $bytes) will wait for $bytes bytes to be available (or for end of file), whereas sysread(FH, $buf, $bytes) will return as soon as any bytes become available. $bytes is simply a maximum for sysread. I took advantage of this to read in more than one byte at a time.
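A rough sketch of that difference, assuming a forked writer that sends "ab", pauses, then sends "cd". The helper name first_read and the half-second pause are illustrative choices, not anything from the thread.

```perl
use strict;
use warnings;
use POSIX ();

# Hypothetical helper: returns what the first call hands back when the
# writer delivers four bytes in two bursts. $mode is 'read' or 'sysread'.
sub first_read {
    my ($mode) = @_;
    pipe(my $r, my $w) or die "pipe: $!";
    my $pid = fork();
    die "fork: $!" unless defined $pid;

    if ($pid == 0) {                         # writer process
        close $r;
        syswrite($w, "ab");
        select undef, undef, undef, 0.5;     # pause between the two bursts
        syswrite($w, "cd");
        POSIX::_exit(0);
    }

    close $w;
    my $buf = '';
    if ($mode eq 'read') {
        read($r, $buf, 4);       # keeps waiting until 4 bytes or end of file
    } else {
        sysread($r, $buf, 4);    # returns as soon as some bytes are there
    }
    waitpid($pid, 0);
    close $r;
    return $buf;
}
```

Here first_read('read') hands back all four bytes, while first_read('sysread') will normally hand back only the first burst, since the second one has not been written yet when the call returns.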

    C-lib and system functions should not both be used on the same file handle.
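A small sketch of why mixing the two on one handle goes wrong, assuming a pipe whose writer has already closed so that nothing blocks. The helper name demo_mixed_reads is made up; the effect is the same one that made the first lines of input go missing in the original script.

```perl
use strict;
use warnings;

# Hypothetical helper: a buffered readline followed by sysread on the
# same handle. The buffered call swallows both lines into Perl's buffer.
sub demo_mixed_reads {
    pipe(my $r, my $w) or die "pipe: $!";
    syswrite($w, "line one\nline two\n");
    close $w;                              # reads now hit EOF instead of blocking

    my $first  = <$r>;                     # buffered: pulls in everything available
    my $count  = sysread($r, my $raw, 4096);   # unbuffered: the pipe is already empty
    my $second = <$r>;                     # still served from Perl's buffer
    close $r;
    return ($first, $count, $second);
}
```

sysread reports zero bytes even though "line two" has not been consumed yet: it is sitting in the buffered layer, where only buffered calls such as readline, read, and getc can reach it.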


    Your usage of getc is incorrect. It returns undef at the end of input, so the loop should test with defined; a plain truth test also stops on the character "0".

    The select STDOUT; is useless.

        print '
            use warnings;
            use strict;
            $| = 1;
            $_ = "";
            # Declare $chr here so the loop compiles under strict.
            while (defined(my $chr = getc DATA)) {
                $_ .= $chr;
                next unless $chr =~ /\s/;
                s/' . (quotemeta $password) . '/removed/ig;
                print;
                $_ = "";
            }
            print;
        __DATA__
        ';
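To see the getc point in isolation, here is a minimal sketch that reads from an in-memory filehandle (Perl 5.8 or later) instead of DATA. The helper name slurp_by_char is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: read $text one character at a time.
# $check_defined chooses between the two loop tests under discussion.
sub slurp_by_char {
    my ($text, $check_defined) = @_;
    open(my $fh, '<', \$text) or die "open: $!";   # in-memory filehandle
    my $out = '';
    if ($check_defined) {
        while (defined(my $chr = getc($fh))) { $out .= $chr }
    } else {
        while (my $chr = getc($fh)) { $out .= $chr }   # stops on "0"
    }
    close $fh;
    return $out;
}
```

Given the text "10 items\n", the plain truth test returns only "1" because the character "0" is false in Perl, while the defined test returns the whole string.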

    (Update: Nevermind, next unless /\s/; is buggy.

    Probably faster:

        print '
            use warnings;
            use strict;
            $| = 1;
            $_ = "";
            for (;;) {
                sysread(DATA, $_, 4096, length()) or last;
                next unless /\s/;
                s/' . (quotemeta $password) . '/removed/ig;
                print;
                $_ = "";
            }
            print;
        __DATA__
        ';

    )

      Thanks again. That's a good point about changing while ($chr...) to while (defined ($chr...)). Without it, a "0" character will cause the logging subprocess to exit, which would be very sad.

      Two minor notes about the sysread code: one would be to add some synchronizing code so the parent process can wait until the first sysread is in progress. That way Perl won't eat any of the parent's output into its buffers for <DATA>. The second would be to change next unless /\s/ to next unless /\s$/ so that a burst of characters that contains a space followed by the start of the password won't be printed before the password can be removed.

        One would need to add some synchronizing code so the parent process can wait until first sysread is in progress. That way Perl won't eat any of the parent's output into its buffers for <DATA>.

        That doesn't appear to be needed, but my tests haven't been extensive.

        The second would be to change next unless /\s/ to next unless /\s$/

        Oops, you're right that there's a problem with my solution. However, your fix is no good (If it receives "abc def ghi", it should output "abc def " and only leave "ghi" in the buffer). At this point, I'm not going to fix it. Just stick to your getc solution.
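For reference, the behavior described in the "abc def ghi" example could be sketched as below. This is not code from the thread: make_filter is a hypothetical helper, and it assumes the password contains no whitespace, as the posts above do. Each call takes a chunk of input and returns the text that is safe to print; calling it with no argument flushes what is left at end of input.

```perl
use strict;
use warnings;

# Hypothetical helper: returns a closure that filters chunks of input.
sub make_filter {
    my ($password) = @_;
    my $pattern = quotemeta $password;
    my $pending = '';                      # text not yet followed by whitespace
    return sub {
        my ($chunk) = @_;
        if (!defined $chunk) {             # end of input: flush the remainder
            my $rest = $pending;
            $pending = '';
            $rest =~ s/$pattern/removed/ig;
            return $rest;
        }
        $pending .= $chunk;
        # Split at the LAST whitespace character; keep the tail for later.
        return '' unless $pending =~ /\A(.*\s)(.*)\z/s;
        my ($ready, $tail) = ($1, $2);
        $pending = $tail;
        $ready =~ s/$pattern/removed/ig;
        return $ready;
    };
}
```

Fed "abc def ghi", the closure returns "abc def " and holds back "ghi", so a password that arrives split across two chunks is still whole by the time its text is printed.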