in reply to Re^2: Filtering passwords from the output of a script
in thread Filtering passwords from the output of a script

The read and getc Perl functions correspond to the fread and fgetc C-lib functions. The C-lib I/O functions are buffered. When you ask for X bytes, it might actually read X+Y bytes internally. Subsequent reads will read from the Y bytes first.

This is very good in your case, because you keep asking for one byte. Most of the time, you'll just be reading from the buffer, which is faster than doing a real read.

How do the C-lib functions get their data? Through system calls. sysread is Perl's interface to the read system call. The system I/O functions are not buffered.

A key difference between read and sysread is that read(FH, $buf, $bytes) will wait for $bytes bytes to be available (or for end of file), whereas sysread(FH, $buf, $bytes) will return as soon as any bytes become available. $bytes is simply a maximum for sysread. I took advantage of this to read in more than one byte at a time.
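To make the difference concrete, here is a minimal sketch using two in-process pipes. The handle and variable names are mine, not from your script, and the writer is closed before calling read so that it hits end of file instead of blocking.

```perl
use strict;
use warnings;

# Pipe 1: sysread returns whatever is available, up to the requested maximum.
pipe(my $r1, my $w1) or die "pipe failed: $!";
defined(syswrite($w1, "hello")) or die "syswrite failed: $!";

my $n_sys = sysread($r1, my $sys_buf, 4096);
die "sysread failed: $!" unless defined $n_sys;
print "sysread asked for 4096, got $n_sys: $sys_buf\n";

# Pipe 2: read keeps waiting for the full count and only returns a short
# count at end of file, so the writer is closed first to avoid blocking.
pipe(my $r2, my $w2) or die "pipe failed: $!";
defined(syswrite($w2, "hello")) or die "syswrite failed: $!";
close($w2) or die "close failed: $!";

my $n_read = read($r2, my $read_buf, 4096);
die "read failed: $!" unless defined $n_read;
print "read asked for 4096, got $n_read at EOF: $read_buf\n";
```

Each pipe is touched by only one family of functions on the reading side, in line with the advice about not mixing the two on one handle.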

C-lib and system functions should not both be used on the same file handle.


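Your usage of getc is incorrect. It returns undef at the end of input, so the loop should test with defined rather than for truth; a plain truth test also stops at the character "0".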
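A small sketch of why the test matters, reading from an in-memory handle instead of DATA (the sample string "a0b" and the counter names are just for illustration):

```perl
use strict;
use warnings;

my $data = "a0b";

# Truth test: stops early, because the one-character string "0" is false.
open(my $fh, '<', \$data) or die "open failed: $!";
my $truthy_count = 0;
while (my $chr = getc($fh)) {
    $truthy_count++;
}
close($fh);

# defined test: stops only when getc returns undef at end of input.
open($fh, '<', \$data) or die "open failed: $!";
my $defined_count = 0;
while (defined(my $chr = getc($fh))) {
    $defined_count++;
}
close($fh);

print "truth test read $truthy_count character(s)\n";
print "defined test read $defined_count character(s)\n";
```

With this input the truth-tested loop gives up after the "a", while the defined-tested loop sees all three characters.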

The select STDOUT; is useless, since STDOUT is already the default output handle.

print '
use warnings;
use strict;
$| = 1;
my $chr;    # must be declared, since use strict is in effect
$_ = "";
while (defined($chr = getc DATA)) {
    $_ .= $chr;
    next unless $chr =~ /\s/;
    s/' . (quotemeta $password) . '/removed/ig;
    print;
    $_ = "";
}
print;
__DATA__
';

(Update: Never mind, next unless /\s/; is buggy.

Probably faster:

print '
use warnings;
use strict;
$| = 1;
$_ = "";
for (;;) {
    sysread(DATA, $_, 4096, length()) or last;
    next unless /\s/;
    s/' . (quotemeta $password) . '/removed/ig;
    print;
    $_ = "";
}
print;
__DATA__
';

)

Re^4: Filtering passwords from the output of a script
by quester (Vicar) on Nov 29, 2006 at 08:41 UTC
    Thanks again. That's a good point about changing while ($chr...) to while (defined ($chr...)). Without it, a "0" character (the one character that is false in Perl) will cause the logging subprocess to exit, which would be very sad.

    Two minor notes about the sysread code: one would be to add some synchronizing code so the parent process can wait until the first sysread is in progress. That way Perl won't eat any of the parent's output into its buffers for <DATA>. The second would be to change next unless /\s/ to next unless /\s$/ so that a burst of characters that contains a space followed by the start of the password won't be printed before the password can be removed.

      One would need to add some synchronizing code so the parent process can wait until first sysread is in progress. That way Perl won't eat any of the parent's output into its buffers for <DATA>.

      That doesn't appear to be needed, but my tests haven't been extensive.

      The second would be to change next unless /\s/ to next unless /\s$/

      Oops, you're right that there's a problem with my solution. However, your fix is no good (If it receives "abc def ghi", it should output "abc def " and only leave "ghi" in the buffer). At this point, I'm not going to fix it. Just stick to your getc solution.

        And thank you again, ikegami.

        The need for the synchronizing code with sysread came from graff's observation that he needed to add a sleep 1; since the first line or so of output gets lost unless the main script waits a bit before it sends anything.

        I hadn't thought about spaces embedded in the password, but it would only be a problem if the script read more than one byte at a time. Using getc, $_ will never contain embedded whitespace, because it would have been found on a previous pass through the loop.

        Last night you pointed out that checking for /\s/ (or /\s$/) doesn't work if the password contains spaces. After I thought about it for a while, I think it can be fixed, when reading one character at a time, by changing

        next unless /\s/;
        to a test of whether $_ is the beginning of the password:
        next if $_ eq substr "' . (quotemeta $password) . '", 0, length;

        If it was really necessary to read more than one character at a time and the passwords would contain spaces, it would probably work to check if any trailing substring of the buffer matches any leading substring of the password like this:

        [[... beginning of the code as above...]]
        print '
        use warnings;
        use strict;
        use List::Util qw(min);
        select STDOUT;
        $| = 1;
        my $chr;
        $_ = "";
        CHAR: while (defined ($chr = getc DATA)) {
            $_ .= $chr;
            foreach my $n (1..min ' . (length $password) . ', length) {
                next CHAR if substr ("' . (quotemeta $password) . '", 0, $n) eq substr ($_, (length)-$n, $n);
            }
            s/' . (quotemeta $password) . '/removed/ig;
            print;
            $_ = "";
        }
        print;
        __DATA__
        ';
        [[... the rest of the code is as above...]]
        ... but I think I prefer the simplicity of reading one character at a time.
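For what it's worth, here is a rough, standalone sketch of the trailing-substring idea applied to multi-byte chunks. It is not the code from this thread: held_back_length and filter_chunks are hypothetical helpers, the chunks are passed in as a list instead of coming from sysread, and the comparison is made case-insensitive with lc to mirror the /i on the substitution.

```perl
use strict;
use warnings;

# Hypothetical helper: length of the longest suffix of $buf that is a
# proper prefix of $password (compared case-insensitively).
sub held_back_length {
    my ($buf, $password) = @_;
    my $max = length($password) - 1;
    $max = length($buf) if length($buf) < $max;
    for (my $n = $max; $n >= 1; $n--) {
        return $n if lc(substr($buf, -$n)) eq lc(substr($password, 0, $n));
    }
    return 0;
}

# Hypothetical driver: feeds chunks through the filter and returns the
# text that would have been printed.
sub filter_chunks {
    my ($password, @chunks) = @_;
    return join('', @chunks) if $password eq '';   # nothing to filter
    my ($buf, $out) = ('', '');
    for my $chunk (@chunks) {
        $buf .= $chunk;
        # Emit everything up to each complete password, replacing it.
        while ($buf =~ s/\A(.*?)\Q$password\E//is) {
            $out .= $1 . 'removed';
        }
        # Hold back only a tail that might be the start of the password.
        my $keep = held_back_length($buf, $password);
        $out .= substr($buf, 0, length($buf) - $keep);
        $buf = substr($buf, length($buf) - $keep);
    }
    return $out . $buf;   # flush whatever is left at end of input
}

print filter_chunks('my secret', 'login: my ', 'secret ok'), "\n";
```

The point of the design is that only the possible start of a password is held back, so a burst like "abc def ghi" is released immediately, and a password containing spaces is still caught when it straddles two reads.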

        Take care,

        quester