Filtering passwords from the output of a script

quester has asked for the wisdom of the Perl Monks concerning the following question:

Re: Filtering passwords from the output of a script by ikegami (Patriarch) on Nov 29, 2006 at 04:50 UTC
I don't understand why you run two copies of Perl. What about just

    ... | perl -pe 'BEGIN { $|=1; } s/\Qhide_me/removed/ig' | tee -a test.log

or

    ... | censor_passwd | tee -a test.log

    #!/usr/bin/perl
    # censor_passwd

    use warnings;
    use strict;

    my $password = "hide_me";

    $| = 1;
    while (<STDIN>) {
        s/\Q$password/removed/ig;
        print;
    }

Update: By the way, this tool can be used to help guess the password. For example, feeding "abcdef" to this script checks the following passwords all at once: a ab abc abcd abcde abcdef b bc bcd bcde bcdef c cd cde cdef d de def e ef f
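To make the update above concrete: the filter replaces the password wherever it occurs in its input, so one probe string tests every contiguous substring of itself at once. The sketch below enumerates those candidates; the helper name `substrings` is hypothetical and is not part of any code posted in this thread.

```perl
use strict;
use warnings;

# Hypothetical helper: list every contiguous substring of $s,
# ordered by start position and then by length.
sub substrings {
    my ($s) = @_;
    my @out;
    for my $start (0 .. length($s) - 1) {
        for my $len (1 .. length($s) - $start) {
            push @out, substr($s, $start, $len);
        }
    }
    return @out;
}

my @candidates = substrings("abcdef");
print scalar(@candidates), " candidates: @candidates\n";
```

A probe of length n covers n(n+1)/2 candidates, so anyone who can both feed text to the filter and read its output learns a lot from each probe.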
Re^2: Filtering passwords from the output of a script by graff (Chancellor) on Nov 29, 2006 at 04:56 UTC
Your first alternative would make the password string available to all logged-in users via the "ps" command. Your second alternative will still be stuck with line buffering -- even with $| set to non-zero everywhere on every file handle, the standard print function and diamond read operators will operate only in line-buffered mode ($| merely makes sure that consecutive lines are not buffered into chunks of 4KB or 8KB or whatever the pipeline buffer size happens to be).
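One way to keep the password out of "ps" output while still using an external filter is to have the filter read the password from a file with restrictive permissions instead of taking it on the command line. The following is a minimal sketch under that assumption, not code from any poster; `read_password` and `censor` are hypothetical helper names.

```perl
use strict;
use warnings;

# Hypothetical helper: return the first line of $path without its
# trailing newline. Only the path, never the password, would appear
# in the process table if the path were passed as an argument.
sub read_password {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!\n";
    my $password = <$fh>;
    close $fh;
    defined $password or die "$path is empty\n";
    chomp $password;
    return $password;
}

# Hypothetical helper: censor one chunk of text case-insensitively,
# using the same s/// as the posts above.
sub censor {
    my ($text, $password) = @_;
    $text =~ s/\Q$password\E/removed/ig;
    return $text;
}
```

A filter script could then start with something like `my $password = read_password("$ENV{HOME}/.censor_passwd");`, where the file name is purely illustrative and the file would be readable only by its owner.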
Re^3: Filtering passwords from the output of a script by ikegami (Patriarch) on Nov 29, 2006 at 05:07 UTC
> Your first alternative would make the password string available to all logged-in users via the "ps" command.

Yeah, that's why I put the second alternative. I should have mentioned it.

> Your second alternative will still be stuck with line buffering

Oh! right! Here's a version that doesn't wait for a whole line to be read before producing output:

    #!/usr/bin/perl
    # censor_passwd

    use warnings;
    use strict;

    use List::Util qw( reduce );

    my $passwd      = 'hide_me';
    my $replacement = 'removed';

    # Used to check if the start of the password
    # is at the end of the read data. The returned
    # regexp will match 0 characters if not.
    # For efficiency, it expects the data to be
    # "reverse"d.
    my $partial_re = do {
        my $r = reduce { "(?:$a$b)?" }
            '',
            reverse
            map quotemeta,
            map /(.)/g,
            $passwd;

        qr/$r/
    };

    binmode(STDIN);
    binmode(STDOUT);
    $| = 1;

    my $buf = '';

    for (;;) {
        my $rv = sysread(STDIN, $buf, 4096, length($buf));
        defined $rv or die("Unable to read from STDIN: $!\n");
        $rv or last;

        for (;;) {
            my $pos = index($buf, $passwd);
            last if $pos < 0;

            substr($buf, $pos, length($passwd), $replacement);
            print(substr($buf, 0, $pos+length($replacement), ''));
        }

        reverse($buf) =~ /^$partial_re/;
        print(substr($buf, 0, length($buf)-$+[0], ''));
    }

    print($buf);

Points of interest:

It performs minimal input buffering. It outputs everything as soon as it receives it, without waiting for lines or words to be completed. It only buffers when the data that was read ends with the start of the password.

It is very efficient. It reads a block of data at a time, if more than a byte of data is available. It avoids using `/...\z/` and even slower `/^.*?.../` by using `reverse`. It avoids using captures. A major loop was replaced with a precompiled regexp.

( ~~Caveat: What if "hide" and "_me" are received separately?~~ Code updated to address this issue. )
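The reversed-regexp trick above answers one question: how many bytes at the end of the buffer could still turn out to be the start of the password? A plainer and slower way to compute that number is sketched below. `held_back_len` is a hypothetical name, and this only illustrates the idea; it is not ikegami's implementation.

```perl
use strict;
use warnings;

# Hypothetical helper: return the length of the longest proper prefix
# of $passwd that $buf ends with (0 if there is none). Those trailing
# bytes must be held back until more input arrives; everything before
# them is safe to print.
sub held_back_len {
    my ($buf, $passwd) = @_;
    my $max = length($passwd) - 1;
    $max = length($buf) if length($buf) < $max;
    for (my $n = $max; $n > 0; $n--) {
        return $n if substr($buf, -$n) eq substr($passwd, 0, $n);
    }
    return 0;
}

for my $buf ("login ok", "token=hid", "abc hide_m") {
    my $n = held_back_len($buf, 'hide_me');
    printf "%-12s hold back %d byte(s)\n", "'$buf'", $n;
}
```

Only proper prefixes are checked because, in the script above, complete occurrences of the password have already been replaced by the time the tail of the buffer is examined.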
Re^4: Filtering passwords from the output of a script by graff (Chancellor) on Nov 29, 2006 at 05:37 UTC
Re^5: Filtering passwords from the output of a script by ikegami (Patriarch) on Nov 29, 2006 at 05:43 UTC
Re: Filtering passwords from the output of a script by graff (Chancellor) on Nov 29, 2006 at 05:17 UTC
> This is not supposed to be obfuscated...

I understand, and the OP code (once I grokked it) actually struck me as more clever than obfuscated. Nice, even.

But if you really want completely unbuffered output to the log file and anywhere else, you have to use syswrite and sysread, and you have to use those exclusively. The standard i/o methods (i.e. print and `<FILEHANDLE>`) are intrinsically line-oriented -- Well, actually, print and the diamond operator are record oriented, where the record delimiter is defined by the globals $/ and $\. So you could try playing with those -- but that won't be any less obfuscative than just using sysread and syswrite.

I had to play with it a bit, but the following version of the OP code does what I think you want in terms of making sure that output passes through the pipe and into the log file with the shortest possible buffering delay. (Note that you cannot avoid doing word-level buffering, because without that, you wouldn't be able to use s/// to remove the password string. Also, a lot of stuff in the "internal" script needs to be escaped in order to work right -- I was actually puzzled that "\s" had to be "\\s" in the "if" clause, but doing "\\Q" and "\\E" was apparently unnecessary.)

    use warnings;
    use strict;

    my $password = "hide_me";
    my $log_file = "test.log";

    open STDOUT, "| perl | tee -a $log_file";

    my $script = <<_END_FILTER_SCRIPT_;
    use warnings;
    use strict;
    my \$chr;
    \$_ = '';
    while (sysread DATA, \$chr, 1) {
        \$_ .= \$chr;
        if ( /\\s\$/ ) {
            s/\Q$password\E/removed/ig;
            syswrite STDOUT, \$_;
            \$_ = '';
        }
    }
    close STDOUT;
    __DATA__
    _END_FILTER_SCRIPT_

    syswrite STDOUT, $script;
    sleep 1;  ## had to add this delay so next syswrite would always work
    syswrite STDOUT, "The password is $password\n";
    foreach (qw{Output is unbuffered if these words are printed one at a time.}) {
        syswrite STDOUT, $_." ";
        sleep 1;
    }
    syswrite STDOUT, "\n";
    sleep 1;
    syswrite STDOUT, "Output is line buffered if the last line came out all at once.\n";
    close STDOUT;

I agree that having to stuff all those backslashes into the internal script is really ugly, so maybe you'll want to externalize that part of the code into a separate script file, as suggested in the first reply. (But that means storing your password string in yet another file, so as not to expose it in the system's process table.)

(update: removed unnecessary "binmode" calls)
Re^2: Filtering passwords from the output of a script by quester (Vicar) on Nov 29, 2006 at 07:48 UTC
graff and ikegami, Thanks++ to both of you. After getting a lot of inspiration from all your posts I tracked the problem down to the diamond operator, "<DATA>". A working, although possibly nonportable, version is at the bottom of this post.

In my current environment (Linux 2.6, Perl 5.8) it isn't necessary to use syswrite; ordinary print statements work fine after setting autoflush ($| = 1). This is very fortunate because I would like to stay away from modifying the guts of Perl Expect.

The sleep statement that graff put in actually works like this: Perl is already trying to do buffered input on DATA in the inner script, and it tries to grab some of the input from the pipe. It hangs on to it when sysread is called. (That baffled me for a while, but putting a "print <DATA>" at the end of the script printed the missing first lines of input.)

When I changed the sysread to read, it worked reliably... on my current platform. A simple getc also works... again, on my current platform. If anyone can explain the real difference between Perl's sysread and read I'm all ears. What I found refers me to the documentation on the Unix functions read and fread, which look almost the same. Is using read or getc in this script likely to break if it's ported to a different operating system?

If I do need to use sysread to avoid portability issues, I'm very much loath to put a sleep into production code. I tried

    select undef, undef, undef, 0.0001;

on my laptop and it was long enough sometimes and not others. Ten microseconds was never enough and one millisecond was always enough... tonight, with nothing else running. My experience has been that the length of the sleep needed will vary by enormous factors depending on the details of the system load. I suppose that if it was really necessary, a file could be used as an "I am ready now" flag between the two processes.

Thanks again for the suggestions about too many backslashes (I changed from a here document to single quotes) and the trick of waiting for whitespace to check for a password that arrives in pieces - I had missed that one.

Sorry if I'm getting long-winded. The big question: is getc a good way of doing this or is it nonportable?

The current version of the code (with getc) looks like this.

    use warnings;
    use strict;

    my $password = "hide_me";
    my $log_file = "test.log";

    open STDOUT, "| perl | tee -a $log_file";
    select STDOUT; $| = 1;

    print '
    use warnings;
    use strict;
    select STDOUT; $| = 1;
    my $chr;
    $_ = "";
    while ($chr = getc DATA) {
        $_ .= $chr;
        next unless $chr =~ /\s/;
        s/' . (quotemeta $password) . '/removed/ig;
        print;
        $_ = "";
    }
    print;
    __DATA__
    ';

    print "The password is $password\n";
    foreach (qw{Output is unbuffered if these words are printed one at a time.}) {
        print $_, " ";
        select undef, undef, undef, 0.3;
    }
    print "\n";
    foreach (split //, "Even one character at a time: $password should be filtered.\n") {
        print;
        select undef, undef, undef, 0.1;
    }
    close STDOUT;

The output looks like this. The second and third lines are both printed a word at a time.

    The password is removed
    Output is unbuffered if these words are printed one at a time.
    Even one character at a time: removed should be filtered.
Re^3: Filtering passwords from the output of a script by ikegami (Patriarch) on Nov 29, 2006 at 08:28 UTC
The `read` and `getc` Perl functions correspond to the `fread` and `fgetc` C-lib functions. The C-lib I/O functions are buffered. When you ask for X bytes, it might actually read X+Y bytes internally. Subsequent reads will read from the Y bytes first. This is very good in your case, because you keep asking for one byte. Most of the time, you'll just be reading from the buffer, which is faster than doing a real read.

How do the C-lib functions get their data? Through system calls. `sysread` is Perl's interface to the `read` system call. The system I/O functions are not buffered.

A key difference between `read` and `sysread` is that `read(FH, $buf, $bytes)` will wait for `$bytes` bytes to be available, whereas `sysread(FH, $buf, $bytes)` will return as soon as bytes become available. `$bytes` is simply a maximum for `sysread`. I took advantage of this to read in more than one byte at a time.

C-lib and system functions should not both be used on the same file handle.

Your usage of `getc` is incorrect. It returns `undef` at the end of input, not false. The `select STDOUT;` is useless.

    print '
    use warnings;
    use strict;
    $| = 1;
    $_ = "";
    while (defined(my $chr = getc DATA)) {
        $_ .= $chr;
        next unless $chr =~ /\s/;
        s/' . (quotemeta $password) . '/removed/ig;
        print;
        $_ = "";
    }
    print;
    __DATA__
    ';

(Update: Never mind, `next unless /\s/;` is buggy. Probably faster:

    print '
    use warnings;
    use strict;
    $| = 1;
    $_ = "";
    for (;;) {
        sysread(DATA, $_, 4096, length()) or last;
        next unless /\s/;
        s/' . (quotemeta $password) . '/removed/ig;
        print;
        $_ = "";
    }
    print;
    __DATA__
    ';

)
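Two of the points above can be checked with a small experiment. This is a sketch with hypothetical helper names, assuming a POSIX-like system with working pipes. `read_chars` shows why testing the result of `getc` for truth stops early on the character "0", and `sysread_once` shows `sysread` returning fewer bytes than the requested maximum.

```perl
use strict;
use warnings;

# Hypothetical helper: read characters from an in-memory handle,
# looping either on truthiness (the buggy test) or on definedness.
sub read_chars {
    my ($data, $use_defined) = @_;
    open my $fh, '<', \$data or die "Cannot open in-memory handle: $!\n";
    my ($out, $chr) = ('');
    if ($use_defined) {
        $out .= $chr while defined($chr = getc $fh);
    }
    else {
        $out .= $chr while $chr = getc $fh;   # stops at "0"
    }
    close $fh;
    return $out;
}

# Hypothetical helper: put $payload into a pipe, then report how many
# bytes one sysread delivers when up to $want bytes are requested.
sub sysread_once {
    my ($payload, $want) = @_;
    pipe(my $r, my $w) or die "pipe failed: $!\n";
    defined(syswrite($w, $payload)) or die "syswrite failed: $!\n";
    my $n = sysread($r, my $buf, $want);
    defined $n or die "sysread failed: $!\n";
    close $w;
    close $r;
    return $n;
}

printf "truthiness test read: '%s'\n", read_chars("a0b", 0);
printf "defined test read:    '%s'\n", read_chars("a0b", 1);
printf "sysread asked for 4096 bytes, got %d\n", sysread_once("abc", 4096);
```

The write end of the pipe is still open when `sysread` is called, so a short read here comes from `sysread` returning what is available rather than from end of file.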
Re^4: Filtering passwords from the output of a script by quester (Vicar) on Nov 29, 2006 at 08:41 UTC
Re^5: Filtering passwords from the output of a script by ikegami (Patriarch) on Nov 29, 2006 at 17:56 UTC
Re^2: Filtering passwords from the output of a script by ikegami (Patriarch) on Nov 29, 2006 at 06:38 UTC
Three issues:

1. It is word-buffered. It will only output when whitespace is received.
2. It won't always work if the password has a space in it.
3. I imagine that reading a byte at a time (when more than a byte is available) is much slower than reading a chunk of bytes. See Re^3: Filtering passwords from the output of a script.

By the way, `binmode` might be required. I remember having problems when attempting to read characters (as opposed to bytes) using `sysread`, but the details escape me.
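The second issue can be reproduced without any pipes. The sketch below is a simplified model of the whitespace-flushing approach, not the posted script; `word_buffered_filter` is a hypothetical name, and filtering the final leftover chunk is an addition of this sketch.

```perl
use strict;
use warnings;

# Hypothetical model of the word-buffered filter: consume $input one
# character at a time, run s/// on the pending text whenever a
# whitespace character arrives, then flush it.
sub word_buffered_filter {
    my ($input, $password) = @_;
    my ($pending, $out) = ('', '');
    for my $chr (split //, $input) {
        $pending .= $chr;
        next unless $chr =~ /\s/;
        $pending =~ s/\Q$password\E/removed/ig;
        $out .= $pending;
        $pending = '';
    }
    # Addition in this sketch: also filter whatever is left at the end.
    $pending =~ s/\Q$password\E/removed/ig;
    return $out . $pending;
}

print word_buffered_filter("pw: hide_me\n", "hide_me");
print word_buffered_filter("pw: hide me\n", "hide me");
```

With the password "hide me", the chunk "hide " is flushed before "me" arrives, so the substitution never sees the whole password and it passes through unfiltered.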
Re^3: Filtering passwords from the output of a script by quester (Vicar) on Nov 29, 2006 at 08:02 UTC
Good points, although /\s/ actually also matches newline and carriage return. As a sheer stroke of fortune, in my current application a restriction against using spaces or non-ASCII characters in passwords is appropriate. Also, I don't have to worry too much about efficiency; my logs typically arrive at only a few hundred bytes per second. Thanks again!