
Re: 'Restricted' data, a clarification

by bl0rf (Pilgrim)
on Feb 13, 2004 at 00:41 UTC (id://328700)

in reply to 'Restricted' data, a clarification
in thread 'Restricted' data, an additional security mechanism for Perl.

Dear jarich and pjf,
I think that the confusion about the issue of restricted data stems from it being a solution to a problem we haven't seen. You should define the cases where this would be necessary, perhaps with some examples. I, personally, have never encountered a situation such as sensitive data leaking out etc. and so restricted data seems superficial to me.

I am sure that there are others who share my opinion, and I urge some of the more experienced monks to clearly explain the situations in which we would need restricted data...

The guy with the cheap signature.

Why should I restrict my data?
by jarich (Curate) on Feb 13, 2004 at 03:00 UTC
    I think that the confusion about the issue of restricted data stems from it being a solution to a problem we haven't seen.

    I had the exact same reaction when pjf brought it up. "Sure, it might be useful," I said, "but I wouldn't use it; I almost never write code which deals with credit card numbers or passwords or private client data. And it's not that hard to manage when I do." Most of my code is really, really boring, so 99% of the time I wouldn't think twice about dumping all of my structures to STDERR for debugging purposes.

    However, as we talked more about how things can go wrong by accident, I remembered how, once upon a time, some code I had written and tested, before handing it to someone else to test, got into production and caused havoc. It wasn't bad code: it was clean, it used strict, it passed all of our tests and code review, and it looked secure. However, it also had an interesting (and quiet!) failure mode which resulted, in the end, in the usernames and passwords of about 100 staff members somehow appearing in GET query strings, and therefore in our access logs.

    Of course we raced the patch into production and that stopped happening rather quickly. But purging the access logs was fraught with political problems, and keeping the passwords in there was unacceptable. Calling each staff member, explaining the error and asking them to change their password was... unpleasant.

    Now maybe, if I had had the choice, I wouldn't have used restricted data in that case anyway, but it would have been the last time I didn't...

    I still wouldn't necessarily use restricted data in most of my work. As I said, the code for my job is boring. I don't use taint checking all the time either, but I do when I'm taking input from the user. I would definitely consider restricting data when I thought I was dealing with something that shouldn't appear on STDOUT or STDERR, in my log file, in the database or in what I print to a browser.

    The class of things that fits these kinds of restrictions isn't as small as you might initially think. If market research firms would find the data valuable (see this calculator), then, as professionals, we have a moral obligation to ensure that it doesn't end up in places it shouldn't. It is not acceptable to print someone's credit history to any kind of logfile; that's not responsible data management. The same goes for credit card numbers, or even silent (unlisted) phone numbers. Maybe it's best to assume that all phone numbers are silent.

    Perhaps you think this approach to data management is a little draconian. After all, what's wrong with occasionally printing names, addresses and (possibly silent) phone numbers into log files? Who's going to know? Who's going to have access to that information who couldn't get at the database itself with a little work? How much do I have to worry about accidental exposure?

    The answers to those questions depend on your local privacy laws, your personal ethics and the contract with your client. The purpose of this idea isn't to help you decide what data should be restricted, just to help you make sure that you restrict it properly.

    I, personally, have never encountered a situation such as sensitive data leaking out etc. and so restricted data seems superficial to me.

    Do you mean that you've never printed something to a file that shouldn't have gone in there? Or seen someone else do so? Do you mean that 100% of the time, what you print to a client browser is either innocuous (for some definition of innocuous) or exactly what you intended to print? Even when you've used Data::Dumper? If you do mean this, I am extremely envious. I make mistakes like this all the time, especially when using Data::Dumper on code that I have either never seen before or haven't seen for six months or more. Data restrictions would help here, even though they wouldn't solve the underlying problems.

    I hope this helps at least a bit.


A real world use for restricted data
by pjf (Curate) on Feb 15, 2004 at 01:30 UTC
    You should define the cases where this would be necessary, perhaps with some examples.

    Much of this thinking has come up because of work that I'm performing for a client. The client has a very large and rather complex application. I'm changing core parts of that application to improve the authentication and session management implementation, as well as making a number of speed improvements, one of which involves caching.

    One particular problem I've encountered is that the application receives a username and password, uses this combination to authenticate to a number of different services, snarfs up the relevant information returned by those services, and returns it to the user. As a result, the username and password get passed around a lot. This is not my design; this is how the code was written before I arrived on the scene. I'm trying to move away from it, but that's a longer project.

    Because passing the password around is so prevalent, I feel extremely uneasy. The code is so large that I can't easily trace everywhere the password is used. Something may include the password in a die or warn that simply hasn't been triggered yet. Something may pass it to another program as a command-line argument or in the environment, which is a common mistake and can expose the information to other users via ps. Other things could be happening, but without auditing a very large amount of code, I simply cannot tell.

    In this instance, I would be extremely pleased to be able to mark the password as restricted when it comes in. When the application then dies on an attempt to use restricted data, I can examine the relevant section of code, ensure it's doing the right thing, and tweak it accordingly. I'd also feel a little better about having a level of future-proofing: I'd much rather have the logger throw a restricted-data exception than inadvertently log the password to a file due to poor coding somewhere in the application.
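    Perl has no core "restricted" flag, so the following is not the mechanism being proposed in this thread. It is a rough sketch, assuming only the core overload pragma, of the behaviour described above: a wrapper object that dies whenever the password is stringified, whether by print, warn, die or string interpolation, and that keeps the value inside a closure so Data::Dumper doesn't show it. The class name Restricted and its reveal method are hypothetical, made up for this example:

```perl
use strict;
use warnings;

# Hypothetical class "Restricted": a rough approximation built on the
# core overload pragma, not the core mechanism discussed in this thread.
package Restricted;
use Carp ();
use overload
    # Stringifying the object (print, warn, die, interpolation,
    # concatenation, eq) dies, reporting the offending caller's line.
    '""'     => sub { Carp::croak('Attempt to use restricted data') },
    # Truth tests are always true and never touch the secret.
    'bool'   => sub { 1 },
    fallback => 1;

# The secret lives only inside a closure, so Data::Dumper (with its
# default settings) prints a placeholder sub { "DUMMY" } instead of it.
sub new {
    my ($class, $value) = @_;
    my $getter = sub { $value };
    return bless $getter, $class;
}

# The one explicit, easy-to-grep way to get the real value back out.
sub reveal {
    my ($self) = @_;
    return $self->();
}

package main;
use Data::Dumper;

my $password = Restricted->new('hunter2');

# Deliberate use would pass $password->reveal to whatever really needs it.

# Accidental use: interpolation dies before warn can print anything.
eval { warn "login failed with password $password\n"; 1 }
    or print "caught: $@";

# Dumping the object shows the code ref placeholder, not the password.
print Dumper($password);
```

    This only catches stringification of the wrapper itself. Once reveal has been called, the plain string it returns is unprotected, and any code holding the object can still call it directly, which is exactly the gap a core mechanism would close.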

    The idea of restricted data would also have been useful when debugging the caching mechanism this application uses. The user's password should not be cached, and while I explicitly remove the password before dropping the data structure into the cache, it turned out that two or three other data structures also stored the password, and these were being cached. They had to be tracked down and fixed manually. Who's to say there aren't more hiding in the program?
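    Short of language support, one stopgap for the caching problem is to search a data structure for the known password just before it goes into the cache. The helper below, find_secret, is hypothetical: it reports the path to every plain string containing the secret, looking through hashes and arrays (blessed or not) and ignoring other reference types:

```perl
use strict;
use warnings;
use Scalar::Util qw(refaddr reftype);

# Hypothetical debugging helper: walk a nested structure of hashes and
# arrays (blessed or not) and return the path of every plain string that
# contains $secret. Other reference types (code, globs, ...) are skipped.
sub find_secret {
    my ($data, $secret, $path, $seen) = @_;
    $path //= '';
    $seen //= {};

    # Leaf: a plain, non-reference scalar.
    if (!ref $data) {
        my $hit = defined $data && index($data, $secret) >= 0;
        return $hit ? ($path) : ();
    }

    # Visit each reference only once, so cyclic structures terminate.
    return () if $seen->{ refaddr($data) }++;

    my $type = reftype($data);
    my @hits;
    if ($type eq 'HASH') {
        for my $key (sort keys %$data) {
            push @hits, find_secret($data->{$key}, $secret, $path . "{$key}", $seen);
        }
    }
    elsif ($type eq 'ARRAY') {
        for my $i (0 .. $#$data) {
            push @hits, find_secret($data->[$i], $secret, $path . "[$i]", $seen);
        }
    }
    return @hits;
}

# Example: check a structure just before it is written to the cache.
my $entry = {
    user    => 'alice',
    session => { id => 42, creds => [ 'alice', 'hunter2' ] },
};
print "password still present at: $_\n" for find_secret($entry, 'hunter2');
# prints: password still present at: {session}{creds}[1]
```

    This is a debugging aid rather than a safeguard: it only finds the exact value you already know about, so it can point at structures that still carry the password but not at copies that have been encoded or otherwise transformed.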

    Restricted data doesn't solve all the problems I've described above, but it does provide a helmet and safety belt which can help reduce the damage.