in reply to 'Restricted' data, an additional security mechanism for Perl.
For example, customer credit card numbers should only be passed to the card-processing facility, while your SQL DBMS connection password should only be passed to the database. They should both be marked as "restricted", but you don't want to let them cross-contaminate. (Unless you explicitly enable that.)
Instead of a flag or an incremental level, something along the lines of categories might be more useful -- this data is restricted as "customer private" while that other data is marked as "server internals", and only channels/objects which are approved for that category can access them.
More generally, you might want to look at Capabilities for a more sophisticated way of building an access control model.
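A rough sketch of how per-category restriction might look in plain Perl. Everything here is invented for illustration (no such module exists): data is tagged with a category, channels are approved for categories, and a gate function refuses cross-contamination.

```perl
#!/usr/bin/perl
# Hypothetical sketch of category-based restriction. All function
# and category names are invented for illustration.
use strict;
use warnings;
use Scalar::Util qw(refaddr);

my %category_of;    # refaddr => category name
my %approved;       # channel  => { category => 1 }

sub restrict {
    my ($ref, $category) = @_;
    $category_of{ refaddr($ref) } = $category;
}

sub approve {
    my ($channel, $category) = @_;
    $approved{$channel}{$category} = 1;
}

sub send_to {
    my ($channel, $ref) = @_;
    my $cat = $category_of{ refaddr($ref) };
    die "refused: '$cat' data not approved for '$channel'\n"
        if defined $cat && !$approved{$channel}{$cat};
    print "sent to $channel\n";
}

my $card = \'4111111111111111';
my $pass = \'s3cret';
restrict($card, 'customer private');
restrict($pass, 'server internals');
approve('card_gateway', 'customer private');
approve('database',     'server internals');

send_to('card_gateway', $card);    # allowed
send_to('database',     $pass);    # allowed
# send_to('logfile', $card);       # would die: cross-contamination
```

A real implementation would need tie or core magic so that restrictedness propagates through assignment, as taintedness does; the registry above only tracks the original references.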
'Restricted' data, a clarification
by jarich (Curate) on Feb 12, 2004 at 02:54 UTC
What this gives us is the ability to choose specifically where our data can go. We still have to make sure that we correctly filter things (just as we shouldn't use the regexp /(.*)/ in taint checking), but it helps us be just that little bit more sure that we're not going to make a stupid mistake and send private data out to the wrong person or process.
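For comparison, the taint-checking analogy mentioned here: a deliberate untaint validates before capturing, while the /(.*)/ anti-pattern captures everything and checks nothing. The field name below is illustrative.

```perl
#!/usr/bin/perl
# Deliberate untainting versus the /(.*)/ anti-pattern discussed
# above. Run with perl -T for taint checking to actually apply;
# SOME_FIELD is an illustrative name.
use strict;
use warnings;

my $input = $ENV{SOME_FIELD} // 'abc123';   # tainted under -T

# Deliberate: validate against a strict pattern, then capture.
my $clean;
if ($input =~ /^(\w+)$/) {
    $clean = $1;    # untainted AND known to be a single word
}

# The anti-pattern: technically untaints, but checks nothing at all.
my ($fake_clean) = $input =~ /(.*)/;
```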
So, does anyone other than pjf and me think this would be worthwhile?

Update: changed the title
by flyingmoose (Priest) on Feb 12, 2004 at 04:34 UTC
I'm still going to say I don't understand the motivation, but this makes your goals clearer. I don't have a vested interest in fighting this one, but I am enjoying playing devil's advocate... So, if the goal is to keep newbie programmers from using a variable without first running a subroutine on the variable, what is to say they can't use a weak subroutine to "clean" a variable?
Also, what is to prevent someone from just passing around the gateway variable as a "key" within the code?
I could be missing some of the finer points, but essentially my point is "a lock is no good if the key is under the door mat". It seems the key is under the door mat. If not, I'm darn sure I could get that key fairly easily, and by the very nature of the facilities used to implement this, it isn't really a security measure at all.

Now some languages have private variables, and this is marginally useful if your fear is someone printing a password. It seems we would go further by trying to find a way to write private variables that can only be read from certain packages. However, this is not to say someone can't modify the original code (or perform other exploits) to defeat this "security". So why don't we trust the code we are running? Is this part of a plugin architecture where arbitrary users can upload code? If not, don't allow that. Otherwise, isolate your credit card and password items into modules that few folks have anything to gain by tampering with. If you really can't trust your fellow coders on a project, you might (possibly) be able to write FETCH/STORE kind of wrappers that deny access from outside the package. (This is theory, I don't know...)

Essentially, security should deal with external sources getting at data -- other users, other programs, networked or not. When you can't trust your own code, that's sandboxing, and that is a different problem. Calling this security, at least in my eyes, gives us a false illusion of being secure. This is just a very small piece... it helps you know that you have not handled data loosely throughout your app.

For one who is teaching security, start with the basics: network security, open ports, packet sniffers, plaintext data, encoding is not encryption, injection attacks, SSL without keys and key exchange, DoS vulnerabilities, cross-site scripting and SQL vulnerabilities, changing HTML forms to alter important fields, spoofing and IP games (arp), man-in-the-middle, local security, permissions, uploading and executing code to gain local access.
Only once you have stopped all of the above is the "restricted data" module really important. All of the others have higher gains and are more likely to be exploited by 'evil'. I'm not a security expert by any means, but where I work I've seen and fixed numerous holes in our mystery app (FYI -- it's not Perl), since I'm one of the few who has an interest in finding and closing them. The most obscene was encoding passwords in Base64 (plaintext-equivalent) and leaving the file permissions as 655! Local socket exploits (a non-root user being able to connect to and manipulate a root daemon) were also found. We also used to use a lot of plaintext network traffic. It's a big deal, and you've got to look everywhere to clean up what most folks don't know to think about. This is all white-hat easy stuff too... I'm sure it can get a lot more evil/complicated if someone really wanted into our app. In conclusion, we aren't even close to secure now... but it's getting better.
by jarich (Curate) on Feb 12, 2004 at 05:16 UTC
"So, if the goal is to keep newbie programmers from using a variable without first running a subroutine on the variable, what is to say they can't use a weak subroutine to 'clean' a variable?"

Exactly the same thing that stops a newbie programmer from using the regular expression /(.*)/ to untaint their variables. Nothing. Except that if you saw that kind of untainting going on, you'd be even less willing to trust their code.

This idea isn't about protecting newbies from themselves. It's not about trusting fellow programmers or not trusting them. It's not about preventing someone from editing your code or passing around the "gateway variable" (although I don't quite understand what you meant by that). There's no point in building "security" like this into scripts that other people can edit, particularly since all of it could be turned off by commenting out or deleting whatever turned it on.

This idea is about assisting experienced, professional coders in their coding. It's about making sure that we don't do stupid things like print out a variable, made up of data that should be restricted, to a log file because it's 3am and the coffee's worn off.

Yes, the key is under the mat. It's under the mat in taint checking too. A perfect coder shouldn't ever need to turn on taint checking, because that coder would always clean variables coming in from tainted sources. However, that's no excuse for not turning on taint checking anyway. It doesn't cost a lot, and it is important because that perfect coder's code might be edited by me sometime in the future. I'm not perfect.
I can trust my code to have bugs in it. I can trust that some of my code will grow past 2000 lines long and that I may not be its sole maintainer. I can't always trust that this innocuously named variable $string hasn't collected some restricted data along the way that I shouldn't be printing to STDERR. It would be nice to have something assist me by keeping track of that information at a lower level, and dying or filtering the string when I attempt to print it to a log file I can't easily truncate. This isn't sandboxing in the sense that I understand sandboxing.

Does this help explain my reasoning behind our suggestion?

jarich
by Abigail-II (Bishop) on Feb 12, 2004 at 09:23 UTC
"I could be missing some of the finer points, but essentially my point is 'a lock is no good if the key is under the door mat'. It seems the key is under the door mat."

It's not a lock. It's not intended to prevent malice. A programmer could always remove the 'restrict' calls, or add calls so as to allow sensitive data to go out on evil channels. Tainting isn't a lock either. "use strict;" isn't a lock. Nor is "use warnings;". They are like safety belts, and those aren't locks either. They help you avoid doing damage.

Abigail
by flyingmoose (Priest) on Feb 12, 2004 at 17:51 UTC
by dragonchild (Archbishop) on Feb 12, 2004 at 11:42 UTC
------
Please remember that I'm crufty and crochety. All opinions are purely mine and all code is untested, unless otherwise specified.
by mr_mischief (Monsignor) on Feb 12, 2004 at 13:33 UTC
I don't agree with putting this kind of thing in the core. If your code had "use Data::Restrict;" or some similar module invocation near the top, that'd be fine by me. One way this could be made a module is to have that module override all the output functions, which is what I had said already. The data structures I used, although ugly, get the job done. Your restrict function, as part of a module, could be just a way to set values in such a data structure. The new versions of the output functions in the same module would use that data structure.

Your example seems to have a weakness I and others have already pointed out: restricting the printing of one variable at a time does not prevent assigning its value to another variable, then printing that. You could carry magic around in the language for every variable, but that would likely be bad for the common case. Since what is being proposed is sort of like a SuperTaint -- "don't even let this variable be output until cleaned or pointed at a certain output path" -- perhaps it could be worked into the core to use the same taint flag and just add code to that path when a restrict option is passed to the interpreter. I still don't like that. It's bulky, clumsy, and the porters have enough work to shoulder now.

The smart thing to do from a security standpoint is always to deny by default and explicitly allow what is needed. This is the same when one is protecting oneself from oneself as when protecting oneself from strangers. I've shown code which does that. I've explained ways to further the protection, such as using Safe to disable the core's output routines except inside the module handling this. Using Safe, in fact, allows one to prevent variables from being in a scope where they are not wanted. Anything inside a Safe compartment has to be explicitly handed a variable in order to get at that variable's value.
By making the code for your program modular to the point that each fundamentally different operation can be in a separate compartment, one can share the sensitive variables only with the compartments that need them. Any compartments which don't need to do output can be left unable to do so. Any compartments which need to do output but don't need access to the sensitive variables can be part of a namespace that can't reach those variables. This part is all accomplished just by good use of Safe.pm.

In addition, a restriction on printing variables other than those explicitly allowed can be helpful, I guess, but it's not all that necessary: proper use of Safe keeps the scope of each sensitive variable very small, keeps the areas of the program which can do output fairly small, and keeps the overlap between the two only where absolutely needed. Debugging those parts then becomes much simpler.

Christopher E. Stith
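A minimal sketch of the Safe.pm usage described above: a compartment may be permitted to do output, yet it can only reach the variables that are explicitly shared with it, so a password that is never shared simply does not exist inside the compartment.

```perl
#!/usr/bin/perl
# Minimal sketch of deny-by-default scoping with Safe.pm.
use strict;
use warnings;
use Safe;

our $password = 's3cret';    # never shared with the compartment
our $greeting = 'hello';

# A compartment that may print, but sees only $greeting.
my $output_cmp = Safe->new;
$output_cmp->permit(qw(print));     # I/O is denied by default
$output_cmp->share('$greeting');
$output_cmp->reval('print "$greeting\n"');
die $@ if $@;

# The same compartment cannot reach $password at all:
my $leak = $output_cmp->reval('$password');
print "password not visible in compartment\n" unless defined $leak;
```

The design point is the one made in the post: instead of tagging sensitive data everywhere, you shrink the set of places where sensitive data and output ability coexist, and audit only those.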
by mattr (Curate) on Feb 13, 2004 at 17:27 UTC
If each restriction class (e.g. CreditCard, CreditCardGateway, PinNumber, Password) is made equivalent to a unique bit flag, you could combine them to produce a "security level bit vector". This might force more than one kind of cleaning to be done, or might restrict more kinds of usage (e.g. is allowed in a certain db field, is not allowed in a cookie, output anywhere triggers a logged alert). This could be extended to objects, and maybe using @ISA to hold the different security classes would lend itself to describing the combination of restriction types. If extended to cleaning only certain fields of an object (or keys of a hash), you could, for example, keep $o->{MothersMaidenName} while cleaning out $o->{PinNumber}.
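The bit-vector idea might be sketched like this; all class names and the requires() helper are illustrative, not an existing API.

```perl
#!/usr/bin/perl
# Sketch of a "security level bit vector": each restriction class
# is one bit, and classes combine with bitwise OR. Names invented.
use strict;
use warnings;

use constant {
    CREDIT_CARD         => 1 << 0,
    CREDIT_CARD_GATEWAY => 1 << 1,
    PIN_NUMBER          => 1 << 2,
    PASSWORD            => 1 << 3,
};

# A value carrying two kinds of sensitive data at once:
my $level = CREDIT_CARD | PIN_NUMBER;

sub requires {
    my ($vector, $class) = @_;
    return ($vector & $class) == $class;
}

print "must clean card data\n" if requires($level, CREDIT_CARD);
print "must clean PIN\n"       if requires($level, PIN_NUMBER);
print "no password cleaning needed\n"
    unless requires($level, PASSWORD);
```

Combining flags this way is what would let one value "force more than one kind of cleaning", as the paragraph above suggests.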
It could scan the symbol table for variables and hash keys which match certain built-in and customizable regexes, like /sec|pass|pwd|pin|salt/i, so you can enjoy reduced stress by maintaining the practice of naming sensitive variables a certain way, as in Hungarian notation (though that's not used much in Perl).
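A toy version of that symbol-table scan might look like this; the variable names are invented for the demonstration.

```perl
#!/usr/bin/perl
# Sketch: walk the main:: symbol table looking for suspiciously
# named package scalars. Variable names here are illustrative.
use strict;
use warnings;

our $db_password = 'hunter2';
our $user_pin    = '0000';
our $colour      = 'blue';     # innocuous, should not match

my $suspicious = qr/sec|pass|pwd|pin|salt/i;

for my $name (sort keys %main::) {
    next unless $name =~ $suspicious;
    no strict 'refs';
    next unless defined ${"main::$name"};   # only defined scalars
    print "would restrict: \$$name\n";
}
```

This only sees package variables; lexicals (my) don't live in the symbol table, which is one limit of the naming-convention approach.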
Some effects I could imagine that would be useful:
Maybe it would work like this. (Pardon I am saying 'secure' this, not 'restrict' this):
by DrHyde (Prior) on Feb 12, 2004 at 13:26 UTC
by bl0rf (Pilgrim) on Feb 13, 2004 at 00:41 UTC
I think that the confusion about the issue of restricted data stems from it being a solution to a problem we haven't seen. You should define the cases where this would be necessary, perhaps with some examples. I, personally, have never encountered a situation such as sensitive data leaking out, and so restricted data seems superficial to me. I am sure that there are others of my opinion, and I urge some of the more experienced monks to clearly explain the situations in which we would need restricted data...
The guy with the cheap signature.
by jarich (Curate) on Feb 13, 2004 at 03:00 UTC
"I think that the confusion about the issue of restricted data stems from it being a solution to a problem we haven't seen."

I had the exact same reaction when pjf brought it up. "Sure, it might be useful", I said, "but I wouldn't use it. I almost never write code which deals with credit card numbers or passwords or private client data, and it's not that hard to manage when I do." Most of my code is really, really boring, and so 99% of the time I wouldn't even think twice about dumping all of my structures to STDERR for debugging purposes.

However, as we talked more about how things can go wrong by accident, I remembered how, once upon a time, some code I had written, and tested before giving to someone else to test, got into production and caused havoc. It wasn't bad code: it was clean, used strict, passed all of our tests, passed code review, and looked secure. However, it also had an interesting (and quiet!) failure mode which resulted, in the end, in about 100 staff members' usernames and passwords somehow appearing in GET strings, and therefore in our access logs. Of course we raced the patch into production and that stopped happening rather quickly. But purging the access logs was fraught with political problems, and keeping the passwords in there was unacceptable. Calling and explaining the error and asking each staff member to change their password was... unpleasant.

Now maybe, if I had had the choice, I wouldn't have used restricted data in that case anyway, but it would have been the last time I didn't... I still wouldn't necessarily use restricted data in most of my work. As I said, the code for my job is boring. I don't use taint checking all the time either, but I do when I'm taking input from the user. I would definitely consider restricting data when I thought I was dealing with something that shouldn't appear on STDOUT or STDERR, in my log file, in the database, or in what I print to a browser.
The class of things that fits these kinds of restrictions isn't as small as you might initially think. If market research firms would find the data valuable (see this calculator), then, as professionals, we have a moral obligation to ensure that it does not end up in places that it shouldn't. It is not acceptable to print out someone's credit history to any kind of logfile -- that's not responsible data management. The same goes for credit card numbers, or even silent phone numbers. Maybe it's best to assume that all phone numbers are silent.

Perhaps you think this approach to data management is a little draconian. After all, what's wrong with printing out names, addresses and (possibly silent) phone numbers into log files occasionally? Who's going to know? Who's going to have access to that information who couldn't get at the database itself with a little work? How much do I have to worry about accidental exposure? The answers to those questions depend on your local privacy laws, your personal ethics and your contract with your client. The purpose of this idea isn't to help you decide what data should be restricted, just to help you make sure that you restrict it properly.
Do you mean that you've never printed something to a file that shouldn't have gone in there? Or seen stuff that someone else has? Do you mean that 100% of the time the stuff that you print out to a client browser is either innocuous (for some definition of innocuous) or what you intended to be printed? Even when you've used Data::Dumper? If you do mean this, I am extremely envious. I make mistakes like this all the time, especially when using Data::Dumper on code that I have either never seen before or haven't seen for 6 months or more. Data restrictions would help here, even though they wouldn't solve the underlying problems.

I hope this helps at least a bit.

jarich
by pjf (Curate) on Feb 15, 2004 at 01:30 UTC
"You should define the cases where this would be necessary, perhaps with some examples."

Much of this thought has come up because of work that I'm performing for a client. The client has a very large and rather complex application. I'm changing core parts of that application to improve the authentication and session management implementation, and making a number of speed improvements, one of which involves caching.

One particular problem I've encountered is that the application receives a username and password, uses this combination to authenticate to a number of different services, snarfs up the relevant information returned by those services, and returns it to the user. As such, the username and password get passed around a lot. This is not my design; this is how the code was written before I arrived on the scene. I'm trying to move away from it, but that's a longer project.

Because passing around the password is so prevalent, I feel extremely uneasy. The code is so large that I'm unable to easily trace everywhere it gets used. Something may have the password in a die or warn that simply hasn't been triggered yet. Something may pass it as an argument or in the environment to another program -- a common mistake which exposes the information to anyone using ps. Other things could be happening, but without auditing a very large amount of code, I simply cannot tell.

In this instance, I would be extremely pleased to be able to mark the password as restricted when it comes in. When the application then dies on an attempt to use restricted data, I can examine the relevant section of code, ensure it's doing the right thing, and tweak it accordingly. I'd also feel a little better about there being a level of future-proofing -- I'd much rather have the logger throw a restricted-data exception than inadvertently log the password to a file due to poor coding somewhere in the application.
The idea of restricted data would also have been useful when debugging the caching mechanism this application uses. The user's password should not be cached, and while I explicitly remove the password before dropping the data structure into the cache, it turned out that two or three other data structures also stored the password, and these were being cached. They had to be tracked down and fixed manually. What's to say that there aren't any more hiding in the program?

Restricted data doesn't solve all the problems I've described above, but it does provide a helmet and safety belt which can help reduce the damage.

Cheers,
Paul Fenwick
Perl Training Australia
by Abigail-II (Bishop) on Feb 12, 2004 at 09:24 UTC
Abigail
by jarich (Curate) on Feb 13, 2004 at 00:00 UTC
Since, by and large, we've had both interested and supportive feedback, I guess we now have a good reason to give it a go. The suggested interface in the grandparent of this node was merely a suggestion; it's very possible that the end interface will look very little like it. Thanks for your support. We'll keep you posted on how it goes.
by jacques (Priest) on Feb 12, 2004 at 23:29 UTC
1. In a long project, how would I figure out which data is restricted? If the data is restricted 2,000 lines above where I tried to use it, short of an egg hunt, how would I know? (Seeing if the program dies when trying to print is an unacceptable solution.)

2. If I unrestrict the data for an operation, do I need to restrict it again? What if I forget, since I might have dozens of restricted variables? One omission could be disastrous.

3. And would people become more careless, thinking they are safe with this proposed option? (Read #2)
by jarich (Curate) on Feb 13, 2004 at 01:30 UTC
1. Finding out which data is restricted

How do you currently find out which data is tainted? Restrictedness would spread the same way as taintedness: any operation (including assignment) involving restricted data ends up with a restricted result. At its simplest, you could write a check for it, just as you can write a very similar subroutine to detect tainting. Such a subroutine catches the error if there isn't a filter, and otherwise sees whether the filter performed any changes. It doesn't catch the case where your filter is useless, however. If you think that knowing whether something is restricted is important, we can probably write something in for checking it.

2. Unrestricting/re-restricting

How would you unrestrict the data for an operation? The suggested filter idea I provided only "unrestricts" the data on its way out. Every operation using restricted data results in restricted output, just as every operation on tainted data (except capturing from a regular expression) results in tainted output. As I'm uncertain what motivation there would be for unrestricting data internal to the program, the issue of restricting it again doesn't come up.

3. Encouraging carelessness

If using taint checking encourages programmers to become more careless -- because the taint checking will rescue them -- or if using strict and warnings encourages carelessness, then this certainly will. Even if those things don't cause carelessness in certain programmers, this might. I'm inclined to generalise the programmers who might use this feature into two groups relevant to this discussion. First there are the good programmers, who know that printing out passwords, private information and credit card numbers to logs is a bad idea. Hopefully they'll see this as just another tool (like strict and taint) to help them avoid mistakes, not a magic panacea. They'll code in a clean and reasonable manner which doesn't rely on strict and taint and data restrictions, but uses them just in case.
The other group are the programmers who want to get things done now! but who have heard that using data restrictions is the thing to do. These programmers may be led towards greater carelessness, because they may choose to rely on data restrictions rather than ensuring that their code is clean and tidy regardless. Unfortunately, these are the same programmers who'd probably create dud filters, or be prolific in granting output permissions if their programs were being constantly tripped up by these restrictions. That is, if they didn't decide to dump the whole restricted-data thing first.

There are lots of programmers in both camps. Fortunately, most CPAN modules (for example) are written by people from the first group. :) And Perlmonks does a great job of helping people move up from the second group.

I hope this answers some of your questions.

jarich
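The taint-detection subroutine jarich alludes to is a well-known idiom (documented in perlsec); a hypothetical is_restricted() check could presumably take the same shape if restrictedness, like taintedness, made eval of derived source fail.

```perl
#!/usr/bin/perl
# The standard tainted-value detector from perlsec.
use strict;
use warnings;

sub is_tainted {
    my $val = shift;
    # Appending zero characters of $val taints the string '#';
    # eval'ing tainted source dies under -T, so the inner eval
    # fails exactly when $val is tainted.
    return !eval { eval '#' . substr($val, 0, 0); 1 };
}

# Without -T nothing is tainted, so this reports clean;
# run with perl -T and $ENV{PATH} reports tainted instead.
print is_tainted($ENV{PATH} // '') ? "tainted\n" : "clean\n";
```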