Once when I was six years old I saw a magnificent picture in a book, called True Stories from Nature, about the primeval forest. It was a picture of a boa constrictor in the act of swallowing an animal. ... In the book it said: "Boa constrictors swallow their prey whole, without chewing it. After that they are not able to move, and they sleep through the six months that they need for digestion." I pondered deeply, then, over the adventures of the jungle. And after some work with a colored pencil I succeeded in making my first drawing. My Drawing Number One. It looked something like this:I showed my masterpiece to the grown-ups, and asked them whether the drawing frightened them. But they answered: "Frighten? Why should any one be frightened by a hat?" My drawing was not a picture of a hat. It was a picture of a boa constrictor digesting an elephant. But since the grown-ups were not able to understand it,...xxxxxxxxx xxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxThe Little Prince, Chapter 1 by Antoine de Saint-Exupéry
I must be the snake the little prince was thinking of because I seem to be in the habit of swallowing elephants. I often find myself needing to learn a complex system inside and out very quickly. Sometimes, it is because I have a new client. Especially in my early days I would often get invited to lead the design on a half finished project after a series of the big guys (Accenture, etc) had successfully botched the project. Until then, companies don't really look beyond the obvious and safe sources of consultants. More recently, I have either needed to evaluate 3rd party software systems for in-house use or even our own code for refurbishment. And then there are volunteer projects.
Over the course of time I've evolved a strategy for working through complex systems rather quickly. I won't say it isn't a lot of work (it is), but it keeps me from going in endless circles, so that at least the work moves me forward.
I'd be interested in knowing more about how other monks handle going about learning a new system quickly. Not everyone learns the same way. Also if you see any non sequiturs or an obvious omission, I'd appreciate the feedback. Writing something like this up feels a bit like trying to explain how to tie a shoe. It is all to easy to take for granted a crucial step and leave it out. I also have an ulterior motive. pmdevils are also in the position of trying to swallow elephants when they join.
The ten steps can be summarized (they will be explained in more depth later below):
This is, of course, an interactive process and it often involves a lot of backtracking as well. Often the first pass through a step results in more questions than answers. Then I go onto another step, and find an answer to something I couldn't figure out in the previous step. If the answer is important enough, I may go back and redo all or part of the previous step.
Although the first step is "gather together documentation", the process describe below will work even if documentation is sketchy and you have to manually go through code and database schemas. Documentation makes the process much easier (if it is correct), but it isn't necessary to the learning process.
Figuring out how much detail to do at any one step of the process is a bit of an art. Usually I try to be as superficial as possible. For each stage I try to learn enough to categorize things in some sensible way that tells a story about data, behavior or the interaction between them. Then I move on to the next step. When the fog to understanding ratio gets out of hand, I backtrack one or two steps and go into more detail until the fog clears.
As I work through the above list I usually keep copious notes and organize them as I go. Writing out my answers to the questions at each step helps me identify unanswered questions. I also find I'm accumulating knowledge much faster than I can absorb it so the notes act as a memory bank, especially if they are well organized. I frequently organize and reorganize the notes as I go. The act of organizing them also helps me remember more. Finally, if the system is poorly documented, the notes become a first cut at improved documentation.
As one skims through the list, the first thing that might stand out is that a third of the steps are data centric, including the first detailed analysis step (study data streams and persistant data). For many programmers starting with data may seem counter intuitive. Programming is about making data do something, and code is where the action is. But data puts an upper bound on what the code can do so it provides a way of focusing attention and understanding the scope of the system. It is also the easiest discreate nameable thing to get a handle on. It acts like the end of a thread used to unravel a ball of knots.
Another thing that might stand out to the OO fans reading the list above is that there is no mention of roles or behaviors or all the other jargon of the OO world. I find that interesting because I've been doing OO since the late 80's and it is virtually impossible for me to design software that is more than 500 lines without ending up having at least a few classes. Even before C++ became popular I was organizing C functions associated with specific data structures into dedicated files and designing dispatch handlers for the data structures. It just seemed like the right way to do things.
I'm guessing the reason for this is that data is a long hand that reaches throughout a system. Or maybe it would be better described as blood and oxygen. Every aspect of the system needs it regardless of its role front end or back end.
Classes and objects are marvelous ways to organize code. They can also be a great way to get a user to cough up requirements. Users often have a hard time talking about data apart from the things they do with the data. However, code and end-user requirement gathering are only two of the many ways a system needs to be categorized, sliced and diced in order to understand it. On the front and we need to understand workflows. On the database we need to understand normalization or we won't really get the benefit of our database's SQL engine. Somewhere in between we need to understand aspects: large swaths of functionality that are content independent.
To fully understand a system it is also important to get a handle on how all of these different ways of categorizing data, code, and end user functionality relate to one another. The learning strategy elaborated below tries to help in that process. Whether it succeeds is for you to decide. But if it seems counter intuitive at least consider trying it the next time you need to learn a system quickly. You might be surprised at the results.
I apologize in advance for the list like nature of this elaboration. Partly I don't have the time to expand it fully right now. Also, I fear turning it into a narrative with examples would likely stretch this node to book length. Hopefully though I will have at least raised some questions and pointed out things to look for that might be helpful for others.
You may have documentation. You may not. It may be up to date. It may not.
Even out of date documentation or incorrect documentation can be helpful if it gives you a sense of the design philosophy or a road map through the code. My first step is to skim through the documentation and make some sort of assessment of what is there and how much I trust it. However, unless the documentation is amazingly clear and well written, when first learning a system I usually take the documentation with a grain of salt. I like to get down and dirty into the guts of things and see how it all works with my own eyes.
Even when I trust the documentation and it seems relatively complete I still use the remaining steps to as a road map through the documentation and check list for my understanding of the system.
From the user's point of view, what does the system do? Do this as a brief survey to get a "feel" for the application or system. The goal for this step is to provide context for the more detailed study of how the system is implemented.
For example, at Perl Monks all functionality is available via specially crafted URLs. However, one doesn't normally access features directly via URLs. Instead one either uses the link lists in the upper corner or one of the several nodelets found in the side bar. The upper right corner has links to the main functional areas (mostly), and the nodelets handle special purpose function areas.
Some systems use classes primarily to organize the system around "aspects" - areas of functionality that can be defined indepenent of content. Content-based type systems are handled based on data. They have no corresponding code classes.
For example, the Everything engine that runs the Perl Monks website uses classes to organize code into a database backend (inherits from DBI) and a front end request handler (inherits from CGI). It also has an elaborate system for assigning each design element and each node written by the user to a "nodetype". These node types can inherit from each other and they drive both the choice of applicable attributes and the available functionality. However, these node types have no corresponding classes in the code.
Note: this is almost always a bug waiting to happen unless the duplicated data is immutable when the code was written and always will be from now until eternity. Or alternatively the object is very transient (e.g. one report run or edit transaction).
It is *very* hard to keep that data in sync with the master copy. The code to do it often gets the edge cases wrong, creating subtle bugs. Also even if green field development got it right to start with, maintainers forget about this sort of denormalized data and usually fail to find all the places affected by a database change.)
If you are lucky a certain amount of categorization may have been done for you. If not, the only way to do this is to look at each file! Here are some things I do to speed up the process:
Do not expect to get things right at this stage. Some files will be miscategorized.
When categorizing focus on (a) files that belong to a specific aspect, e.g. retrieval of persistant data, security management, caching/performance, XML generation, HTML generation, generation of format XYZ, HTTP get request processing, customization management, mail interface, code to handle specific content areas, and so on.
Taking the same types as before, study how those types are displayed to the user.
Nuf said.
Best, beth
|
---|