|Just another Perl shrine|
Re: Parse PHP or Perl and Reconstruct MySQL Schemaby roboticus (Chancellor)
|on Jun 04, 2006 at 12:45 UTC||Need Help??|
I apologize in advance for the length of my response, but I've had to do this sort of task a couple of times...
Depending on the coding style of the original programmer, the task could range from straightforward to absolute chaos. If the programmer uses a consistent coding style, e.g., using $sql for SQL text, explicitly naming columns to be retrieved, etc., then it can be a pretty easy job. However, you'll still need to figure out the proper indices for the tables, as hesco mentions, and as samtregar mentions, type information could be difficult to infer.
On the absolute chaos side of things, the coder(s) could use inconsistent styles, making the SQL difficult to locate. They could just use SELECT * FROM TableName, as described by CountZero and UnderMine. Even worse, the programmer(s) could put all sorts of database manipulation in poorly-named stored procedures, and rely on triggers to do things behind your back.
So, before estimating the time it would take to reproduce the database structure, be sure to do a quick review of the code to see how it looks. After doing the review, if you still decide to bid on the job (and win it), here are a couple of tips that may help.
Now the grunt work...Now you've got an iterative process before you. Create your database with the DDL script, populate the data with your DML script, and run your application.
For each run, try to exercise as many code paths as possible, and see what happens. In some runs, be "kind" to the application (i.e., give it "reasonable" data values) to help you get further into the application. On other runs, be "mean", giving the app oddball values and try to flush out any bugs, inconsistencies, etc., to help shed light on the applications interaction with the database.
Document each "interesting" test case that exercises some feature of the database and application that you don't already have a test case for. Also, try tests that may help expose "hidden" functionality in the database (those aforementioned stored procedures and triggers.) For example, after inserting a new record, check all the reports offered by the application to verify that the appropriate data shows in all locations. If not, it may give you a clue as to something new you'll have to add. (Here, interviewing the app's owner can be helpful. For example, they may tell you that the data normally doesn't show up until the next day, indicating a database maintenance task that runs once a day, or they could indicate that it should show up immediately, indicating that a trigger might be needed.)
Each time you learn something new about the database, update your initial diagrams with notes, e.g., "the contents of table.column always seem to be 'Y' or 'N'", and "Whenever table2.columnFoo holds 'M', there should be a record in table3 where the key is table2.columnBar+table2.columnBaz. Then update your DDL to add the new knowledge. As required/desired, add to your DML script to help exercise code paths, generate 'what if' cases to test, etc.
Also, when the application runs correctly, dump your database tables as DML so you can add it to your original DML script to load more data into the database on future runs. (Many features may not be testable without a reasonable quantity of test data.)
Constrain your dataIn a relatively short period of time, you should be able to have a pretty good picture of the database. Then you can start looking over the relative frequency of various queries. Analyze this usage pattern to determine what indices to create on which tables. Analyze your table diagrams and see what foreign keys to add to your tables, what constraints make sense, and start tying down your database. After all, once you let crap data get into your database, your application will have to work much harder to produce reasonable results. Whenever possible, use constraints, foreign keys, default values, etc.
Database MaintenanceFinally, once you've done all that, start thinking about how the database will grow, and what sorts of maintenance it will require. Are there tables that will need to be cleaned out? Will some data expire? If database maintenance code already exists, be sure to test it out. If not, write what you think you'll need--and be *certain* to test it out. (You wouldn't believe the amount of database maintenance code that fails when it finally triggers because the authors either don't test it, or don't test it after schema changes.)
Save your workFinally, put *all* your work in a binder and tuck it away for the next time you need it. (Even if it's not particularly neat, put it in the binder. If you never need it, it won't matter that it's not neat. But if you do need it, you won't mind that it's a bit of a mess. And believe me, you'll need it. About six months from now, a case will come up that you never tested (no matter how hard you try!), and you'll have to fix it. That's when the binder will be useful! Be sure to add the funky breakage to your test cases, and add any required data to your DML script, so when you make changes in the future, you can exercise that new test case. A lot of work? Yeah, it can be. But it's less work than just hacking at it. Especially the second time, when it's time to fix it later or add a new feature. That documentation will nearly always pay for itself.
This is what works for me, YMMV.
Why, yes, I *do* deal with many large databases and crappily-written applications with no documentation. How ever did you guess?