in reply to Moving from SQL to flat-files

I need to wholeheartedly agree with perrin. Rather than go the route of ripping out the basis upon which scalability is established (your RDBMS), why not merely replace it with one that fits your needs better?

At least maintaining your SQL so that you can scale again later is a huge start - jZed's suggestion to use DBD::File or other similar modules that allow you to use plain SQL to access data in otherwise flat files would tide you over allowing you to get back up and running with a minimum of fuss while allowing you to continue investigating a more robust solution (no offense intended, jZed) should the need actually be there.

Popular alternatives are mySQL and PostgreSQL, both in the free realm. But I've also heard that MS SQL, Oracle, and DB2 all have free-for-production-use server products available as well, with all the consummerate feature enhancements and full scalability of being able to upgrade (for a price) to their larger-scale offerings should it be needed (including high availability failovers, clustering, etc.).

Rather than throwing out the baby with the bathwater, I urge you to reconsider. Using flat files as an intermediary makes sense (especially if it's CSV format which is often importable directly into most other RDBMS formats). But using it for production code is somewhat scary. Personally, I think I'm pretty smart. But I definitely have not taken the time to understand everything that a database offers me. I'm just happy to take advantage of it. In fact, in the production of a website built to track test plan progress, my manager not only told me not to do it, but not to use a database in doing it. (I don't quite remember how the latter point came up given the former.) So I started with flat files. But then I started worrying about concurrency and ... well, then I just went ahead and used a database (DB2, actually). And I didn't have any problems with multiple users logging their tests and overwriting each other. Without spending any time on it (other than learning DB2).

Replies are listed 'Best First'.
Re^2: Moving from SQL to flat-files
by punkish (Priest) on May 10, 2006 at 12:58 UTC
    Popular alternatives are mySQL and PostgreSQL, both in the free realm. </blockquot>

    Choice of rdbms was not the issue... I already had a pretty fine engine in SQLite, and it was doing my job well. The immediate issue was portability. There is a certain level of complexity required with maintaining a database. In the case of SQLite, I had to usually upgrade my <cpan://DBD::SQLite> whenever that was upgraded, because it gave to me whatever security, quality, or performance enhancements the underlying SQLite brought. Then one day, bam!, the SQLite format became binary incompatible with earlier versions. I needed the new version because VACUUM was giving me an error, and that was fixed in the new version. But, my database suddenly became "different." I fixed it all on my computer, but my hands were tied as far as my web host was concerned. They wouldn't upgrade the driver until it was deemed "stable" for them (understandable, but it boofed me).

    The other issue often cited is scalability and concurrent use. I am not sure how much of this is just accepted wisdom (folklore?) but somehow it seems folks accept that flat files don't scale well. Concurrent use is definitely an issue, but not for me right now... I am designing a primarily single user application. Even when it becomes multiple user, I am not talking about an Amazon.com here. Can a flat files driven app handle a few thousand hits a day? If yes, I am set. Of course, the "database war" stories on radar.oreilly.com (link in the OP) mention websites with much higher use and content. So, perhaps scalability is myth.

    Then there is the issue of accessibility (defined my way). With SQL, all the data are hidden in the SQL engine. They are only accessible through an application, be it my web app, or some other interface. With flat files, I can open them with just a text editor. Push comes to shove, I can ssh into my web host, fire up vim, and edit a file, and the changes will show up in my app.

    The last issue is of a "challenge." I have been quite taken by the "simplest thing that could work" bit (did Ward Cunningham postulate that?). So, I've set out to see if I can redo the app with flat files. Yes, redoing the SQL into non-SQL logic is quite some work, but only becacuse I had trained myself to think the SQL way.

    SQL engines are very important, and there is a lot dogwork hidden inside of them. But, are there times when they are simply unnecessary? I have actually learned all sorts of neat programming techniques since I started battling converting my app to flat files.

    Hope this makes sense. But I am also learning from all the different viewpoints I am hearing here.

    --

    when small people start casting long shadows, it is time to go to bed

      So you had a bad upgrade experience with SQLite and you're solving it by writing your own database? Sounds like you're throwing the baby with the bathwater to me.

      Can a flat files driven app handle a few thousand hits a day?

      Safely? Without corrupting the data or losing updates? Yes, but only if you put lots of work into it. Locking is not easy. You can get something fairly safe if you carefully use exclusive locks, but it won't scale very far before people end up getting blocked when trying to update. Maybe enough for you, maybe not.

      Of course, the "database war" stories on radar.oreilly.com (link in the OP) mention websites with much higher use and content. So, perhaps scalability is myth.

      No! Don't draw the wrong conclusion from that series. For one thing, those people are usually talking about something totally different from what you're doing when they say "flat-files." Google doesn't do anything remotely similar to flat-files. They wrote a filesystem! They wrote a distributed, indexed storage system with locking and logging and most everything else RDBMSes have. One guy was using BerkeleyDB and calling it flat-files. Some of them are just storing data that is compiled off-line and never updated by the application, so they have no concurrency issues. Some of them are talking about data that they just read all of and never query against. You can't pull any general ideas about the scalability of flat-files as an RDBMS replacement from that O'Reilly series.

      What I kind of meant was that these other databases may provide better migration stories. Migration from one version to the next, or migration from one platform to another (although I suspect MS SQL may not be so good at that ;->).

      As far as SQLite fixing a problem that you couldn't deploy on your server - which is worse: not being able to deploy a known fix, or not being versed enough in ACID transaction compliance and implementation to provide a secure method to ensure no loss of data in your own private modules? Personally, I'd rather stick with SQLite and force my provider to upgrade (or move to a new provider) than try to fight with unknowns which I won't discover until I've already lost data.

      Part of the scalability of "real" database servers is that you can scale your webserver and your database server independantly. At first, they live on the same box. Then you get too many hits, so you move your database to a second box. If you're using flat files, you can't quite do this since you can't offload your data retrieval/saving to a second box - you need to use that second box as a second webserver, but use NFS for sharing the data. Now things get a bit messy - locking across NFS isn't nearly as safe. As things move on, you get more and more webservers just to handle the traffic. At some point, you may have so much going on that you actually need a second *database* server. At this point, you'd probably need to nearly double your webservers for flat files. But, because you've offloaded so much of that work to a machine with lots of RAM (for buffering), and fast disks, you only need to duplicate the single machine. And now you need something with clustering capabilities to manage the data.

      I realise that SQLite isn't a client/server database. But, because you've already built all your logic with SQL, it does provide a great starting point to move to a client/server database system should the need actually arise. Like the rest of the above scalability, you may never need it. But, should you actually need it, you'll be kicking yourself if it's not built in from the ground up.

      Flat files bring with them so many of the concurrency and data integrity issues that databases have solved that I would simply never want to store data in something that hadn't solved those issues for me.