in reply to DBI vs MLDBM/GDBM_File, etc.

Hey, I have taken two database courses, and I had a very good introduction to the concept of relational databases (IMHO).

I just wanted to remind everyone here how powerful a flat text file is for simple data storage. You are right, Jonathan, to have asked those questions. Everything depends on what type of data ZZamboni wants to store, and how he needs to process it.

I will add my two cents to the thread by saying that for simple storage, there is nothing better than flat files. The flat files are growing? So what? Why not update the system so that it uses more than one file? Imagine you have 10,000 items to store. Well, why not having 10 files to store them? The first file for the first 1000 items, and so forth. Speedwise, I am telling you, you will end up with something a LOT faster than any other big DB package or wrapper like all the DBI stuff. Because those packages are, in fact, also using some big files I guess...
So... how complex is your data? How do you want to access it? Do you need any relational concepts? If the only concern you have is speed, then my advice would be: keep the flat file system, just improve it a little!
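
A minimal sketch of the splitting scheme described above, assuming numeric IDs from 1 to 10,000 and tab-separated `id<TAB>value` lines. The file names and the helpers `bucket_file`, `store_item` and `fetch_item` are made up for illustration, and no speed claim is implied:

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $PER_FILE = 1000;    # items per file, as in the example above

# Map a 1-based item ID to the file that holds it:
# IDs 1..1000 -> items_0.txt, 1001..2000 -> items_1.txt, and so on.
sub bucket_file {
    my ($dir, $id) = @_;
    my $bucket = int(($id - 1) / $PER_FILE);
    return "$dir/items_$bucket.txt";
}

# Append one record to the file that owns its ID.
sub store_item {
    my ($dir, $id, $value) = @_;
    open my $fh, '>>', bucket_file($dir, $id) or die "open: $!";
    print {$fh} "$id\t$value\n";
    close $fh or die "close: $!";
}

# Scan only the one file that can contain the ID.
sub fetch_item {
    my ($dir, $id) = @_;
    open my $fh, '<', bucket_file($dir, $id) or return;
    while (my $line = <$fh>) {
        chomp $line;
        my ($got, $value) = split /\t/, $line, 2;
        return $value if $got == $id;
    }
    return;
}

my $dir = tempdir(CLEANUP => 1);    # throwaway directory for the demo
store_item($dir, $_, "item number $_") for (1, 1000, 1001, 10_000);
print fetch_item($dir, 1001), "\n";
```

This only narrows the search when you look an item up by its ID. A search on any other field still has to open and read every file.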

The (lack of) POWER of flat files
by BBQ (Curate) on Jul 04, 2000 at 21:43 UTC
    Gaggio,

    We all know that there is a right tool for the right job, and flat files are a great help for storing small amounts of data. But seriously, after having worked for 2 years ONLY with flatfiles, I would never, EVER, recommend the approach just mentioned above to anyone. There is no telling how much "just improving it a little" your data will require in the future, and that could bring you to your knees with maintenance in a short while.

    BTW: there is NO WAY that you will get faster access from a flatfile (or series of them) than from a real DBMS. Expensive DBMSs like Oracle, Informix, DB2, et al. will even keep the most frequently accessed data in RAM to avoid going to disk for frequent queries. Sure, you can do that with Perl too, but then you're not talking about 'just' flatfiles anymore.

    Please reconsider your advice. There are tools and tools for each job, and I don't think flatfiles will keep you running as smoothly as you think.

    #!/home/bbq/bin/perl
    # Trust no1!
      You are right, BBQ, when you say that there are tools and tools for each job. Don't make me say what I did not say. I said that flat files are fast depending on the use you make of them.

      I should also have added that I am still a student, and I am not the administrator of the University website. In my case, I think that flat files are the solution, compared to huge-pain-in-the-ass-to-install DBMSs. I never said that MySQL was not fast. It is true that caching makes the overall performance acceptable.

      But again, I am saying that the easiest solution for ZZamboni might be to keep the flat file format. *Might*, because he did not say everything about what kind of data it is, and what use he wants to make out of it.

      Father Gaggio
        I am sorry if I misunderstood what you wrote. It is not my place or intention to distort what you would most sincerely recommend to a fellow monk! Just because I don't agree with you doesn't make me better, or more correct. I just disagree with your views on flatfiles, that is all... What you did say and I disagree with is:

        > 10,000 items to store. Well, why not having 10 files to store them?
        > The first file for the first 1000 items, and so forth. Speedwise, I
        > am telling you, you will end up with something a LOT faster than any
        > other big DB package or wrapper like all the DBI stuff. Because those
        > packages are, in fact, also using some big files I guess...

        My general DB and flatfile experience tells me that once a file exceeds 2500 records of about 400 chars each and gets more than one query per second, you are better off with a real DBMS. Yes, ZZamboni's easiest way out is probably going through flatfiles, but even in that case I would try doing something DBIsh. And while we are on that topic, splitting a large file into several smaller ones will not help at all (it will actually only make matters worse) if you don't have some sort of clever indexing system. By splitting the data into different files, you will not increase lookup speed, and you will pay a penalty for having to open each one of those files to do a full search! Again, I would split the files if, and only if, you have a good indexing mechanism and can't afford (money or machine wise) a DBMS. Most DBMSes already have clever indexing systems, so you don't have to reinvent the wheel.
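
        To give an idea of what "some sort of indexing" means for a flatfile, here is a minimal sketch, assuming one `key<TAB>value` record per line. It builds a key-to-byte-offset map in one pass and then seeks straight to a record. The helpers `build_index` and `lookup` and the sample data are hypothetical, and this is nowhere near what a DBMS index does for you:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Build a key => byte offset map in one pass over the file.
sub build_index {
    my ($path) = @_;
    my %offset_of;
    open my $fh, '<', $path or die "open $path: $!";
    while (1) {
        my $pos  = tell($fh);    # offset of the line about to be read
        my $line = <$fh>;
        last unless defined $line;
        my ($key) = split /\t/, $line, 2;
        $offset_of{$key} = $pos;
    }
    close $fh;
    return \%offset_of;
}

# Jump straight to a record instead of scanning the whole file.
sub lookup {
    my ($path, $index, $key) = @_;
    my $pos = $index->{$key};
    return unless defined $pos;
    open my $fh, '<', $path or die "open $path: $!";
    seek($fh, $pos, 0) or die "seek: $!";
    my $line = <$fh>;
    chomp $line;
    my (undef, $value) = split /\t/, $line, 2;
    return $value;
}

# Demo data in a temporary file.
my ($out, $path) = tempfile(UNLINK => 1);
print {$out} "$_\trecord for $_\n" for qw(apple pear plum);
close $out or die "close: $!";

my $index = build_index($path);
print lookup($path, $index, 'pear'), "\n";
```

        Note that the index has to be rebuilt or updated whenever the file changes, which is exactly the kind of bookkeeping a DBMS already handles.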

        On a side note, caching won't make the performance merely acceptable, it will make it go through the roof!! There's no way of comparing disk access vs. RAM access.

        #!/home/bbq/bin/perl
        # Trust no1!
RE: The POWER of flat files
by lhoward (Vicar) on Jul 05, 2000 at 16:31 UTC
    Since everyone is coming down on flat-files, I figure it's about time for me to come to their defense.

    Relational databases are great for most types of data storage, particularly when the data will be accessed randomly, when you need something that will maintain atomic transactions and referential integrity for you, and when you want the physical data structure abstracted away. However, there are several occasions when you can't beat the speed and convenience of flat-files:

    1. Data files that only need to be appended to and are read infrequently or sequentially. It is almost always faster to append a line to a flat file than to add a row to a DB table; there is no overhead of creating index entries, etc.
    2. Data files that only need to be accessed sequentially and completely, never randomly.

    Also, if your data is strongly hierarchical, you could be better off using a hierarchical datastore, such as your computer's filesystem.
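
    As a rough illustration of the two cases in the list above (append-only writes, sequential reads), here is a minimal sketch. The helpers `append_record` and `count_records` and the log fields are invented for the example:

```perl
use strict;
use warnings;
use Fcntl qw(:flock);
use File::Temp qw(tempfile);

# Append one tab-separated line; the exclusive lock keeps
# concurrent writers from interleaving their output.
sub append_record {
    my ($path, @fields) = @_;
    open my $fh, '>>', $path or die "open $path: $!";
    flock($fh, LOCK_EX) or die "flock: $!";
    print {$fh} join("\t", @fields), "\n";
    close $fh or die "close: $!";    # closing also releases the lock
}

# Sequential, complete read: count the records in one pass.
sub count_records {
    my ($path) = @_;
    open my $fh, '<', $path or die "open $path: $!";
    my $count = 0;
    while (defined(my $line = <$fh>)) {
        $count++;
    }
    close $fh;
    return $count;
}

# Demo against a temporary file.
my ($tmp, $log) = tempfile(UNLINK => 1);
close $tmp;
append_record($log, time, 'login',  'zzamboni');
append_record($log, time, 'logout', 'zzamboni');
print count_records($log), "\n";
```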
RE: The POWER of flat files
by buzzcutbuddha (Chaplain) on Jul 05, 2000 at 15:57 UTC
    Actually, flat files are not faster for large numbers of records... Beyond that, as the data is pulled into your program you have to take extra measures to tie it all together, whereas an RDBMS will already have all of the information related for you.

    Not to mention the fact that a lot of things that you would have to do programmatically with flat files you can do with the DB and its built-in methods.
    • Sort the list? Use ORDER BY
    • Get only unique values? Use DISTINCT
    • Need to tie two tables together? Use JOIN
    • Need transaction support? Use PostgreSQL or SQL Server or Oracle and you have it.
    • etc...
    There are many benefits that come from using an RDBMS, and if ZZamboni wants his site to grow more easily, then he should port to one. The investment is worth the payoff later, and he gets to learn a whole new side of Perl he has not explored yet.
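
    To make the comparison in the list above concrete, here is a rough sketch of the by-hand Perl that ORDER BY, DISTINCT and a simple JOIN would replace. The records and field names are made up, and they stand in for data already parsed out of two flat files:

```perl
use strict;
use warnings;

# Hypothetical records, as they might look after parsing two flat files.
my @orders = (
    { user_id => 2, item => 'pear'  },
    { user_id => 1, item => 'apple' },
    { user_id => 2, item => 'pear'  },
);
my %name_of = (1 => 'alice', 2 => 'bob');    # second "table", keyed by id

# ORDER BY item
my @sorted = sort { $a->{item} cmp $b->{item} } @orders;

# SELECT DISTINCT item
my %seen;
my @distinct = grep { !$seen{$_}++ } map { $_->{item} } @sorted;

# JOIN orders to users ON user_id
my @joined = map {
    sprintf '%s bought %s', $name_of{ $_->{user_id} }, $_->{item}
} @sorted;

print "@distinct\n";
print "$_\n" for @joined;
```

    Each of those three steps is one keyword in SQL, and the database can use its indexes to do them without loading everything into memory first.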