stonecolddevin has asked for the wisdom of the Perl Monks concerning the following question:

Howdy monks.

I've run into a bit of a quandary. I've decided to write a simple guestbook for my website, and I want to use a flat-file text database so I don't use up more space on jcwren's server than I need to. Also, I haven't done flat-file work in ages, so I thought it would be good exercise.

Now, I can write to and read from a text file (delimited by some convenient character), and I could possibly even replace and delete some text in it (a stretch, since I'm notoriously terrible at regexen, but hypothetically possible). However, I have no clue how to construct any sort of regex to match search terms against a text file.
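A minimal sketch of such a search in core Perl, assuming a hypothetical pipe-delimited layout of name|email|date|message, one entry per line (the field names and the search_guestbook helper are illustrative, not from any module). The key detail is \Q...\E (quotemeta), which makes the user's search term match literally instead of being treated as regex syntax.

```perl
use strict;
use warnings;

# Hypothetical record layout (an assumption): name|email|date|message
my @FIELDS = qw(name email date message);

# Return a hashref for every record whose $field contains $term,
# case-insensitively. \Q...\E escapes regex metacharacters in $term,
# so input like "a.b" or "c++" is matched literally.
sub search_guestbook {
    my ($fh, $field, $term) = @_;
    my %col;
    @col{@FIELDS} = 0 .. $#FIELDS;    # field name => column number
    die "Unknown field '$field'\n" unless exists $col{$field};

    my @hits;
    while (my $line = <$fh>) {
        chomp $line;
        my @values = split /\|/, $line, -1;    # -1 keeps empty trailing fields
        next unless @values == @FIELDS;        # skip malformed lines
        next unless $values[ $col{$field} ] =~ /\Q$term\E/i;
        my %record;
        @record{@FIELDS} = @values;
        push @hits, \%record;
    }
    return @hits;
}

# Usage with an in-memory file; a real script would open the data file.
my $data = <<'END';
Alice|alice@example.com|2006-11-09|Nice site!
Bob|bob@example.com|2006-11-10|Love the Perl stuff
Carol|carol@example.com|2006-11-10|nice work, perl monk
END

open my $fh, '<', \$data or die "Cannot open in-memory file: $!";
my @found = search_guestbook($fh, 'message', 'perl');
close $fh;
print "$_->{name}: $_->{message}\n" for @found;
```

If you want whole-word matches only, wrap the term in word boundaries, e.g. /\b\Q$term\E\b/i.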

So I thought, "Well, I could cheat and use something like DBD::CSV, or I could actually learn how to do it." My question is: should I do it myself in this case, or should I just cheat (since I obviously SHOULD learn to do it myself at some point)?

Looking at the pros and cons, some pros of doing it myself would be:

  1. I would know how to do it, and wouldn't have to post about it in the future, and
  2. I could possibly better understand the modules out there that already do it.

Some cons of doing it myself might be:
  1. I could make a mistake,
  2. re-invent the wheel, and
  3. waste time.

Your feedback is greatly appreciated, monks.

meh.
Re: To learn to search flat files or to cheat...
by bart (Canon) on Nov 09, 2006 at 18:33 UTC
    In experiments, I've noticed that SQLite files aren't much bigger than their equivalent tab-delimited text files, so I wouldn't use flat files just to save space. If you want to use SQL, use DBD::SQLite.

    Here's a little node I wrote some time ago, about my experience with SQLite.

Re: To learn to search flat files or to cheat...
by GrandFather (Saint) on Nov 09, 2006 at 18:52 UTC

    There are times to invent, or even re-invent wheels, but there are also times to learn how to fit an existing wheel to your wagon. There are a lot of existing wheels that fit this particular wagon and they are worth learning about almost as much as regexen.

    A deciding criterion, however, may be that while a ton of questions get asked here about how to apply regexen in various ways, far fewer questions arise about using various DB techniques. If it is learning you want to do, browse the regexen questions here (and try to answer them without looking at the other answers first) to learn regexen. But open the DB can of worms to solve this particular problem. There are a number of database modules that may be of interest: DBD::CSV obviously, but DBD::SQLite and DBM::Deep are also worth looking at for other ways to do it.

    Both approaches involve learning stuff that is bound to come in handy in the future, but you should have plenty of opportunities to learn about regexen. Excuses to learn DB techniques are a little rarer.


    DWIM is Perl's answer to Gödel

      That's an interesting take on this, GrandFather, and I really appreciate the insight. I think, in essence, a DBD module is going to win out over any hand-coded flat-file indexing/searching, because you're still handling the flat file when using the DBD module; you're just doing it in a better and most likely more efficient way.

      ++Kudos to you GrandFather :-)

      meh.
Re: To learn to search flat files or to cheat...
by perrin (Chancellor) on Nov 09, 2006 at 18:59 UTC
    In general, flat files (which I take to mean CSV, and not XML or similar) are not the answer for data that you want to update and query. They are more useful for data exchange. When database systems were difficult to come by, they were more common as a backend, but these days there's not much reason to use them for storage on a new project. I suggest you use SQLite, or at least abstract them with DBD::CSV.

      I actually use CSV for a data store that I want to update and query. It also gives me a fast way to update the file manually. Writing a program to run UPDATE mytable SET value = 'foo' WHERE key = 'bar' when I can just go in with vi and tweak it ... just seems like a win to me ;-)

      (Not that I recommend this for all, or even many, uses ... especially live production ones, but you did say "there's not much reason to use them" - I think this one can be significant if it applies.)

      Update: perrin is right - I don't do this for CGI scripts, although I do use this in some statically-generated code whose data store isn't updated via code at all. I may move this to be dynamically generated at some point in the future, but the data store will likely remain read-only as far as the web app is concerned.

        You would edit a file with vi that can be modified by a live CGI? That's a recipe for lost data. You have to follow the same rules that the CGI script does for updating it if you want to be safe, i.e. set and respect locks.
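        A minimal sketch of "set and respect locks" in core Perl, assuming a hypothetical key,value file with one pair per line (update_key is an illustrative name). The exclusive flock is held for the whole read-modify-rewrite cycle. Note that on Unix-like systems flock is advisory: it only coordinates programs that also call flock, so a plain vi session is not blocked by it, which is exactly the risk described above.

```perl
use strict;
use warnings;
use Fcntl qw(:flock SEEK_SET);
use File::Temp qw(tempfile);

# Set $key to $value in a hypothetical "key,value" file (one pair per line),
# appending the pair if the key is new. The exclusive lock is held across
# the whole read-modify-write cycle, so cooperating writers cannot interleave.
sub update_key {
    my ($path, $key, $value) = @_;
    open my $fh, '+<', $path or die "Cannot open $path: $!";
    flock($fh, LOCK_EX)      or die "Cannot lock $path: $!";

    my @lines = <$fh>;
    my $found = 0;
    for my $line (@lines) {
        my ($k) = split /,/, $line, 2;
        next unless defined $k && $k eq $key;
        $line  = "$key,$value\n";    # $line aliases the element of @lines
        $found = 1;
    }
    push @lines, "$key,$value\n" unless $found;

    # Rewind and rewrite the whole file while still holding the lock.
    seek($fh, 0, SEEK_SET) or die "Cannot seek $path: $!";
    truncate($fh, 0)       or die "Cannot truncate $path: $!";
    print {$fh} @lines;
    close $fh              or die "Cannot close $path: $!";   # flushes, then unlocks
    return $found;
}

# Usage against a temporary file.
my ($tmp, $path) = tempfile(UNLINK => 1);
print {$tmp} "foo,1\nbar,old\n";
close $tmp;

update_key($path, 'bar', 'new');
update_key($path, 'baz', '3');

open my $in, '<', $path or die "Cannot open $path: $!";
my $contents = do { local $/; <$in> };
close $in;
print $contents;
```

        Rewriting the whole file keeps the code simple; for large files you would instead write to a temporary file and rename it over the original, still under the lock.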

      Thanks for the advice, perrin. I'd say I'm still earning my wings when it comes to deciding what's best for data storage/querying/etc., so this kind of advice points me in the right direction.

      meh.
Re: To learn to search flat files or to cheat...
by fenLisesi (Priest) on Nov 09, 2006 at 18:37 UTC
    I think the main issue in using your own text file, which is likely to be accessed in read/write mode by multiple Apache processes, is locking (flock). DBD::CSV seems to handle that part. The answer to your question lies in the urgency of your project, I think. If you don't have a hard deadline on it, why not do it both ways, learn, compare, and tell us? Cheers :)
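    For a guestbook, the locking part is short in core Perl. A minimal sketch, assuming a hypothetical |-delimited file and illustrative add_entry/read_entries helpers: writers take an exclusive lock (LOCK_EX) while appending and readers take a shared lock (LOCK_SH), so cooperating Apache processes never see a half-written entry.

```perl
use strict;
use warnings;
use Fcntl qw(:flock);
use File::Temp qw(tempfile);

# Append one entry as a single "|"-delimited line (hypothetical format).
# "|" and line breaks in user input are replaced with spaces first, so an
# entry always stays one line with the expected number of fields.
sub add_entry {
    my ($path, @fields) = @_;
    s/[|\r\n]/ /g for @fields;
    open my $fh, '>>', $path or die "Cannot open $path: $!";
    flock($fh, LOCK_EX)      or die "Cannot lock $path: $!";
    print {$fh} join('|', @fields), "\n";
    close $fh                or die "Cannot close $path: $!";   # flushes, then unlocks
    return;
}

# Read all entries under a shared lock; many readers may hold it at once,
# but none can while a writer holds LOCK_EX.
sub read_entries {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    flock($fh, LOCK_SH)     or die "Cannot lock $path: $!";
    my @entries = map { chomp; [ split /\|/, $_, -1 ] } <$fh>;
    close $fh;
    return @entries;
}

# Usage against a temporary file.
my (undef, $path) = tempfile(UNLINK => 1);
add_entry($path, 'Alice', 'Hi | there');
add_entry($path, 'Bob', "two\nlines");
for my $entry (read_entries($path)) {
    print join(' says: ', @$entry), "\n";
}
```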

      Well, it's my own website, and I'm quite lazy, so naturally I'm not going to set a deadline for myself :-).

      My next question would be: where can I learn to search a flat text file (flock issues aside) using regexen or some such method? (modules such as File::Data have a search method, but even then you have to provide a regexp to match against)

      meh.
Re: To learn to search flat files or to cheat...
by davido (Cardinal) on Nov 10, 2006 at 06:52 UTC

    If you're working with a flat file, using it to represent records, you've got some issues to deal with. First, if the individual records aren't fixed-length, then you need to maintain an index, or suffer the performance penalty of scanning the entire file just to find one record. Additionally, any change you make (again, assuming the records are not of fixed length) means you have to either rewrite the file, or mark the existing record null and void and append its replacement at the end of the flat file (which further hampers searchability). In other words, a flat file of variable-length records is kinda messy and slow.
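    The index idea can be sketched in a few lines of core Perl. This is a minimal in-memory version, assuming hypothetical |-delimited records whose first field is the key (build_index and fetch are illustrative names): one pass records each line's starting byte offset with tell, and afterwards a lookup is a single seek. Because a later record overwrites an earlier index entry for the same key, it also fits the "append the replacement at the end" approach.

```perl
use strict;
use warnings;
use Fcntl qw(SEEK_SET);

# One pass over the file: map each record's key (its first "|" field)
# to the byte offset where that record starts. If a key appears more than
# once, the later record wins, matching the append-the-replacement scheme.
sub build_index {
    my ($fh) = @_;
    my %offset;
    seek($fh, 0, SEEK_SET) or die "Cannot seek: $!";
    my $pos = tell $fh;
    while (my $line = <$fh>) {
        my ($key) = split /\|/, $line, 2;
        $offset{$key} = $pos;
        $pos = tell $fh;    # start of the next record
    }
    return \%offset;
}

# Fetch one record by key with a single seek instead of a full scan.
sub fetch {
    my ($fh, $index, $key) = @_;
    return unless exists $index->{$key};
    seek($fh, $index->{$key}, SEEK_SET) or die "Cannot seek: $!";
    my $line = <$fh>;
    chomp $line;
    return $line;
}

# Usage with an in-memory file; "alice" was edited by appending a new record.
my $data = <<'END';
alice|First post!
bob|Hello from Bob
alice|First post! (edited)
END

open my $fh, '<', \$data or die "Cannot open in-memory file: $!";
my $index = build_index($fh);
my $rec   = fetch($fh, $index, 'alice');
print "$rec\n";
```

    The index here lives only in memory and must be rebuilt on each run; persisting it to a second file is where the extra bookkeeping starts.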

    On the other hand, flat files with fixed length records would allow for rewriting individual records without rewriting the entire file. Still, an index file would be a good idea, unless you maintain the flat file in some sort of sorted order so that you could perform binary searches on it when you need to find something.
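    With fixed-length records, record n always starts at byte n times the record length, so both ideas in this paragraph reduce to a seek. A minimal core-Perl sketch, assuming a hypothetical layout of a 10-byte key, a 30-byte value and a newline (41 bytes per record) packed with pack's space-padded A format; read_rec, write_rec and find_rec are illustrative names, and the binary search is only valid if the file is kept sorted by key.

```perl
use strict;
use warnings;
use Fcntl qw(SEEK_SET SEEK_END);
use File::Temp qw(tempfile);

# Hypothetical layout: 10-byte key + 30-byte value + "\n" = 41 bytes.
use constant { KEY_LEN => 10, VAL_LEN => 30 };
use constant REC_LEN => KEY_LEN + VAL_LEN + 1;
my $TEMPLATE = 'A' . KEY_LEN . ' A' . VAL_LEN;    # "A10 A30": space-padded

sub pack_rec { my ($k, $v) = @_; return pack($TEMPLATE, $k, $v) . "\n" }

# Read record $n directly: no scanning, just one seek.
sub read_rec {
    my ($fh, $n) = @_;
    seek($fh, $n * REC_LEN, SEEK_SET) or die "Cannot seek: $!";
    read($fh, my $buf, REC_LEN) == REC_LEN or die "Short read at record $n\n";
    return unpack($TEMPLATE, $buf);    # "A" strips the padding again
}

# Overwrite record $n in place; the rest of the file is untouched.
sub write_rec {
    my ($fh, $n, $k, $v) = @_;
    seek($fh, $n * REC_LEN, SEEK_SET) or die "Cannot seek: $!";
    print {$fh} pack_rec($k, $v);
    return;
}

# Binary search by key; only correct if records are sorted by key (cmp order).
sub find_rec {
    my ($fh, $want) = @_;
    seek($fh, 0, SEEK_END) or die "Cannot seek: $!";    # seeking also flushes buffered output
    my ($lo, $hi) = (0, int(tell($fh) / REC_LEN) - 1);
    while ($lo <= $hi) {
        my $mid = int(($lo + $hi) / 2);
        my ($k, $v) = read_rec($fh, $mid);
        my $cmp = $k cmp $want;
        return ($mid, $v) if $cmp == 0;
        if ($cmp < 0) { $lo = $mid + 1 } else { $hi = $mid - 1 }
    }
    return;    # not found
}

# Usage: write three records in sorted order, then update one in place.
my $fh = tempfile();    # read/write handle; file is removed when closed
binmode $fh;            # keep "\n" one byte so record offsets stay exact
print {$fh} pack_rec(@$_) for [ 'apple', 'red' ], [ 'banana', 'yellow' ], [ 'cherry', 'dark red' ];
write_rec($fh, 1, 'banana', 'ripe yellow');
my ($n, $v) = find_rec($fh, 'banana');
print "banana is record $n: $v\n";
```

    The trade-off is wasted padding and hard limits on field length (pack's A format silently truncates longer values), which is part of why this is usually more work than it's worth.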

    All this is usually too much work. DBD::SQLite is really a convenient alternative, and nearly as compact as a flat-file approach, but without all the complexity. Honestly, you don't need to do this yourself, unless you're sitting in a comp sci class.


    Dave

      I agree, SQLite is everything I need right now, even more so than MySQL, since it's much more convenient and compact and takes next to no memory. I'm not up to commercial apps yet, so I don't need something that's going to eat at my memory like MySQL sometimes does.

      I am having trouble with read-only errors after uploading my database to my server. Should I just use SSH and create a new database with the same columns and such? That way, I don't think the read-only errors will pop up when I go to write to it...

      meh.

        You could just create a new one there. What do you suppose is the problem? Did you transfer the file in ASCII mode when it should have been in binary mode, or vice versa?

        I've never tinkered with the cross-OS portability of the file that SQLite uses to store its data. But I think SQLite does support block inserts, so you could write a script that exports all the data into an ASCII file, and then another one to import it into SQLite at the other end of the line.


        Dave

Re: To learn to search flat files or to cheat...
by planetscape (Chancellor) on Nov 10, 2006 at 00:40 UTC

    I read "flat file text database" and just wanted to point out another alternative: DBD::AnyData

    I have been messing around with AnyData and DBD::AnyData while working with some XML and CSV files, and the modules seem full-featured and easy to use (although personally I still need to work on my XML::Twig-fu to get the most bang for my buck).

    HTH,

    planetscape