dev2000 has asked for the wisdom of the Perl Monks concerning the following question:

Hi,

I'm trying to overhaul my Perl flat-file database into a speedy mod_perl / MySQL solution, but I'm wondering how to deal with my arrays. Currently I'm using:
$timeold = <FILE>; @datas = <FILE>;
After reading the time, this neatly puts the remaining contents of the file into @datas which I can happily pass along to the user. New entries are just appended to the file with a new line character at the end:
print FILE "$new_data\n";
I do periodically remove the oldest lines (near the top) in an effort to keep the file from growing too large. So how could this type of array-like functionality be accomplished in my new design? Possibly with a SET? But that type seems to have a maximum of 64 elements, and there is no such limit the old way. I also thought of using one huge BLOB to store the data and then splitting it up after retrieval, but that would make for painful maintenance, I suppose. Or maybe I'm just overlooking something basic in database design (I'm new to this!).

Also, is this overhaul worth the effort? Will I see the speed increase I hope for?

Thanks for any suggestions!

Replies are listed 'Best First'.
Re: arrays in database
by dsheroh (Monsignor) on May 18, 2002 at 16:24 UTC
    If I correctly understand your question, you would want to create a table in the database for each of your existing text files. To suck the entire contents out of a table, you would run a query against it (most likely SELECT * FROM tablename;, if you really do want everything) and call the statement handle's fetchrow_array method in a loop — each call returns one row as a list — or use fetchall_arrayref to pull everything at once.
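A minimal sketch of that fetch loop via DBI. The table and column names are made up, and it runs against an in-memory SQLite database purely so the snippet is self-contained; with MySQL you would connect with a DBI:mysql:... DSN instead.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use DBI;

# In-memory SQLite stands in for MySQL here so the example runs anywhere.
my $dbh = DBI->connect('dbi:SQLite:dbname=:memory:', '', '',
                       { RaiseError => 1 });
$dbh->do('CREATE TABLE messages (line TEXT)');
$dbh->do(q{INSERT INTO messages VALUES ('first'), ('second')});

my $sth = $dbh->prepare('SELECT * FROM messages');
$sth->execute;

my @datas;                         # plays the role of @datas = <FILE>;
while (my @row = $sth->fetchrow_array) {
    push @datas, @row;             # one TEXT column per row in this sketch
}
print "$_\n" for @datas;
```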

    However, if you do that, the performance improvement isn't likely to be much on reads or appends, although deleting data from the start of the list would probably get considerably faster. Sucking an entire file into memory tends to be faster than sucking an entire database table into memory, for the simple reason that it goes through a less complex access mechanism. The major advantages of databases are in seeking data (so you don't have to bring everything into memory) and concurrency, and slurping the whole table takes advantage of neither.

    So, what do you actually do with the data after reading it? Do you really need to have everything in memory or can you use SQL to specify a subset of the data which is actually interesting to you?

      I think the table itself would be a listing of hundreds or thousands of users...
      speeding up multi-player game with shared memory or MySQL
      ...with one column for the name (primary key), another for time and status data, and finally the array, each line of which represents game data communication between players.

      That's the way it is now as many flat files. :)
        So you're asking about how to store something along the lines of perlmonks' chatterbox messages? In that case, I wouldn't even bother writing something that transient to disk. The amount of content would likely be small enough to keep in shared memory, or, if that gets to be too much of a problem, you could use an in-memory filesystem (such as Linux's tmpfs; I don't know if FreeBSD has an equivalent) with either flat files or a temporary database that gets created on server startup. If you go with flat files, remember mbox and maildir: you'll get better performance (especially on deletes) by chunking the messages into multiple files (say, one per message or one per minute of messages) instead of using one big file for everything.
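The one-file-per-minute chunking idea can be sketched like this. Everything here is illustrative: the bucket size, the file naming, and the helper name are all made up, and a temp directory stands in for the real spool so the example is self-contained. The payoff is that expiring old messages becomes an unlink() of whole files rather than rewriting one big file.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $dir = tempdir(CLEANUP => 1);    # stands in for the real spool directory

# Append a message to the chunk file for its minute.
sub append_message {
    my ($msg, $when) = @_;
    $when = time unless defined $when;
    my $chunk = int($when / 60);    # minute bucket
    open my $fh, '>>', "$dir/$chunk.log" or die "open: $!";
    print {$fh} "$msg\n";
    close $fh;
}

# Fixed timestamps so both messages land in a known bucket (minute 1000).
append_message('hello', 1000 * 60);
append_message('world', 1000 * 60 + 5);

open my $fh, '<', "$dir/1000.log" or die "open: $!";
my @lines = <$fh>;
print @lines;
```

Deleting a whole minute of history is then just `unlink "$dir/$chunk.log"`.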
Re: arrays in database
by Angel (Friar) on May 18, 2002 at 16:03 UTC
    What about a second table with a key that references the first table? I don't really have enough information to be of exact assistance, but if I had a user and a list of arbitrary length associated with that user, I would add a second table with its own unique primary key, plus a column in both tables that matches the user somehow — either the UserID or another ID — so that the second table references the first. Then all that needs to be done is push the results of fetchrow_array into an array with a loop. When you need to get rid of a user's records, delete from the second table all rows having that key. If you want to do that after a certain time, MySQL has time functions, or you can store the epoch time from Perl as a field and use that as part of the SQL query. Good luck!
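A sketch of that two-table layout; every table and column name here is made up for illustration:

```sql
CREATE TABLE users (
    user_id INT AUTO_INCREMENT PRIMARY KEY,
    name    VARCHAR(64) NOT NULL
);

CREATE TABLE user_messages (
    msg_id    INT AUTO_INCREMENT PRIMARY KEY,
    user_id   INT NOT NULL,     -- matches users.user_id
    posted_at INT NOT NULL,     -- epoch time from Perl's time()
    message   TEXT
);

-- Fetch one user's list, oldest first:
--   SELECT message FROM user_messages WHERE user_id = ? ORDER BY posted_at;
-- Expire old entries:
--   DELETE FROM user_messages WHERE posted_at < ?;
```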
Re: arrays in database
by perrin (Chancellor) on May 18, 2002 at 16:46 UTC
    Arrays are typically represented in relational databases using a table like this:
    SEQUENCE INT
    VALUE    TEXT
    Then you can use the SEQUENCE column in an ORDER BY clause.
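In SQL, a minimal version of that layout might look like this (the table name is illustrative):

```sql
CREATE TABLE array_store (
    sequence INT NOT NULL,
    value    TEXT
);

SELECT value FROM array_store ORDER BY sequence;
```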
Re: arrays in database
by blackjudas (Pilgrim) on May 18, 2002 at 19:38 UTC
    Just to add my two cents' worth: if you are looking at storing Perl data structures of unknown length, modules such as FreezeThaw, Data::Dumper and Storable will convert a Perl data structure — an array, a hash, or any mixture thereof — into a format that you can import back into your program from the database. These modules turn a structure into a simple serialised string which can easily be stored in and retrieved from any RDBMS.
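A round-trip sketch with Storable (a core module). The structure itself is made up; the serialised string is what you would put in a BLOB column.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Storable qw(nfreeze thaw);

# A nested structure of the kind described above (contents are made up).
my %player = (
    name     => 'dev2000',
    messages => [ 'hello', 'your move' ],
);

my $frozen = nfreeze(\%player);   # network-order byte string, portable
my $copy   = thaw($frozen);       # back to a hash reference

print $copy->{messages}[1], "\n";
```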

    BlackJudas
      Actually, Perl structures of unknown length are not best handled by the FreezeThaw and Data::Dumper modules; it's Perl structures of unknown complexity that they handle very well. Let's say I have a structure that is a list of numbers. Using freeze or nfreeze to put it in the database means there is an extra operation to expand it or put it away. We're starting to drift off into database design, but it's a good general principle to think about. If I need to change something in the middle of the array, the operation is:
      1. Get the record
      2. Thaw it
      3. Change the value
      4. Freeze it
      5. Put it away
      If you had a series of updates to do on different arrays, it doesn't make much sense to freeze them. Another reason is that it doesn't scale very well: there is a finite amount of data a single column can hold (albeit quite large).

      A simple way of thinking about a flat array in a database would be (and this is how nodegroups are handled in ecore, the system that runs this site):
      • INCLUSION_ID - A unique primary key
      • ARRAY_ID - The descriptor of the array
      • ARRAY_POSITION - Something to order by
      • STUFF - The thing you need to hold
      SELECT * FROM INCLUSIONS WHERE ARRAY_ID='42' ORDER BY ARRAY_POSITION
      This scales as large as you need, and it lets you do mundane operations such as changing the order of items, etc., without having to pull the entire blob out of the db and shove it back in. However, whether the SELECT or the thaw is faster depends on the dataset. YMMV.
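The in-place update that this layout buys you can be sketched with DBI. The schema follows the column list above; in-memory SQLite is used only so the snippet is self-contained (swap the DSN for DBI:mysql:... against a real server).

```perl
#!/usr/bin/perl
use strict;
use warnings;
use DBI;

my $dbh = DBI->connect('dbi:SQLite:dbname=:memory:', '', '',
                       { RaiseError => 1 });
$dbh->do(q{CREATE TABLE INCLUSIONS (
    INCLUSION_ID   INTEGER PRIMARY KEY,
    ARRAY_ID       INTEGER,
    ARRAY_POSITION INTEGER,
    STUFF          TEXT
)});

# Populate a three-element "array" with ARRAY_ID 42.
my $ins = $dbh->prepare(
    'INSERT INTO INCLUSIONS (ARRAY_ID, ARRAY_POSITION, STUFF) VALUES (?,?,?)');
$ins->execute(42, $_, "element $_") for 0 .. 2;

# Change one element in place -- no thaw/freeze of the whole structure:
$dbh->do('UPDATE INCLUSIONS SET STUFF = ?
          WHERE ARRAY_ID = ? AND ARRAY_POSITION = ?',
         undef, 'patched', 42, 1);

# Read the array back in order.
my $rows = $dbh->selectcol_arrayref(
    'SELECT STUFF FROM INCLUSIONS WHERE ARRAY_ID = ? ORDER BY ARRAY_POSITION',
    undef, 42);
print "$_\n" for @$rows;
```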

          --jb