kiat has asked for the wisdom of the Perl Monks concerning the following question:

Hi Monks,

I'm intending to set up a database (assuming the server doesn't have DBI and MySQL) to store students' profiles, grades for various online tests, login information, etc.

Since I probably won't be using DBI and MySQL, I'm left with the option of storing the records in a flat textfile.

I've thought of two ways to go about storing the data in a textfile but have no idea which one is better. The following are the two ways I have in mind:

1) Use a single textfile
The entries in the textfile would look something like this (just a quick example):

pupil1==
maths:1_80:2_90
english:2_75:3_75
lastlogin:May 21
pupil2==
maths:1_68:2_87
english:2_75:3_75
lastlogin:May 12

(1_80 means Test 1 score 80)
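Roughly, I imagine parsing such a file with something like this (just a quick sketch):

```perl
use strict;
use warnings;

# Parse the single-file layout above into a hash of hashes, e.g.
# $records{pupil1}{maths} = "1_80:2_90". A record starts at "name==".
sub parse_records {
    my ($text) = @_;
    my (%records, $current);
    for my $line (split /\n/, $text) {
        if ($line =~ /^(\w+)==$/) {              # new pupil record
            $current = $1;
        }
        elsif (defined $current and $line =~ /^(\w+):(.*)$/) {
            $records{$current}{$1} = $2;         # e.g. maths => "1_80:2_90"
        }
    }
    return \%records;
}

my $records = parse_records(<<'END');
pupil1==
maths:1_80:2_90
english:2_75:3_75
lastlogin:May 21
END

print $records->{pupil1}{maths}, "\n";   # prints 1_80:2_90
```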

2) Use multiple textfiles named as the name of the students
If a student is named "Peter", then the textfile would be named "Peter.txt". Of course, this runs into problems when there is more than one student with the same name, but I reckon I can do something to the name to make it unique, or perhaps use some kind of numeric identification. The entries in such a single-student file would be roughly similar to those above, except that they are only for one student.

Regardless of the method used, I'll be writing a script to parse the file and display the person's data in a web format so that the student can view his records dynamically.

I have to take into consideration the situation in which more than one student is accessing the tests or doing something that requires updating his or her records in the textfile.

My question thus is: Which of the above two methods is more efficient? Any comments on the way I approach the task would be very much appreciated.

Thanks in advance :-)

kiat

Re: Which is the better option?
by dree (Monsignor) on May 18, 2002 at 13:11 UTC
    Both CSV and XML are good options. For fast disk access you can use DB_File: it lets you tie a hash to a file on disk, so you can use the student's name as the key for fast access to the record:
    $student_hash{$name} = ...CSV structure here...;
    DB_File can also manage duplicate keys. The code to tie is:

    use Fcntl;
    use DB_File;

    tie %DB_IT, "DB_File", $filename1, O_CREAT|O_RDWR, 0666
        or die "Can't open $filename1: $!\n";

    or, for duplicate keys:

    use Fcntl;
    use DB_File;

    $DB_BTREE->{'flags'} = R_DUP;
    tie %D1, "DB_File", $filename2, O_CREAT|O_RDWR, 0666, $DB_BTREE
        or die "Can't open $filename2: $!\n";
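    To make the record layout concrete, here is a rough sketch of the CSV-per-student idea using a plain hash (with DB_File, %students would be tied to a file as above to persist it; the field names are made up):

```perl
use strict;
use warnings;

# One CSV-style record per student, keyed by name. With DB_File,
# %students would be tied to a file on disk; a plain hash is used
# here only to illustrate the layout (the field names are made up).
my %students;

$students{Peter} = "maths=1_80:2_90,english=2_75:3_75";

# Unpack a record back into fields:
my %fields = map { split /=/, $_, 2 } split /,/, $students{Peter};
print $fields{maths}, "\n";   # prints 1_80:2_90
```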

    Update: changed CVS to CSV :)

Re: Which is the better option?
by TheHobbit (Pilgrim) on May 18, 2002 at 12:21 UTC

    Hi,
    Hi,
    Well, just to add to your confusion, I'll suggest a whole different approach. Simply (well, simply... sort of) imitate what a true DBMS would do: use CSV files to hold tables, and use some Tie stuff to tie the real files to in-memory structures.

    The hard part is that you want multiple updating access (by the way, IIRC MySQL does not allow this). You should be able to do that by using some lock mechanism in your Tie part.
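    A minimal sketch of such a lock with flock (independent of the tie layer; the filename and record format here are hypothetical):

```perl
use strict;
use warnings;
use Fcntl qw(:flock);

# Serialise writers on a shared data file: each process takes an
# exclusive lock before appending. LOCK_EX blocks until the lock is
# free. The filename and record format are made up.
my $file = 'students.csv';

open my $fh, '>>', $file or die "Can't open $file: $!";
flock $fh, LOCK_EX       or die "Can't lock $file: $!";
print $fh "pupil3,maths,1_70\n";
close $fh;               # closing the file releases the lock
```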

    I'm almost sure someone has already implemented this, but if you cannot have DBI installed, then probably even getting other modules installed will be a PITA.

    One last thing: if your server has (or would install) DBI but lacks the MySQL (or other DBMS) server, then you could consider using something like DBD::File or DBD::CSV as your database-dependent driver under DBI.

    Cheers


    Leo TheHobbit
Re: Which is the better option?
by ceedee (Sexton) on May 18, 2002 at 12:55 UTC

    I have a preference for a method along the lines of the first layout you mentioned.
    However, have you considered using an XML structure and a Perl module to manipulate it?
    I am very fond of XML::Twig (it requires XML::Parser); it has provided a very simple and enjoyable means of organising data and has simplified the way I organise and handle data structures. It's kind of interesting, and I have found it useful for a wide variety of tasks (especially in terms of organising site content).

    You could see a relationship between the first file structure above and an XML file structure, if you stretch your imagination a little.

    The flat-file pattern already suggests that students are containers for exams (I don't know whether the students would like that idea), so either way XML lets you use differing attributes to identify unique objects sharing the same element type, instead of incrementing an identifier such as
    pupil1==
    It's a bit like saying:

    #pupil=A Student|&exam=Maths 1&result=21#pupil=Foo Student|&exam=Science&result=35|&exam=knitting&result=75#pupil=Joe Bloggs|&exam=Head Stand&result=100
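    Or the same idea in actual XML (the element and attribute names here are purely illustrative):

```xml
<pupils>
  <pupil name="Foo Student">
    <exam subject="Science"  result="35"/>
    <exam subject="Knitting" result="75"/>
  </pupil>
  <pupil name="Joe Bloggs">
    <exam subject="Head Stand" result="100"/>
  </pupil>
</pupils>
```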

    I hope that doesn't fall flat...:)

Re: Which is the better option?
by mrbbking (Hermit) on May 18, 2002 at 13:13 UTC
    If you decide to go with a single flat file along the lines of the example you used, you might consider Config::IniFiles or Config::Simple. Both insulate you from the drudgery of parsing that sort of file by hand.

    Your file format would have to change a little, but it would be quite similar:

    [pupil1]
    maths=1_80:2_90
    english=2_75:3_75
    lastlogin=May 21

    [pupil2]
    maths=1_68:2_87
    english=2_75:3_75
    lastlogin=May 12
    Then, something like this:
    use Config::IniFiles;

    my $cfg = new Config::IniFiles( -file => "/path/configfile.ini" );
    my $pupil1_maths = $cfg->val( 'pupil1', 'maths' );
    my @pupil1_maths = split /:/, $pupil1_maths;
    might get you what you need.
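    Each element of @pupil1_maths can then be split again into test number and score (a sketch, assuming the 1_80 format above):

```perl
use strict;
use warnings;

# Break "1_80:2_90" into test => score pairs (test 1 scored 80, and
# so on), following the separators used in the original format.
my $maths = '1_80:2_90';
my %score = map { split /_/ } split /:/, $maths;
print "Test 1: $score{1}\n";   # prints Test 1: 80
```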

    But Config::IniFiles only reads the values - it does not offer a way to write/update them. I see that Config::Simple does have writing methods, so - if you go with this file format, you may want to take some time to look into that one.

    I hope that helps you...

Re: Which is the better option?
by lachoy (Parson) on May 18, 2002 at 14:17 UTC

    Similar to what TheHobbit said: if you're able to use or install DBI you might want to look at DBD::SQLite. It's a database driver and database library all rolled into one. It's very fast and you can use normal SQL statements just as if you were using MySQL or other RDBMS.

    You might also look into using some sort of DBM storage and MLDBM. Just use your existing IDs as keys (e.g., 'pupil1', 'pupil2') and store the information in a hash.

    Chris
    M-x auto-bs-mode

Re: Which is the better option?
by dev2000 (Sexton) on May 18, 2002 at 16:30 UTC
    In a flat-file world, I think it depends what you plan to do with the data.

    If you want to show all of the grades at the same time, I would use a single text file and read each student's record to build the results. You can easily do comparisons, averages, searches for grade ranges, etc. this way. You can also easily type up and edit this single text file by hand.

    If you want each student to access only his own grades, I would use the multiple-text-files approach. Each student can also easily update information in his file this way, without reading and writing everything. Each file will have to have a unique name, but even if you implemented a retrieve-by-name function on a single text file, you would have an issue if two students shared the exact same name. You should write a script to generate these multiple files from a master file, or even better, some sort of management script to oversee the creation and updating of the contents. This may be helpful:
    sub readnames {
        opendir THISDIR, '../students' or die "Can't open ../students: $!";
        @allnames = readdir THISDIR;
        closedir THISDIR;
    }
    This passes all of the (file) names in the "students" directory to @allnames where you can use it to update or delete the individual student files.
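    And a rough sketch of generating the per-student files from a master file (the master-file format here is just an assumption):

```perl
use strict;
use warnings;

# Write a tiny master file for the demonstration: one "name:data"
# line per student (this master-file format is made up).
open my $m, '>', 'master.txt' or die "Can't write master.txt: $!";
print $m "Peter:maths=1_80:2_90\n";
close $m;

# Split the master file into one file per student under students/.
mkdir 'students' unless -d 'students';
open my $master, '<', 'master.txt' or die "Can't open master.txt: $!";
while (my $line = <$master>) {
    chomp $line;
    my ($name, $data) = split /:/, $line, 2;
    next unless defined $data;
    open my $out, '>', "students/$name.txt"
        or die "Can't write students/$name.txt: $!";
    print $out "$data\n";
    close $out;
}
close $master;
```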

    Either way, don't forget to lock your files with flock.
      Hi dev2000,

      Am I right to say that, provided I set up the flat-file correctly, I should in theory be able to use the flat-file just as I would a DB_File database file? The differences between the two methods would be in terms of speed and efficiency.

      But in terms of doing what I intend to do with the flat-file (modifying the entries, deleting, appending or just merely reading), I should have no problems using a flat-file. Am I right? What if the flat-file gets very large? Is a large flat-file inherently more prone to error when entries are appended or modified and written to the file?

      If flock is used, does it mean that when the file is opened for writing, another process that requires the file for writing (but not reading) will not execute and hence the first request is guaranteed to execute successfully?

      I look forward to hearing from you :)

      kiat
      Hi Kiat,

      I think for small applications (few users, small data set) flat files will be an easier, faster and more efficient choice.

      It's when you have a lot of people connected to a large amount of data that a database like MySQL will scale well and shine. mod_perl even allows for persistent, reusable database connections which further increase performance. I'm doing a lot of research into this area now and converting a flat file system to a MySQL database. My goal is to try to handle thousands of people at the same time (with a low amount of data, though).

      A flat file (large or small) is prone to corruption when two people try to write to it at the exact same time. This is where flock comes in. Simultaneous read access of the same file is not a problem, but consider this scenario:

      I open a file. I read a number from the file. I increment the number. I close the file. I open the file for writing with a lock. I write my number. I close the file.

      You read the same number at nearly the same time. You increment the number. You go to open the file to write the new number... it's in use by me. You wait. You then write your number.

      In the end, the number was only incremented once. You can see how this example would fail as a counter.

      I think the only way around this is to open and lock the file for reading and writing simultaneously, which can be tricky...
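      One way to do it is to hold a single exclusive lock across the whole read-modify-write cycle, opening the file for both reading and writing (a sketch; the counter file name is hypothetical):

```perl
use strict;
use warnings;
use Fcntl qw(:flock O_RDWR O_CREAT);

# Increment a counter safely: hold one exclusive lock across the
# whole read-modify-write cycle, so no other process can read a
# stale value in between. The counter file name is made up.
my $file = 'counter.txt';

sysopen my $fh, $file, O_RDWR | O_CREAT or die "Can't open $file: $!";
flock $fh, LOCK_EX or die "Can't lock $file: $!";

my $n = <$fh>;              # read the current value, if any
$n = 0 unless defined $n;

seek $fh, 0, 0;             # rewind...
truncate $fh, 0;            # ...discard the old contents...
print $fh $n + 1, "\n";     # ...and write the new value
close $fh;                  # closing releases the lock
```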

      In a database this is handled automatically.

      In my understanding, flock works on a system level and acts as a traffic signal for files. For example, if I'm currently writing to a file and you want to update it too, you must wait until I'm finished. Your process gets stalled until I release the lock, then you go. Normally we are talking about milliseconds, so performance is not a problem.

      Closing a file automatically unlocks it, so you don't have to. Also, flock is Unix-based, and I don't think Windows supports it.

      Good luck - I hope this helps. Any other questions, just ask!

      :DEV2000