Trihedralguy has asked for the wisdom of the Perl Monks concerning the following question:

I'm working on a project that involves storing a somewhat "large" amount of data. I'm trying to figure out the best way to go about it.

Basically I will need to store a list like this:

1,2,3,4,5,6,7,8,1,2,3,4,5,6,7,8,1,2,3,4,5,6,7,8,1,2,3,4,5,6,7,8,1,2,3,4,5,6,7,8,1,2,3,4,5,6,7,8,1,2,3,4,5,6,7,8,1,2,3,4,5,6,7,8...
---------------------------------> Roughly 1000 Digits, Comma Separated

These will be data points, generated by users, that I will reference to display something on a website.

My question is: should I use one column holding all 1,000 numbers and commas, with Perl splitting at the commas, or 1,000 columns to store the data? (That many columns doesn't sound like a good idea!)

However, every time the data is stored, which will be quite often, I will need to split the data or join it back together before inserting.

Is there a better way to think about this? Is there a way to use an array or something like that for my data in MySQL?

Ultimately I want to program something in Perl, using DBI and MySQL, to store this data. It will be an interactive system with multiple users updating the data at once.

With that said, I'm trying to decide between having Perl handle the mess, or handing the job to MySQL with what seems like a ridiculous number of columns.

What do you guys think?

Replies are listed 'Best First'.
Re: MySQL - Query vs Perl
by klassa (Acolyte) on Aug 31, 2008 at 01:14 UTC

    I think it depends on your needs. Perl's pretty good at splitting and joining, so I don't think you're going to run into a problem if you stick the whole chunk of data (as a string) into a single column.
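    The single-column round trip klassa describes can be sketched in a few lines of Perl. The sample values here are illustrative, not from the thread:

```perl
use strict;
use warnings;

# A minimal sketch of the single-column approach: join the data points
# into one comma-separated string before the INSERT, split them back
# out after the SELECT.
my @points = (1, 4, 2, 4, 1, 6, 4, 6, 5, 1, 0);

my $stored  = join ',', @points;     # what goes into the column
my @fetched = split /,/, $stored;    # what you get back after a SELECT
```

    With DBI you would then bind `$stored` as an ordinary placeholder value, e.g. `$sth->execute($user_id, $stored)`.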

    I certainly wouldn't have 1,000 columns in my database table, no...

    If you really want to store the values separately, then I'd turn the problem sideways and store each into a separate row, rather than a separate column.

    Example: Let's say you've got a user with user ID 332. In the first version (one column), you'd have:

    KEY  VALUE
    332  1,4,2,4,1,6,4,6,5,1,0,...

    In the multi-row version, you'd have:

    KEY  VALUE
    332  1
    332  4
    332  2
    ...

    If you need them in a particular order, you could add a column to order against:

    KEY  FETCH_ORDER  VALUE
    332  1            1
    332  2            4
    332  3            2
    ...

    and then retrieve them with:

    SELECT VALUE FROM MYTABLE WHERE KEY = ? ORDER BY FETCH_ORDER ASC
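    A sketch of that retrieval through DBI might look like the following. The table and column names follow the example above but are otherwise illustrative, and `$dbh` is assumed to be an already-connected handle (note that KEY is a reserved word in MySQL, so it is backtick-quoted here; a real schema would likely pick a different name):

```perl
use strict;
use warnings;

# Fetch one user's values in order, as a plain Perl list.
sub fetch_values {
    my ($dbh, $key) = @_;
    my $rows = $dbh->selectall_arrayref(
        'SELECT VALUE FROM MYTABLE WHERE `KEY` = ? ORDER BY FETCH_ORDER ASC',
        undef, $key,
    );
    return map { $_->[0] } @$rows;    # flatten one-column rows to a list
}
```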
Re: MySQL - Query vs Perl
by karavelov (Monk) on Aug 31, 2008 at 01:53 UTC

    If you do not need to filter the table on any of these fields (SQL WHERE and HAVING clauses), or to aggregate or join other tables using some of these values, you can safely use the string split/join solution.

    If you need to manipulate individual values in SQL, you could go one of two ways:

    • the suggested row-based representation
    • arrays: I am not sure that MySQL has array fields, but most of the DBMSes I have experience with have some sort of array support (Oracle, PostgreSQL)
Re: MySQL - Query vs Perl
by wfsp (Abbot) on Aug 31, 2008 at 09:13 UTC
    My rule of thumb is if you've gone to all the trouble of getting a database connection (often the most expensive operation) you might as well use it. Let the db do all the heavy lifting; it doesn't mind.

    I often find that for repetitive data a long, thin table wins over a short, fat one. Everything gets easier. For what it's worth, my money would be on klassa's multi-row version.

    Rather than updating the data you could simply add a new "set" with a unique key, which, in another table, could be related to, say, a user and a date. A retentive web admin would then have lots of historical data with which to bore the pants off everyone. (Charts, graphs, top 10s, oooo!) :-)

    And if, say, you're using something like an HTML::Template loop to display your data you'll be looking for an array of hashes which the mighty DBI is happy to provide.
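    That array of hashes falls straight out of DBI's `Slice` attribute. A sketch, with illustrative table and column names and `$dbh` assumed to be an already-connected handle:

```perl
use strict;
use warnings;

# Return each row as a hashref keyed by column name: exactly the
# array-of-hashes shape an HTML::Template loop expects.
sub rows_for_template {
    my ($dbh, $user_id) = @_;
    return $dbh->selectall_arrayref(
        'SELECT fetch_order, value FROM points WHERE user_id = ?',
        { Slice => {} },    # hashref per row instead of arrayref
        $user_id,
    );
}
```

    The result can then go directly into `$tmpl->param(points => $rows)`.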

stooping lower
by juster (Friar) on Aug 31, 2008 at 09:26 UTC

    I'm probably making some assumptions here, just some random thoughts:

    If all the numbers are single digits what is the point of having commas between them?

    If all the digits are between 1 and 8 like you have written, that's eight possible values, just like octal digits (0..7)! You could store 10 octal digits in an unsigned integer (4 bytes). Better than one digit per byte! Really I just want to use octal for more than chmod...
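    A sketch of that packing, done with plain arithmetic. Each value 1..8 is shifted down by one to get an octal digit 0..7, and ten of those fit comfortably in a 32-bit unsigned integer (8**10 == 2**30). The sample values are made up:

```perl
use strict;
use warnings;

my @values = (1, 2, 3, 4, 5, 6, 7, 8, 1, 2);    # exactly ten values

# Pack: accumulate the shifted digits in base 8.
my $packed = 0;
$packed = $packed * 8 + ($_ - 1) for @values;

# Unpack: peel the digits back off, least significant first.
my @unpacked;
my $n = $packed;
for (1 .. @values) {
    unshift @unpacked, ($n % 8) + 1;
    $n = int($n / 8);
}
```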

      @juster

      I'm really just leaving room for the idea that someday the data might consist of two-digit numbers, or perhaps even more.

      @klassa

      The multi-row method is exactly what I need, just stored differently, and if I, say, archive that every day, that will only leave me with 365,000 rows by the end of the year, which isn't bad at all.

      Thanks to everyone again for the quick responses!
        I'm really just leaving room for the idea that someday the data might consist of two-digit numbers, or perhaps even more.

        If you think they're going to stay within 255, you can pack them into a string and store it in a char field. (Not varchar: no need for the extra overhead, as you know how many values there are going to be. If the number of values changes, you can always alter the table.)

        Even leaving yourself two bytes per stored value is only one character more than a comma-separated string would be. (And if you're using a varchar, that extra character of overhead is moot.)
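        A sketch of the one-byte-per-value packing with Perl's `pack`/`unpack`. The sample values are made up; the only assumption is that each value fits in 0..255:

```perl
use strict;
use warnings;

my @values = (1, 4, 2, 4, 1, 6, 4, 6, 5, 12, 255);

my $packed   = pack 'C*', @values;      # one unsigned byte per value
my @restored = unpack 'C*', $packed;    # back to a list of numbers
```

        Since the packed string is raw bytes, a BINARY (or BLOB) column is probably safer than CHAR here, to avoid character-set conversion surprises.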

        ...

        But that being said, there are a few other considerations that probably won't come out without benchmarking the single-record-per-user approach against the record-per-value approach:

        • How often are the values being updated?
        • Are the values needed singly, or a whole lot at a time?
        • Are the values updated singly, or many at the same time?

        And as you mention archiving -- you might want to consider that your archiving might not need to match your storage for normal use. Even if you store the data for normal use as a record per value, you can store it in a more compact format for long term archiving.
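        A sketch of that split: read a user's per-row values and write them as one compact comma-separated string into a separate archive table. All table and column names are illustrative, and `$dbh` is assumed to be an already-connected DBI handle:

```perl
use strict;
use warnings;

# Compact one user's per-row values into a single archive row.
sub archive_user {
    my ($dbh, $user_id) = @_;
    my $vals = $dbh->selectcol_arrayref(
        'SELECT value FROM points WHERE user_id = ? ORDER BY fetch_order',
        undef, $user_id,
    );
    $dbh->do(
        'INSERT INTO points_archive (user_id, archived_on, data)
         VALUES (?, NOW(), ?)',
        undef, $user_id, join(',', @$vals),
    );
}
```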

Re: MySQL - Query vs Perl
by NiJo (Friar) on Sep 01, 2008 at 21:33 UTC
    Have a rethink of your database design. It does not look normalised at all. Your 'mess' should vanish after splitting it up into several tables.

    Then it is time to read up on persistence, e. g. Class::DBI.