natol44 has asked for the wisdom of the Perl Monks concerning the following question:

Hello!

We are writing a Perl script for database management (not SQL, but a text format). Let's say we have 100 ads; each ad is stored in a separate file and consists of a single line of fields separated by /

For example
data1/data2/data3/.../data500


Which approach would be faster for displaying the 100 ads on a page?

1: Storing all ads in ONE file, then reading this file and processing the data? Like:
data1/data2/data3/.../data500
dataX1/dataX2/dataX3/.../dataX500
dataY1/dataY2/dataY3/.../dataY500


or

2: Storing the number of each ad in one file, reading this file, then opening each ad file (identified by its number)?

The disadvantage of the 1st way is that when an ad is modified or deleted, there are several files to change: the file of the ad itself, the "all ads" file, the "last modified ads" file, etc. But maybe this way would be faster for displaying the 100 ads?

The disadvantage of the 2nd way (as I see it) is that it makes a lot of disk accesses (one per ad). But ad updates would be simpler, as we would only have one file (the individual ad file) to update.
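
To make the question concrete, here is a rough sketch of how we would read the data in each case (the file and directory names are just examples):

use strict;
use warnings;

# 1: one big file, one ad per line
open my $fh, '<', 'all_ads.txt' or die "Cannot open all_ads.txt: $!";
my @ads = map { chomp; [ split m{/} ] } <$fh>;
close $fh;

# 2: one index file of ad numbers, plus one file per ad
open my $idx, '<', 'ad_index.txt' or die "Cannot open ad_index.txt: $!";
chomp(my @numbers = <$idx>);
close $idx;

my @ads_too;
for my $n (@numbers) {
    open my $ad_fh, '<', "ads/$n.txt" or die "Cannot open ads/$n.txt: $!";
    chomp(my $line = <$ad_fh>);
    push @ads_too, [ split m{/}, $line ];
    close $ad_fh;
}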

Which way (or a 3rd one) would you recommend? The website is expected to have quite high traffic.


Thank you!

Re: General question (speed, disk access etc)
by tilly (Archbishop) on Jan 18, 2011 at 16:49 UTC
    Let me see. You lack basic knowledge about filesystem access times, yet you think that you are in a position to reinvent a basic wheel. And you are confident enough in your ability that you expect to have "quite high traffic".

    There is a fairly obvious disconnect here.

    If you're just dealing with 100 ads, it doesn't much matter what you do since the operating system is going to keep everything in filesystem cache.

    If you have enough ads to blow the cache, reading 100 ads means 100 disk seeks, and at several milliseconds apiece those seeks add up to a full half second.

    If you have not blown your cache and only need to read 10% of the ads, keeping them all in one file means you're always reading the other 90% unnecessarily.

    Of course you are far from the first to face this problem. Even if you want to stay away from a regular database, BerkeleyDB gives you a choice of efficient solutions.
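
    For instance, a minimal sketch with the core DB_File module (one of Perl's Berkeley DB interfaces); the file name and record layout here are made up for illustration:

        use strict;
        use warnings;
        use DB_File;   # ties a hash to a Berkeley DB file

        # Each ad is stored under its number; updating one ad touches only
        # that record, with no "all ads" file to rewrite.
        my %ads;
        tie %ads, 'DB_File', 'ads.db', O_RDWR | O_CREAT, 0644, $DB_HASH
            or die "Cannot open ads.db: $!";

        $ads{42} = join '/', 'data1', 'data2', 'data3';   # write/update one ad

        my @fields = split m{/}, $ads{42};                # read one ad back
        print "$fields[0]\n";

        untie %ads;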

Re: General question (speed, disk access etc)
by oko1 (Deacon) on Jan 18, 2011 at 16:12 UTC

    > The website is expected to have quite high traffic.

    In that case, I recommend you use a database. Trying to do this from a file is going to give you problems. Try 'nosql' for a tiny but useful implementation.
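
    If a relational database is an option, a minimal sketch with DBI and DBD::SQLite (SQLite is just one convenient choice for illustration; the table and column names are made up):

        use strict;
        use warnings;
        use DBI;

        my $dbh = DBI->connect('dbi:SQLite:dbname=ads.db', '', '',
                               { RaiseError => 1 });

        $dbh->do('CREATE TABLE IF NOT EXISTS ads (id INTEGER PRIMARY KEY, body TEXT)');

        # Update a single ad without rewriting anything else.
        $dbh->do('INSERT OR REPLACE INTO ads (id, body) VALUES (?, ?)',
                 undef, 42, 'data1/data2/data3');

        # Fetch everything for the listing page.
        my $rows = $dbh->selectall_arrayref('SELECT id, body FROM ads ORDER BY id');
        print "$_->[0]: $_->[1]\n" for @$rows;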

    -- 
    Education is not the filling of a pail, but the lighting of a fire.
     -- W. B. Yeats
Re: General question (speed, disk access etc)
by JavaFan (Canon) on Jan 18, 2011 at 16:25 UTC
    If the ads are just text, you only have 100 of them, and you aren't changing them every few seconds, your best option is probably to have them in memory (that is, read them in as the server starts).
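
    A minimal sketch of that approach, assuming a persistent process (mod_perl, a PSGI daemon, etc.) where startup code runs once; the directory name and one-line-per-file layout are assumptions:

        use strict;
        use warnings;

        my %ads;   # ad number => arrayref of fields, kept in memory

        sub load_ads {
            my ($dir) = @_;
            %ads = ();
            for my $file (glob "$dir/*.txt") {
                my ($id) = $file =~ m{(\d+)\.txt\z} or next;
                open my $fh, '<', $file or die "Cannot read $file: $!";
                chomp(my $line = <$fh>);
                $ads{$id} = [ split m{/}, $line ];
                close $fh;
            }
        }

        load_ads('ads');   # run once at server start, rerun when ads change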

    But realistically, you probably have some dynamic, or at least changing, content on your website (it's unlikely nowadays that the only non-static content on a high-volume website would be the ads). So you already have some kind of solution. I'd see whether I could tie the ad system into it. Or just outsource the ads to Google.

    But has any of this anything to do with Perl? You could go to a Python forum, and ask the same thing.

Re: General question (speed, disk access etc)
by sundialsvc4 (Abbot) on Jan 18, 2011 at 23:57 UTC

    And shouldn’t we, as they say, “club ya” for having anything-at-all to do with a website that is going to display 100 ads, anywhere at all?

      > And shouldn’t we, as they say, “club ya” for having anything-at-all to do with a website that is going to display 100 ads, anywhere at all?


      Thank you for your clever comment.

      In fact, 100 was a (bad) example. There will be between 1000 and 1500 ads, an average of 30 ads modified per day, and an average of 20,000 unique visitors per day.

      Instead of another clever (!) comment, a recommendation on how to do it would be fine :)

        1000-1500? Then store them in memory, maybe shared.
        Modifications: you could write a helper script that merges all ads and refreshes the cache (or the cache source): it writes them to the "all ads" file, updates the shared memory, etc.
        Useful CPAN modules: Cache, Cache::Memory, Cache::Memcached, IPC::Shareable
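
        For example, a minimal sketch with Cache::Memcached, assuming a memcached daemon on localhost:11211; the cache key, directory layout and 10-minute expiry are all made up:

            use strict;
            use warnings;
            use Cache::Memcached;

            my $cache = Cache::Memcached->new({ servers => ['127.0.0.1:11211'] });

            sub ads_for_page {
                my $ads = $cache->get('all_ads');
                return $ads if $ads;

                # Cache miss: rebuild from the individual ad files, then store
                # the merged structure so later requests skip the disk entirely.
                my %rebuilt;
                for my $file (glob 'ads/*.txt') {
                    open my $fh, '<', $file or next;
                    chomp(my $line = <$fh>);
                    $rebuilt{$file} = [ split m{/}, $line ];
                    close $fh;
                }
                $cache->set('all_ads', \%rebuilt, 600);   # expire after 10 minutes
                return \%rebuilt;
            }

        The helper script mentioned above would then just call set() again (or delete the key) after modifying an ad.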