DreamT has asked for the wisdom of the Perl Monks concerning the following question:

Hi,
For my application, I'm currently storing text phrases in a database, with the following columns:

ID Text_EN Text_DE Text_<language suffix> ....

When the application starts, I fetch the texts for the current language into a hash, with the id column as key. For obvious reasons I'd like to do this more efficiently. Using FastCGI solves the problem, in that I only need to do this on the first call and can keep the hash in memory for fast access. But I'd like to find an alternative way of storing/retrieving the texts (FastCGI, memcached etc. aren't always available in the target environments).

One idea is to use the __DATA__ block to store texts. Would this be a good solution? Are there better ways? There are currently about 1500 phrases stored per language.

Replies are listed 'Best First'.
Re: Text storage/retrieval
by BrowserUk (Patriarch) on Mar 05, 2012 at 12:13 UTC
    One idea is to use the __DATA__ block to store texts

    Why go through the process of converting text to a hash at runtime, every time, and loading all languages on every run?

    • Why not just store them in a .pl file (one per language, if you know in advance which language you are going to use):

      EN.pl

      (
          "...",    # 0
          "...",    # 1
          ...
      );

      And then you can do:

      my $lang = determineLang();
      my @text = do "./$lang.pl";    # explicit "./": "." is not in @INC from Perl 5.26 on
      ...
      print $text[ 27 ];
    • You can save a little more time by not even parsing the list, by using Storable, though that has a bad press with some people.

      use Storable qw( retrieve );
      ...
      my $lang = determineLang();
      my $text = retrieve( "$lang.sto" );    # returns an array reference
      ...
      print $text->[ 27 ];

      You would use a separate small app to build and write the binary storable files.
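
A minimal sketch of the Storable route described above, assuming the phrases are kept as a plain array and that the build step and the application agree on a file name. The phrase list, the temporary directory and the EN.sto name are stand-ins for illustration only. nstore is used rather than store so that the file is written in network byte order and can be moved between machines.

```perl
use strict;
use warnings;
use Storable qw(nstore retrieve);
use File::Temp qw(tempdir);
use File::Spec;

# Hypothetical phrase list; in practice this would be read from the database.
my @phrases_en = ( 'Hello', 'Goodbye', "Line one\nline two, with a comma" );

# --- Build step (the separate small app, run once per language) ---
my $dir  = tempdir( CLEANUP => 1 );                 # stand-in for the real data directory
my $file = File::Spec->catfile( $dir, 'EN.sto' );   # assumed naming scheme: <lang>.sto
nstore( \@phrases_en, $file ) or die "Could not write $file";

# --- Application start-up ---
my $text = retrieve($file);                         # array reference, no text parsing
print $text->[1], "\n";
```

Phrases containing newlines, commas or quotes need no special handling here, because Storable serialises the Perl data structure rather than a text format.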


    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.

    The start of some sanity?

      Why go through the process of converting text to a hash at runtime, every time, and loading all languages on every run?

      Putting each language in its own file would certainly be an improvement - I grant you that.

      However, converting text to the hash at run time is actually *faster* than hardcoding the hash, at least for simple data.

      Yes, that's right. This:

      my %hash;
      while (<DATA>) {
          chomp;                            # drop the newline so it does not end up in the value
          my ($k, $v) = split /\t/;         # fields are tab-separated
          $hash{$k} = $v;
      }
      __DATA__
      440035528809	6946395707444
      332679554392	162874763688655
      913537320343	56726180700920

      is faster than this:

      my %hash = (
          440035528809 => '6946395707444',
          332679554392 => '162874763688655',
          913537320343 => '56726180700920',
      );

      Or at least it is once you've got more than a few hundred entries in the hash.

      It seems counter-intuitive, but it makes sense when you think about it. In the first example we're parsing a very simple text format using Perl (and Perl is very fast at text handling!); in the second we're parsing a programming language using C.

      I did quite a bit of benchmarking on this sort of thing for Crypt::XkcdPassword.
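
For anyone wanting to check this against their own data, a rough sketch of such a comparison with the core Benchmark module might look like the following. The entry count and the generated keys and values are made up, and string eval is only an approximation of the cost of compiling a hard-coded hash at start-up, so the figures it prints should be treated as indicative at best.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

my $n = 2000;    # hypothetical number of entries

# The same data in both representations: tab-separated text and Perl source.
my @pairs = map { [ 100_000_000_000 + $_, $_ * 7919 ] } 1 .. $n;
my $tsv   = join '', map { "$_->[0]\t$_->[1]\n" } @pairs;
my $code  = 'my %hash = ('
          . join( ',', map { "$_->[0]=>'$_->[1]'" } @pairs )
          . '); \%hash';

# Parse the text form line by line, as the __DATA__ version does.
sub from_text {
    open my $fh, '<', \$tsv or die "in-memory open failed: $!";
    my %hash;
    while ( my $line = <$fh> ) {
        chomp $line;
        my ( $k, $v ) = split /\t/, $line;
        $hash{$k} = $v;
    }
    return \%hash;
}

# Let perl compile the hard-coded form; string eval forces a fresh parse each call.
sub from_code {
    my $hash = eval $code;
    die $@ if $@;
    return $hash;
}

# Run each variant for at least one CPU second and print a comparison table.
cmpthese( -1, { text => \&from_text, code => \&from_code } );
```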

        Now try it with phrases that can contain spaces, commas, quotes of either form, and even newlines.
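
To make the concern concrete, here is a sketch of what the tab-separated format would need in order to survive such phrases. The escaping scheme (backslash, tab and newline written as \\, \t and \n) is an assumption made up for this example, not something from the posts above. Spaces, commas and quotes need no treatment in a tab-separated file, but every field now costs an extra substitution on load.

```perl
use strict;
use warnings;

# Hypothetical escaping scheme: backslash, tab and newline become \\, \t and \n.
my %unescape = ( 't' => "\t", 'n' => "\n", '\\' => '\\' );

sub escape_field {
    my ($s) = @_;
    $s =~ s/\\/\\\\/g;    # backslashes first, so later escapes stay unambiguous
    $s =~ s/\t/\\t/g;
    $s =~ s/\n/\\n/g;
    return $s;
}

sub unescape_field {
    my ($s) = @_;
    $s =~ s/\\([tn\\])/$unescape{$1}/g;    # single pass over all escape sequences
    return $s;
}

# A phrase with spaces, a comma, both kinds of quote and a newline.
my $phrase = qq{She said "it's here, now"\nand left};
my $line   = join( "\t", 27, escape_field($phrase) ) . "\n";

# Reading it back: one record per line, id and text separated by a tab.
chomp( my $copy = $line );
my ( $id, $text ) = split /\t/, $copy, 2;
$text = unescape_field($text);
print "round trip ok\n" if $text eq $phrase;
```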


      One aspect is maintainability: it would be great if the data could be stored in CSV files or similar. Any ideas on that?

        I see little difference in maintainability between:

        ( "The quick", "brown fox", "jumps over", "the lazy", "dog", );

        And:

        "The quick", "brown fox", "jumps over", "the lazy", "dog"

        But if you do, you could do the same thing -- put each language into a separate csv file -- and do:

        my @text = someCSVparser( "$lang.csv" ); ...

        It'll be slower, but for 1500 strings, probably not enough to worry about.

        If performance is a concern -- as it seemed from your OP -- then you could store the texts in .csv files and use an off-line process to create the Storable form from them whenever they change. It has the advantage of ensuring that if the Storable format should ever change in incompatible ways -- it has happened in the past -- then you have the sources to fall back on.
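
A sketch of what that off-line step could look like, under some loudly stated assumptions: the CSV holds double-quoted phrases in index order, with no embedded newlines and no doubled-quote escapes, and the function names, file names and UTF-8 encoding are invented for illustration. It uses the core Text::ParseWords module to stay dependency-free; for CSV with embedded newlines, the Text::CSV module from CPAN is the usual choice.

```perl
use strict;
use warnings;
use Storable qw(nstore retrieve);
use Text::ParseWords qw(parse_line);
use File::Temp qw(tempdir);

# Read every quoted phrase from the CSV, in order, into one flat array.
sub read_csv_phrases {
    my ($csv_file) = @_;
    open my $fh, '<:encoding(UTF-8)', $csv_file
        or die "Cannot open $csv_file: $!";
    my @text;
    while ( my $line = <$fh> ) {
        chomp $line;
        next unless length $line;
        # Delimiter is a regex: a comma plus any following whitespace.
        my @fields = parse_line( ',\s*', 0, $line );
        die "Malformed CSV at line $." unless @fields;
        push @text, @fields;
    }
    close $fh;
    return \@text;
}

# Rebuild the Storable file only when it is missing or older than the CSV.
sub rebuild_if_stale {
    my ( $csv, $sto ) = @_;
    return 0 if -e $sto && ( stat($sto) )[9] >= ( stat($csv) )[9];
    nstore( read_csv_phrases($csv), $sto );
    return 1;
}

# Demonstration with a throwaway directory and a made-up DE.csv.
my $dir = tempdir( CLEANUP => 1 );
my ( $csv, $sto ) = ( "$dir/DE.csv", "$dir/DE.sto" );
open my $out, '>:encoding(UTF-8)', $csv or die "Cannot write $csv: $!";
print {$out} qq{"Hallo", "Auf Wiedersehen"\n"Ja, bitte"\n};
close $out;

my $rebuilt = rebuild_if_stale( $csv, $sto );    # off-line build step
my $text    = retrieve($sto);                    # what the application does at start-up
print $text->[2], "\n";
```

The application itself only ever calls retrieve, so the CSV parsing cost is paid once per edit of the source file rather than once per run.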

