Text storage/retrieval

DreamT has asked for the wisdom of the Perl Monks concerning the following question:

Re: Text storage/retrieval by BrowserUk (Patriarch) on Mar 05, 2012 at 12:13 UTC
"One idea is to use the __DATA__ block to store texts"

Why go through the process of converting text to a hash at runtime, every time, and of loading all languages on every run? Why not just store the texts in a .pl file (one per language, if you can know in advance which language you are going to use), for example EN.pl: `( "...", # 0 "...", # 1 ... );`

And then you can do: `my $lang = determineLang(); my @text = do "./$lang.pl"; ... print $text[ 27 ];`

(The explicit `./` matters on Perl 5.26 and later, where the current directory is no longer in @INC and `do` will not find a bare relative file name.)

You can save a little more time by not even parsing the list, by using Storable, though that gets a bad press from some people: `use Storable qw( retrieve ); ... my $lang = determineLang(); my $text = retrieve( "$lang.sto" ); ... print $text->[ 27 ];`

You would use a separate small app to build and write the binary Storable files.

With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday' Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error. "Science is about questioning the status quo. Questioning authority". In the absence of evidence, opinion is indistinguishable from prejudice. The start of some sanity?
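A minimal sketch of the Storable approach described above, under stated assumptions: the sample phrases, the `load_texts` helper and the temporary directory are hypothetical, and the build step and the application are shown in one script only for brevity. It uses Storable's `nstore` and `retrieve`, which write and read files (`thaw` works on an in-memory frozen string rather than a file name).

```perl
use strict;
use warnings;
use Storable qw( nstore retrieve );
use File::Temp qw( tempdir );

# Hypothetical sample texts; a real application would have one list per language.
my %texts_for = (
    EN => [ 'Hello', 'Goodbye', 'Thank you' ],
    DE => [ 'Hallo', 'Auf Wiedersehen', 'Danke' ],
);

# A temporary directory stands in for wherever the .sto files would live.
my $dir = tempdir( CLEANUP => 1 );

# Build step: normally a separate small script, run whenever the texts change.
# nstore() writes in network byte order, so the files are portable across machines.
for my $lang ( sort keys %texts_for ) {
    nstore( $texts_for{$lang}, "$dir/$lang.sto" );
}

# Application side: load only the language that is needed for this run.
sub load_texts {
    my ( $dir, $lang ) = @_;
    return retrieve("$dir/$lang.sto");    # returns an array reference
}

my $text = load_texts( $dir, 'EN' );
print $text->[1], "\n";
```

Only the file for the selected language is read, and no Perl source has to be parsed to get the list back.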
Re^2: Text storage/retrieval by tobyink (Canon) on Mar 05, 2012 at 13:39 UTC
"Why go through the process of having to convert text to a hash at runtime, everytime; and load all languages for every run."

Putting each language in its own file would certainly be an improvement, I grant you that. However, converting text to the hash at run time is actually faster than hardcoding the hash, at least for simple data. Yes, that's right. This: `my %hash; while (<DATA>) { chomp; my ($k, $v) = split /\t/; $hash{$k} = $v; } __DATA__ 440035528809 6946395707444 332679554392 162874763688655 913537320343 56726180700920`

is faster than this: `my %hash = ( 440035528809=>'6946395707444', 332679554392=>'162874763688655', 913537320343=>'56726180700920', );`

Or at least it is once you've got more than a few hundred entries in the hash. (In the `__DATA__` section each line holds one key and one value separated by a tab; the `chomp` keeps the trailing newline out of the value.)

It seems counter-intuitive, but it makes sense when you think about it. In the first example we're parsing a very simple text format using Perl (and Perl is very fast at text handling!); in the second we're parsing a programming language using C.

I did quite a bit of benchmarking on this sort of thing for Crypt::XkcdPassword.
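For reference, here is a runnable sketch of the two forms being compared. It only checks that both produce the same hash and does not measure speed. The `parse_pairs` helper is a hypothetical name, and an in-memory filehandle stands in for the `__DATA__` section.

```perl
use strict;
use warnings;

# Tab-separated sample data from the post: key<TAB>value, one pair per line.
my $data = join '', map { "$_->[0]\t$_->[1]\n" } (
    [ '440035528809', '6946395707444' ],
    [ '332679554392', '162874763688655' ],
    [ '913537320343', '56726180700920' ],
);

# Build a hash at run time from any filehandle (DATA, a file, or a string).
sub parse_pairs {
    my ($fh) = @_;
    my %hash;
    while ( my $line = <$fh> ) {
        chomp $line;                                # drop the trailing newline
        my ( $k, $v ) = split /\t/, $line, 2;       # split on the first tab only
        $hash{$k} = $v;
    }
    return \%hash;
}

open my $fh, '<', \$data or die "Cannot open in-memory handle: $!";
my $parsed = parse_pairs($fh);
close $fh;

# The hardcoded equivalent, which perl has to compile as source code.
my %literal = (
    '440035528809' => '6946395707444',
    '332679554392' => '162874763688655',
    '913537320343' => '56726180700920',
);

# Both approaches should yield exactly the same keys and values.
my $same = scalar( keys %$parsed ) == scalar( keys %literal )
    && !grep { !exists $parsed->{$_} || $parsed->{$_} ne $literal{$_} } keys %literal;

print $same ? "identical\n" : "different\n";
```

Note that this format has no escaping convention, so a key or value containing a tab or a newline would need extra handling.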
Re^3: Text storage/retrieval by BrowserUk (Patriarch) on Mar 05, 2012 at 14:09 UTC
Now try it with phrases that can contain spaces, commas, quotes of either form, and even newlines.
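One way to read this challenge: a list kept as Perl source and loaded with `do` can hold such phrases because Perl's own quoting rules take care of them, whereas a hand-rolled delimited format would need an escaping convention. A small sketch, assuming a hypothetical EN.pl written to a temporary directory:

```perl
use strict;
use warnings;
use File::Temp qw( tempdir );

my $dir  = tempdir( CLEANUP => 1 );
my $file = "$dir/EN.pl";    # hypothetical per-language file

# Write a sample language file. The heredoc is single-quoted, so its
# content lands in the file verbatim, escapes and all.
open my $out, '>', $file or die "Cannot write $file: $!";
print {$out} <<'END_OF_LIST';
(
    "Plain phrase with spaces",              # 0
    "Commas, and more commas, are fine",     # 1
    'She said "hello" to everyone',          # 2
    "It's a two-line\nmessage",              # 3
);
END_OF_LIST
close $out or die "Cannot close $file: $!";

# do FILE compiles the file and returns its last expression, here the list.
# The path is absolute, so no @INC search is involved.
my @text = do $file;
die "Could not load $file: ", ( $@ || $! ) unless @text;

print "$_: $text[$_]\n" for 0 .. $#text;
```

The price is that whoever maintains the texts has to write valid Perl string literals.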
Re^4: Text storage/retrieval by tobyink (Canon) on Mar 05, 2012 at 14:54 UTC
Re^2: Text storage/retrieval by DreamT (Pilgrim) on Mar 05, 2012 at 13:32 UTC
One aspect is maintainability: it would be great if the data could be stored in CSV files or similar. Any ideas on that?
Re^3: Text storage/retrieval by BrowserUk (Patriarch) on Mar 05, 2012 at 13:39 UTC
I see little difference in maintainability between: `( "The quick", "brown fox", "jumps over", "the lazy", "dog", );`

And: `"The quick", "brown fox", "jumps over", "the lazy", "dog"`

But if you do, you could do the same thing -- put each language into a separate CSV file -- and do: `my @text = someCSVparser( "$lang.csv" ); ...`

It'll be slower, but for 1500 strings, probably not enough to worry about.

If performance is a concern -- as it seemed from your OP -- then you could store the texts in .csv files and use an off-line process to create the Storable form from them whenever they change. That has the advantage that if the Storable format should ever change in incompatible ways -- it has happened in the past -- you have the sources to fall back on.
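The off-line step could look roughly like the sketch below. The names `parse_csv_file` and `rebuild_if_stale` are hypothetical, as is the file layout. The parser is only a stand-in for the post's `someCSVparser`: it uses the core Text::ParseWords module, which copes with quoted fields containing commas and spaces but not with embedded newlines or CSV-style doubled quotes. For those, a full CSV parser such as the CPAN module Text::CSV would be the safer choice.

```perl
use strict;
use warnings;
use Storable qw( nstore retrieve );
use Text::ParseWords qw( parse_line );
use File::Temp qw( tempdir );

# Stand-in for someCSVparser(): returns a reference to the list of strings.
sub parse_csv_file {
    my ($path) = @_;
    open my $in, '<', $path or die "Cannot read $path: $!";
    my @strings;
    while ( my $line = <$in> ) {
        chomp $line;
        next unless length $line;
        # Split on commas plus optional whitespace, stripping the quotes.
        push @strings, parse_line( ',\s*', 0, $line );
    }
    close $in;
    return \@strings;
}

# Rebuild the Storable file only if it is missing or older than the CSV.
# Returns 1 if the file was rebuilt, 0 if it was already up to date.
sub rebuild_if_stale {
    my ( $csv, $sto ) = @_;
    my $csv_mtime = ( stat $csv )[9];
    die "Cannot stat $csv: $!" unless defined $csv_mtime;
    my $sto_mtime = ( stat $sto )[9];
    return 0 if defined $sto_mtime && $sto_mtime >= $csv_mtime;
    nstore( parse_csv_file($csv), $sto );
    return 1;
}

# Demonstration with a temporary directory and the sample strings from the post.
my $dir = tempdir( CLEANUP => 1 );
my ( $csv, $sto ) = ( "$dir/EN.csv", "$dir/EN.sto" );

open my $out, '>', $csv or die "Cannot write $csv: $!";
print {$out} qq{"The quick", "brown fox", "jumps over", "the lazy", "dog"\n};
close $out or die "Cannot close $csv: $!";

my $first  = rebuild_if_stale( $csv, $sto );    # EN.sto is missing, so it is built
my $second = rebuild_if_stale( $csv, $sto );    # nothing changed, so it is skipped

my $text = retrieve($sto);
print $text->[1], "\n";
```

The CSV files stay the editable source of truth, and the application itself only ever calls `retrieve`.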