Hence my "at least for simple data". My example uses tabs and newlines as delimiters, so those characters cannot appear in the data.
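If the data might contain tabs or newlines, one workaround is to escape those characters before writing a line and unescape them after splitting it. This is a sketch and not something from the original post: the backslash-escaping convention is my own assumption, and `tsv_escape` / `tsv_unescape` are hypothetical helper names. It uses in-memory filehandles so it runs without creating any files:

```perl
use strict;
use warnings;

# Hypothetical escaping convention: backslash, tab and newline become
# the two-character sequences \\, \t and \n respectively.
my %esc   = ("\\" => '\\\\', "\t" => '\\t', "\n" => '\\n');
my %unesc = ('\\' => "\\", 't' => "\t", 'n' => "\n");

sub tsv_escape {
    my ($s) = @_;
    $s =~ s/([\\\t\n])/$esc{$1}/g;
    return $s;
}

sub tsv_unescape {
    my ($s) = @_;
    $s =~ s/\\([\\tn])/$unesc{$1}/g;
    return $s;
}

# Illustrative data that would break a plain tab/newline format.
my %orig = (
    "plain"       => "value",
    "has\ttab"    => "line one\nline two",
    "back\\slash" => "C:\\temp\\new",
);

# Write one escaped key/value pair per line into an in-memory string.
my $tsv = '';
open my $out, '>', \$tsv or die "Cannot open in-memory handle: $!";
print {$out} tsv_escape($_), "\t", tsv_escape($orig{$_}), "\n" for sort keys %orig;
close $out;

# Read it back the same way as a while (<DATA>) / split loop would.
my %copy;
open my $in, '<', \$tsv or die "Cannot open in-memory handle: $!";
while (my $line = <$in>) {
    chomp $line;
    my ($k, $v) = split /\t/, $line, 2;    # only one real tab per line now
    $copy{ tsv_unescape($k) } = tsv_unescape($v);
}
close $in;

for my $k (sort keys %orig) {
    my $ok = exists $copy{$k} && $copy{$k} eq $orig{$k};
    printf "%-14s %s\n", tsv_escape($k), $ok ? 'ok' : 'MISMATCH';
}
```

The escaping adds a regex substitution per field, so it will cost something compared with a plain `split`.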

That said, using something like JSON::XS you can get even faster than either of the previous two examples. And of course JSON gives you escaping, multiline strings, etc.

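use strict;
use JSON;    # uses JSON::XS when installed, otherwise the pure-Perl JSON::PP

# Slurp the whole __DATA__ section and decode it into a hash.
my %hash = %{from_json(do{ local $/ = <DATA> })};

__DATA__
{
"440035528809":"6946395707444",
"332679554392":"162874763688655",
"913537320343":"56726180700920"
}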
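The `JSON` module picks JSON::XS when it is installed and otherwise falls back to JSON::PP, which has shipped with Perl since 5.14. Here is a self-contained sketch of the same idea calling JSON::PP's `decode_json` directly. It reads from a heredoc instead of `__DATA__` so it stays runnable on its own, and the extra "note" entry is made-up data to show JSON escapes being decoded:

```perl
use strict;
use warnings;
use JSON::PP;    # exports decode_json / encode_json by default

# Stand-in for the __DATA__ section; \t and \n here are JSON escapes,
# which the decoder turns into a real tab and a real newline.
my $text = <<'END_JSON';
{
  "440035528809":"6946395707444",
  "332679554392":"162874763688655",
  "913537320343":"56726180700920",
  "note":"tab\there\nsecond line"
}
END_JSON

my %hash = %{ decode_json($text) };

for my $k (sort keys %hash) {
    (my $shown = $hash{$k}) =~ s/\n/\\n/g;    # keep each entry on one output line
    print "$k => $shown\n";
}
```

Swapping `use JSON::PP` for `use JSON::XS` should work unchanged, since JSON::XS also exports `decode_json`, and is where the speed comes from.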

With a hash of 500_000 entries, I get:

Standard Perl hash...
11.04user 0.31system 0:11.58elapsed 98%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+41519minor)pagefaults 0swaps
Reading TSV from __DATA__...
6.15user 0.14system 0:06.38elapsed 98%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+13860minor)pagefaults 0swaps
Reading JSON from __DATA__...
4.25user 0.26system 0:04.64elapsed 97%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+38709minor)pagefaults 0swaps

Of course, loading the JSON module introduces some overhead, so on smaller datasets the other techniques beat it. With a hash of 1000 entries, I get:

Standard Perl hash...
0.03user 0.00system 0:00.04elapsed 93%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+629minor)pagefaults 0swaps
Reading TSV from __DATA__...
0.01user 0.00system 0:00.02elapsed 92%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+566minor)pagefaults 0swaps
Reading JSON from __DATA__...
0.10user 0.00system 0:00.11elapsed 97%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+871minor)pagefaults 0swaps
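The fixed cost of loading the module can be timed on its own. A rough sketch, assuming JSON::PP as the module (it is core; JSON::XS is compiled C, so its load time will differ) and using Time::HiRes for sub-second timing:

```perl
use strict;
use warnings;
use Time::HiRes qw(time);

# Time only loading the module (compile + import). This is a fixed cost
# paid once per process, however many entries are decoded afterwards.
my $t0 = time;
require JSON::PP;
JSON::PP->import;
my $load_secs = time - $t0;

printf "Loading JSON::PP took %.4f seconds\n", $load_secs;
```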

JSON::XS seems to start beating a hard-coded Perl hash at around 5000 entries, and tab-delimited data at around 12000 entries.
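These crossover points include perl start-up and module loading, because each script is timed as a separate process. If you only want to compare the parsing itself, the core Benchmark module's `cmpthese` could be used in-process. This is a sketch under my own assumptions: the hard-coded hash is approximated by a string `eval` of a hash literal, JSON::PP stands in for JSON::XS, and the entry count comes from the command line (defaulting to 1000), so it won't reproduce the figures above:

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);
use JSON::PP;

# Number of entries; pass another size on the command line to look for
# the crossover points. The default of 1000 is just an assumption.
my $n = @ARGV ? $ARGV[0] : 1_000;

# Build the same random data in the three textual forms compared above.
my (@perl, @tsv, @json);
for (1 .. $n) {
    my $k = int rand 1_000_000_000_000;
    my $v = int rand 1_000_000_000_000;
    push @perl, "$k=>'$v',";
    push @tsv,  "$k\t$v\t";
    push @json, qq{"$k":"$v"};
}
my $perl_src  = "my %h = (\n" . join("\n", @perl) . "\n); \\%h";
my $tsv_text  = join("\n", @tsv) . "\n";
my $json_text = "{\n" . join(",\n", @json) . "\n}";

# Each sub turns its text into a hash reference.
my %parse = (
    # Compile a hash literal at run time, standing in for a hard-coded hash.
    perl_hash => sub { my $h = eval $perl_src or die $@; $h },
    # Same split as the TSV reader in the benchmark.
    tsv => sub {
        my %h;
        for my $line (split /\n/, $tsv_text) {
            my ($k, $v) = split /\t/, $line;
            $h{$k} = $v;
        }
        return \%h;
    },
    json => sub { decode_json($json_text) },
);

# Run each parser for at least 1 CPU second and print a comparison table.
cmpthese(-1, \%parse);
```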

My benchmarking code is:

use 5.010;

# Write three small scripts that each build the same hash in a different way.
open my $perl, '>', 'perl.pl';
open my $data, '>', 'data.pl';
open my $json, '>', 'json.pl';

# perl.pl: a hard-coded hash literal.
print $perl <<'CODE';
use strict;
my %hash = (
CODE

# data.pl: tab-separated key/value lines read from __DATA__.
print $data <<'CODE';
use strict;
my %hash;
while (<DATA>)
{
	my ($k, $v) = split /\t/o;
	$hash{$k} = $v;
}
__DATA__
CODE

# json.pl: a single JSON object read from __DATA__.
print $json <<'CODE';
use strict;
use JSON;
my %hash = %{from_json(do{ local $/ = <DATA> })};
__DATA__
{
CODE

# Number of hash entries to generate; change this to test other sizes.
my $last = 100_000;
for (1 .. $last)
{
	my $k = int rand 1_000_000_000_000;
	my $v = int rand 1_000_000_000_000_000;
	my $comma = $_==$last?'':',';    # JSON does not allow a trailing comma
	print $perl "$k=>'$v',\n";
	print $data "$k\t$v\t\n";        # trailing tab keeps the newline out of $v
	print $json "\"$k\":\"$v\"$comma\n";
}

print $perl <<'CODE';
);
CODE

print $json <<'CODE';
}
CODE

close $perl;
close $json;
close $data;

# Run each generated script under time(1).
say "Standard Perl hash...";
system("time perl perl.pl");
say "Reading TSV from __DATA__...";
system("time perl data.pl");
say "Reading JSON from __DATA__...";
system("time perl json.pl");

unlink "perl.pl";
unlink "data.pl";
unlink "json.pl";

This example doesn't include any newline characters in the data, but with the JSON::XS approach, embedded newlines in the strings (properly escaped as \n according to JSON syntax) don't seem to make a significant difference to performance.
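A quick way to see how embedded newlines are handled is to round-trip them through an encoder and decoder. This sketch uses JSON::PP (core) rather than JSON::XS, with made-up values; `canonical` just sorts the keys so the output is stable:

```perl
use strict;
use warnings;
use JSON::PP;

# Values containing a newline and a tab (illustrative data only).
my %orig = (
    "440035528809" => "first line\nsecond line",
    "332679554392" => "tab\tseparated",
);

# canonical(1) sorts the keys so the encoded text is stable between runs.
my $codec = JSON::PP->new->canonical(1);
my $text  = $codec->encode(\%orig);

# The encoder writes the newline and tab as the escapes \n and \t, so the
# whole document stays on one physical line.
print "$text\n";

my %copy = %{ $codec->decode($text) };
for my $k (sort keys %orig) {
    print "$k: ", ($copy{$k} eq $orig{$k} ? "round-trips" : "CHANGED"), "\n";
}
```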


In reply to Re^4: Text storage/retrieval by tobyink
in thread Text storage/retrieval by DreamT