Davewhite has asked for the wisdom of the Perl Monks concerning the following question:

Hi,

I have to create a config file containing some hundreds of common config parameters, used across multiple Perl scripts.

The constraint is to choose a config file format (XML, text, etc.) that is fastest to read from Perl scripts (at most 45 milliseconds permissible for file processing).

Any guidance or suggestion for the format or approach is welcome.

Thanks & Regards

Dave


Re: fastest file processing Config file format
by CountZero (Bishop) on Sep 29, 2009 at 05:47 UTC
    The only way to make sure is to benchmark the different config-modules with real data and see how they compare.

    Much will depend on your data: the simplest format is a "Key = Value" format, which you can simply save in a text file and "slurp" back into an array without having to use a special module.

    Splitting the array elements on \s*=\s* and saving into a hash will be very fast. But of course you can only deal with the simplest data: no embedded newlines or '=', everything ends up in one big hash, repeated keys get overwritten, no sub-key levels, ...
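
    For instance, something like this (a minimal sketch; 'app.conf' is a made-up file name) covers the whole read-and-parse step:

        # Slurp a "Key = Value" file into a flat hash.
        # 'app.conf' is a hypothetical file name.
        open my $fh, '<', 'app.conf' or die "Can't open app.conf: $!\n";
        my %config = map { chomp; split /\s*=\s*/, $_, 2 } grep { /\S/ } <$fh>;
        close $fh;

    The limit of 2 on split keeps any later '=' inside the value, and the grep just skips blank lines.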

    Update: when you benchmark the different config-modules, take care that the results are not skewed by the effect of caching the file data. You may notice that the first module you test always takes the longest and that repeated reading of the same file is faster thereafter. What you see is the file being read not from your hard disk but from a buffer.
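
    One way around that (sketch only; the file names are made up) is to read every candidate file once before timing anything, so all candidates start from a warm cache:

        # Warm the OS file cache so no module pays the first-read penalty alone.
        # File names here are hypothetical placeholders.
        for my $file ('config.yaml', 'config.xml', 'config.txt') {
            open my $fh, '<', $file or die "Can't read $file: $!\n";
            my @dummy = <$fh>;   # force the data into the buffer cache
            close $fh;
        }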

    Without having tested it, I have a feeling that most of the time in your config-file processing will be spent reading the data in anyhow, and that neither the config-file format nor the module used will have much effect by comparison.

    CountZero

    "A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little nor too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James

Re: fastest file processing Config file format
by ikegami (Patriarch) on Sep 29, 2009 at 05:58 UTC

    You've got a serious design problem if config loading speed is an issue.

    And the format shouldn't matter one bit. Perl is a much more complicated language than XML, YAML, JSON, INI, etc., and it took only 14 ms to parse 10,000 configuration items on my aging machine. It's the implementation that's going to matter.
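
    Something along these lines (a rough sketch; 'config.pl' is a made-up file name, and the numbers will vary by machine) gives that kind of measurement:

        use strict;
        use warnings;
        use Time::HiRes qw(time);

        # Write 10,000 items as a Perl hashref, then time how long do()
        # takes to compile and run it. 'config.pl' is hypothetical.
        open my $out, '>', 'config.pl' or die "Can't create config.pl: $!\n";
        print $out "{\n";
        print $out qq{    key$_ => "value$_",\n} for 1 .. 10_000;
        print $out "}\n";
        close $out;

        my $start  = time;
        my $config = do './config.pl' or die $@ || $!;
        printf "parsed %d items in %.1f ms\n",
            scalar keys %$config, (time - $start) * 1000;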

Re: fastest file processing Config file format
by GrandFather (Saint) on Sep 29, 2009 at 11:03 UTC

    It depends a great deal on the data you need and how much of it there actually is. If 'some hundreds' is about 300, then almost any module that can be used for configuration and that suits your data will turn the trick. If you need a lot more than that and your data is simple, then a hand-rolled solution may be what you need. Consider the following simple benchmark:

        use warnings;
        use strict;
        use Benchmark qw(timethese cmpthese);
        use YAML qw();
        use XML::Simple qw();
        use Config::Fast qw();

        my %bigHash = map {$_ => genStr ($_)} genKeys (1 .. 300);

        YAML::DumpFile ("delme.yaml", \%bigHash);

        open my $out, '>', "delme.xml" or die "Can't create delme.xml: $!\n";
        print $out XML::Simple::XMLout (\%bigHash);
        close $out;

        open $out, '>', "delme.fast" or die "Can't create delme.fast: $!\n";
        print $out "$_ $bigHash{$_}\n" for keys %bigHash;
        close $out;

        my $yamlHash  = YAML::LoadFile ("delme.cfg");
        my $fastHash  = Config::Fast::fastconfig ("delme.fast");
        my $xmlHash   = XML::Simple::XMLin ("delme.xml");
        my $slurpHash = {do {local @ARGV = "delme.fast"; my %newHash = map {split ' ', $_, 2} <>;}};

        cmpthese (-1, {
            YAML  => sub {my $newHash = YAML::LoadFile ("delme.cfg");},
            fast  => sub {my $newHash = Config::Fast::fastconfig ("delme.fast");},
            XML   => sub {my $newHash = XML::Simple::XMLin ("delme.xml");},
            slurp => sub {local @ARGV = "delme.fast"; my %newHash = map {split ' ', $_, 2} <>;}
            }
        );

        sub genKeys {
            my @keys;

            for my $seed (@_) {
                push @keys, "x$seed";
            }
            return @keys;
        }

        sub genStr {
            my ($key) = @_;
            return "Str " . ('x' x (substr ($key, 1) % 100));
        }

    Prints:

                Rate  YAML   XML  fast slurp
        YAML  21.0/s    --   -8%  -53%  -98%
        XML   22.8/s    8%    --  -50%  -98%
        fast  45.2/s  115%   98%    --  -96%
        slurp 1211/s 5665% 5216% 2583%    --

    True laziness is hard work

      You seem to change file names between creating and testing the YAML parser (delme.yaml vs delme.cfg). Also, some of the C-based YAML parsers (for instance, YAML::Syck) are a lot faster than plain YAML. (I removed Config::Fast since I don't have it installed.)

                    Rate  YAML   XML YAML_Syck slurp
          YAML    17.1/s    --  -83%      -97%  -98%
          XML      103/s  505%    --      -84%  -88%
          YAML_Syck 636/s 3629%  516%       --  -28%
          slurp    888/s 5107%  760%      40%    --

      Good Day,
          Dean

        The .cfg was a carry-over from previous iterations of the benchmark code, where I only tested YAML. The file contents are the same as those generated for delme.yaml, so the benchmark results are the same.

        I picked YAML first in the expectation that it would be the slowest of the bunch, figuring that if you could meet the run-time specification with YAML you could do it with anything.


        True laziness is hard work

      I agree 100% with the C solutions :). split is only reliable if the key cannot contain spaces; CSV might then be a lot easier (and more portable).

      I also don't think 300 is a *big* config file.

          use strict;
          use warnings;
          use Benchmark qw(timethese cmpthese);
          use YAML qw();
          use YAML::Syck qw();
          use XML::Simple qw();
          use Config::Fast qw();
          use Text::CSV_XS;

          my $x = "x" x 100;
          my %bigHash = map { $_ => "Str " . substr $x, int rand 100 } map { "x$_" } 0 .. 3000;

          YAML::DumpFile ("delme.yml", \%bigHash);

          open my $out, ">", "delme.xml" or die "Can't create delme.xml: $!\n";
          print $out XML::Simple::XMLout (\%bigHash);
          close $out;

          open $out, ">", "delme.fst" or die "Can't create delme.fst: $!\n";
          print $out "$_ $bigHash{$_}\n" for keys %bigHash;
          close $out;

          my $csv = Text::CSV_XS->new ({ binary => 1, eol => "\n" });
          open $out, ">", "delme.csv" or die "Can't create delme.csv: $!\n";
          $csv->print ($out, [ $_, $bigHash{$_} ]) for keys %bigHash;
          close $out;

          cmpthese (-3, {
              YAML  => sub { my $newHash = YAML::LoadFile ("delme.yml"); },
              Syck  => sub { my $newHash = YAML::Syck::LoadFile ("delme.yml"); },
              fast  => sub { my $newHash = Config::Fast::fastconfig ("delme.fst"); },
              XML   => sub { my $newHash = XML::Simple::XMLin ("delme.xml"); },
              slurp => sub { local @ARGV = "delme.fst";
                             my %newHash = map { split " ", $_, 2 } <>;
                           },
              csv   => sub { open my $fh, "<", "delme.csv";
                             my %newHash;
                             while (my $row = $csv->getline ($fh)) {
                                 $newHash{$row->[0]} = $row->[1];
                             }
                           },
              });

      =>

                  Rate  YAML   XML  fast   csv  Syck slurp
          YAML  1.39/s    --  -10%  -54%  -97%  -98%  -99%
          XML   1.54/s   11%    --  -49%  -97%  -98%  -99%
          fast  3.02/s  117%   96%    --  -94%  -96%  -98%
          csv   47.2/s 3289% 2959% 1462%    --  -43%  -65%
          Syck  82.2/s 5804% 5228% 2622%   74%    --  -39%
          slurp  136/s 9638% 8689% 4389%  187%   65%    --

      Enjoy, Have FUN! H.Merijn

        I chose 300 for two reasons: it's a reasonable guess at what the OP might mean by 'some hundreds', and even the slowest configuration technique I tried meets the time criterion the OP gave for a configuration file of that size.

        The slurp solution wasn't intended as a reliable way to handle configuration information, but as an indicative upper limit for a Perl solution to the problem. It's interesting to note, however, that fastconfig uses the same file format and has the same potential issues as the slurp solution.


        True laziness is hard work
Re: fastest file processing Config file format
by salva (Canon) on Sep 29, 2009 at 10:15 UTC
    Use the format that best suits your data, and later, if you find it is not fast enough, write some tool to convert it to a better format (*) or do the conversion on the fly and cache the result.

    For instance, BSD systems do not read information directly from /etc/passwd but from a database file that is generated with pwd_mkdb.

    (*) A better format could be your config object serialized with Storable or saved with DB_File.
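
    A sketch of the convert-and-cache idea with Storable (the file names and the parse_config stand-in are made up):

        use strict;
        use warnings;
        use Storable qw(store retrieve);

        sub parse_config {    # stand-in for whatever slow reader suits your format
            my ($file) = @_;
            open my $fh, '<', $file or die "Can't open $file: $!\n";
            my %config = map { chomp; split /\s*=\s*/, $_, 2 } grep { /\S/ } <$fh>;
            return \%config;
        }

        sub load_config {
            my ($source, $cache) = @_;
            # Use the cache only if it is at least as new as the source
            # (-M is age in days, so smaller means more recently modified).
            return retrieve ($cache) if -e $cache && -M $cache <= -M $source;
            my $config = parse_config ($source);
            store ($config, $cache);    # regenerate the pre-parsed image
            return $config;
        }

        my $config = load_config ('app.conf', 'app.conf.stored');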

Re: fastest file processing Config file format
by JavaFan (Canon) on Sep 29, 2009 at 13:08 UTC
    Without bothering to write a benchmark, my guess is that the fastest way is to write your config file as a Perl module, loading the data into variables directly. This means all the parsing is done by C, not Perl. And to be really, really fast, you use just a single scalar variable which stores all the configuration information.

    Of course, this has many disadvantages, but you're saying you want the fastest, and you haven't put down any other requirements.
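
    A minimal sketch of the idea (the module name and keys are hypothetical):

        # Put this in MyConfig.pm somewhere in @INC.
        package MyConfig;
        use strict;
        use warnings;

        our $config = {    # one scalar holding everything
            db_host => 'localhost',
            db_port => 5432,
            timeout => 30,
            # ... hundreds more ...
        };

        1;

        # Then in each script:
        #     use MyConfig;
        #     print $MyConfig::config->{db_host}, "\n";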

Re: fastest file processing Config file format
by Anonymous Monk on Sep 29, 2009 at 05:39 UTC
    Sounds myopic. If that is your only criterion, no module can approach
        open my $fh, '<', 'config' or die $!;
        my @config = <$fh>;
        close $fh;
    Or if you want more complex data structures:
        my $config = do 'filename';
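    where 'filename' holds a Perl expression whose value do returns, e.g. (contents made up for illustration):

        # hypothetical contents of 'filename' -- do() returns the last expression
        {
            name    => 'myapp',
            servers => [ 'alpha', 'beta' ],
            retries => 3,
        }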
    I suspect the best approach would be to benchmark.