Davewhite has asked for the wisdom of the Perl Monks concerning the following question:

Hi,

I have to create a config file containing some hundreds of common config parameters, used across multiple Perl scripts.

The constraint is to choose a config file format (XML, text, etc.) that is fastest to read from Perl scripts (at most 45 milliseconds permissible for file processing).

Any guidance or suggestion for the format or approach is welcome.

Thanks & Regards

Dave


Re: fastest file processing Config file format
by CountZero (Bishop) on Sep 29, 2009 at 05:47 UTC
    The only way to make sure is to benchmark the different config-modules with real data and see how they compare.

    Much will depend on your data: the simplest format is a "Key = Value" format, which you can simply save in a text file and "slurp" back into an array without having to use a special module.

    Splitting the array elements on \s*=\s* and saving into a hash will be very fast. But of course you can only deal with the simplest data: no embedded newlines or '=', everything ends up in one big hash, repeated keys get overwritten, no sub-key levels, ...
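
    For instance, something like this (a minimal sketch; 'app.conf' is a made-up file name) covers the whole read-and-parse step:

        # Slurp a "Key = Value" file into a flat hash.
        # 'app.conf' is a hypothetical file name.
        open my $fh, '<', 'app.conf' or die "Can't open app.conf: $!\n";
        my %config = map { chomp; split /\s*=\s*/, $_, 2 } grep { /\S/ } <$fh>;
        close $fh;

    The limit of 2 on split keeps any later '=' inside the value, and the grep just skips blank lines.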

    Update: when you benchmark the different config-modules, take care that the results are not skewed by the effect of caching the file data. You may notice that the first module you test always takes the longest and that repeated reading of the same file is faster thereafter. What you see is the file being read not from your hard disk but from a buffer.
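
    One way around that (sketch only; the file names are made up) is to read every candidate file once before timing anything, so all candidates start from a warm cache:

        # Warm the OS file cache so no module pays the first-read penalty alone.
        # File names here are hypothetical placeholders.
        for my $file ('config.yaml', 'config.xml', 'config.txt') {
            open my $fh, '<', $file or die "Can't read $file: $!\n";
            my @dummy = <$fh>;   # force the data into the buffer cache
            close $fh;
        }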

    Without having tested it, I have a feeling that most of the time in your config-file processing will be spent reading the data in anyhow, and that neither the config-file format nor the module used will have much effect by comparison.

    CountZero

    "A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little nor too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James

Re: fastest file processing Config file format
by ikegami (Patriarch) on Sep 29, 2009 at 05:58 UTC

    You've got a serious design problem if config loading speed is an issue.

    And the format shouldn't matter one bit. Perl is a much more complicated language than XML, YAML, JSON, INI, etc., and it took only 14 ms to parse 10,000 configuration items on my aging machine. It's the implementation that's going to matter.
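
    Something along these lines (a rough sketch; 'config.pl' is a made-up file name, and the numbers will vary by machine) gives that kind of measurement:

        use strict;
        use warnings;
        use Time::HiRes qw(time);

        # Write 10,000 items as a Perl hashref, then time how long do()
        # takes to compile and run it. 'config.pl' is hypothetical.
        open my $out, '>', 'config.pl' or die "Can't create config.pl: $!\n";
        print $out "{\n";
        print $out qq{    key$_ => "value$_",\n} for 1 .. 10_000;
        print $out "}\n";
        close $out;

        my $start  = time;
        my $config = do './config.pl' or die $@ || $!;
        printf "parsed %d items in %.1f ms\n",
            scalar keys %$config, (time - $start) * 1000;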

Re: fastest file processing Config file format
by GrandFather (Saint) on Sep 29, 2009 at 11:03 UTC

    It depends a great deal on the data you need and how much of it there actually is. If 'some hundreds' is about 300, then almost any module that can be used for configuration and that suits your data will turn the trick. If you need a lot more than that and your data is simple, then a hand-rolled solution may be what you need. Consider the following simple benchmark:

        use warnings;
        use strict;
        use Benchmark qw(timethese cmpthese);
        use YAML qw();
        use XML::Simple qw();
        use Config::Fast qw();

        my %bigHash = map {$_ => genStr ($_)} genKeys (1 .. 300);

        YAML::DumpFile ("delme.yaml", \%bigHash);

        open my $out, '>', "delme.xml" or die "Can't create delme.xml: $!\n";
        print $out XML::Simple::XMLout (\%bigHash);
        close $out;

        open $out, '>', "delme.fast" or die "Can't create delme.fast: $!\n";
        print $out "$_ $bigHash{$_}\n" for keys %bigHash;
        close $out;

        my $yamlHash  = YAML::LoadFile ("delme.cfg");
        my $fastHash  = Config::Fast::fastconfig ("delme.fast");
        my $xmlHash   = XML::Simple::XMLin ("delme.xml");
        my $slurpHash = {do {local @ARGV = "delme.fast"; my %newHash = map {split ' ', $_, 2} <>;}};

        cmpthese (-1, {
            YAML  => sub {my $newHash = YAML::LoadFile ("delme.cfg");},
            fast  => sub {my $newHash = Config::Fast::fastconfig ("delme.fast");},
            XML   => sub {my $newHash = XML::Simple::XMLin ("delme.xml");},
            slurp => sub {local @ARGV = "delme.fast"; my %newHash = map {split ' ', $_, 2} <>;}
            }
        );

        sub genKeys {
            my @keys;

            for my $seed (@_) {
                push @keys, "x$seed";
            }
            return @keys;
        }

        sub genStr {
            my ($key) = @_;
            return "Str " . ('x' x (substr ($key, 1) % 100));
        }

    Prints:

                Rate  YAML   XML  fast slurp
        YAML  21.0/s    --   -8%  -53%  -98%
        XML   22.8/s    8%    --  -50%  -98%
        fast  45.2/s  115%   98%    --  -96%
        slurp 1211/s 5665% 5216% 2583%    --

    True laziness is hard work

      You seem to change file names between creating and testing the YAML parser (delme.yaml vs delme.cfg). Also, some of the C-based YAML parsers (for instance, YAML::Syck) are a lot faster than plain YAML. (I removed Config::Fast since I don't have it installed.)

                    Rate  YAML   XML YAML_Syck slurp
          YAML    17.1/s    --  -83%      -97%  -98%
          XML      103/s  505%    --      -84%  -88%
          YAML_Syck 636/s 3629%  516%       --  -28%
          slurp    888/s 5107%  760%      40%    --

      Good Day,
          Dean

        The .cfg was a carry-over from previous iterations of the benchmark code, where I only tested YAML. The file contents are the same as those generated for delme.yaml, so the benchmark results are the same.

        I picked YAML first in the expectation that it would be the slowest of the bunch, figuring that if you could meet the run-time specification with YAML you could do it with anything.


        True laziness is hard work

      I agree 100% with the C solutions :). split is only reliable if the key cannot contain spaces; CSV might then be a lot easier (and more portable).

      I also don't think 300 is a *big* config file.

          use strict;
          use warnings;
          use Benchmark qw(timethese cmpthese);
          use YAML qw();
          use YAML::Syck qw();
          use XML::Simple qw();
          use Config::Fast qw();
          use Text::CSV_XS;

          my $x = "x" x 100;
          my %bigHash = map { $_ => "Str " . substr $x, int rand 100 } map { "x$_" } 0 .. 3000;

          YAML::DumpFile ("delme.yml", \%bigHash);

          open my $out, ">", "delme.xml" or die "Can't create delme.xml: $!\n";
          print $out XML::Simple::XMLout (\%bigHash);
          close $out;

          open $out, ">", "delme.fst" or die "Can't create delme.fst: $!\n";
          print $out "$_ $bigHash{$_}\n" for keys %bigHash;
          close $out;

          my $csv = Text::CSV_XS->new ({ binary => 1, eol => "\n" });
          open $out, ">", "delme.csv" or die "Can't create delme.csv: $!\n";
          $csv->print ($out, [ $_, $bigHash{$_} ]) for keys %bigHash;
          close $out;

          cmpthese (-3, {
              YAML  => sub { my $newHash = YAML::LoadFile ("delme.yml"); },
              Syck  => sub { my $newHash = YAML::Syck::LoadFile ("delme.yml"); },
              fast  => sub { my $newHash = Config::Fast::fastconfig ("delme.fst"); },
              XML   => sub { my $newHash = XML::Simple::XMLin ("delme.xml"); },
              slurp => sub { local @ARGV = "delme.fst";
                             my %newHash = map { split " ", $_, 2 } <>;
                           },
              csv   => sub { open my $fh, "<", "delme.csv";
                             my %newHash;
                             while (my $row = $csv->getline ($fh)) {
                                 $newHash{$row->[0]} = $row->[1];
                             }
                           },
              });

      =>

                  Rate  YAML   XML  fast   csv  Syck slurp
          YAML  1.39/s    --  -10%  -54%  -97%  -98%  -99%
          XML   1.54/s   11%    --  -49%  -97%  -98%  -99%
          fast  3.02/s  117%   96%    --  -94%  -96%  -98%
          csv   47.2/s 3289% 2959% 1462%    --  -43%  -65%
          Syck  82.2/s 5804% 5228% 2622%   74%    --  -39%
          slurp  136/s 9638% 8689% 4389%  187%   65%    --

      Enjoy, Have FUN! H.Merijn

        I chose 300 for two reasons: it's a reasonable guess at what the OP might mean by 'some hundreds', and even the slowest configuration technique I tried meets the time criterion the OP gave for a configuration file of that size.

        The slurp solution wasn't intended as a reliable way to handle configuration information, but as an indicative upper limit for a Perl solution to the problem. It's interesting to note, however, that fastconfig uses the same file format and has the same potential issues as the slurp solution.


        True laziness is hard work
Re: fastest file processing Config file format
by salva (Canon) on Sep 29, 2009 at 10:15 UTC
    Use the format that best suits your data, and later, if you find it is not fast enough, write some tool to convert it to a better format (*) or do the conversion on the fly and cache the result.

    For instance, BSD systems do not read information directly from /etc/passwd but from a database file that is generated with pwd_mkdb.

    (*) A better format could be your config object serialized with Storable or saved with DB_File.
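
    A sketch of the convert-and-cache idea with Storable (the file names and the parse_config stand-in are made up):

        use strict;
        use warnings;
        use Storable qw(store retrieve);

        sub parse_config {    # stand-in for whatever slow reader suits your format
            my ($file) = @_;
            open my $fh, '<', $file or die "Can't open $file: $!\n";
            my %config = map { chomp; split /\s*=\s*/, $_, 2 } grep { /\S/ } <$fh>;
            return \%config;
        }

        sub load_config {
            my ($source, $cache) = @_;
            # Use the cache only if it is at least as new as the source
            # (-M is age in days, so smaller means more recently modified).
            return retrieve ($cache) if -e $cache && -M $cache <= -M $source;
            my $config = parse_config ($source);
            store ($config, $cache);    # regenerate the pre-parsed image
            return $config;
        }

        my $config = load_config ('app.conf', 'app.conf.stored');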

Re: fastest file processing Config file format
by JavaFan (Canon) on Sep 29, 2009 at 13:08 UTC
    Without bothering to write a benchmark, my guess is that the fastest way is to write your config file as a Perl module, loading the data into variables directly. This means all the parsing is done by C, not Perl. And to be really, really fast, you use just a single scalar variable which stores all the configuration information.

    Of course, this has many disadvantages, but you're saying you want the fastest, and you haven't put down any other requirements.
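
    A minimal sketch of the idea (the module name and keys are hypothetical):

        # Put this in MyConfig.pm somewhere in @INC.
        package MyConfig;
        use strict;
        use warnings;

        our $config = {    # one scalar holding everything
            db_host => 'localhost',
            db_port => 5432,
            timeout => 30,
            # ... hundreds more ...
        };

        1;

        # Then in each script:
        #     use MyConfig;
        #     print $MyConfig::config->{db_host}, "\n";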

Re: fastest file processing Config file format
by Anonymous Monk on Sep 29, 2009 at 05:39 UTC
    Sounds myopic. If that is your only criterion, no module can approach
        open my $fh, '<', 'config' or die $!;
        my @config = <$fh>;
        close $fh;
    Or if you want more complex data structures:
        my $config = do 'filename';
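    where 'filename' holds a Perl expression whose value do returns, e.g. (contents made up for illustration):

        # hypothetical contents of 'filename' -- do() returns the last expression
        {
            name    => 'myapp',
            servers => [ 'alpha', 'beta' ],
            retries => 3,
        }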
    I suspect the best approach would be to benchmark.