in reply to fastest file processing Config file format

It depends a great deal on the data you need and how much of it there actually is. If 'some hundreds' is about 300 then almost any module that can be used for configuration and that suits your data criteria will do the trick. If you need a lot more than that and your data is simple, then a hand rolled solution may be what you need. Consider the following simple benchmark:

use warnings;
use strict;

use Benchmark qw(timethese cmpthese);
use YAML qw();
use XML::Simple qw();
use Config::Fast qw();

# Build a 300 entry hash to use as sample configuration data
my %bigHash = map {$_ => genStr ($_)} genKeys (1 .. 300);

# Write the sample data in each format under test
YAML::DumpFile ("delme.yaml", \%bigHash);

open my $out, '>', "delme.xml" or die "Can't create delme.xml: $!\n";
print $out XML::Simple::XMLout (\%bigHash);
close $out;

open $out, '>', "delme.fast" or die "Can't create delme.fast: $!\n";
print $out "$_ $bigHash{$_}\n" for keys %bigHash;
close $out;

# Sanity check that each parser round-trips the data
my $yamlHash  = YAML::LoadFile ("delme.cfg");
my $fastHash  = Config::Fast::fastconfig ("delme.fast");
my $xmlHash   = XML::Simple::XMLin ("delme.xml");
my $slurpHash = {do {local @ARGV = "delme.fast"; my %newHash = map {split ' ', $_, 2} <>;}};

cmpthese (
    -1,
    {
        YAML  => sub {my $newHash = YAML::LoadFile ("delme.cfg");},
        fast  => sub {my $newHash = Config::Fast::fastconfig ("delme.fast");},
        XML   => sub {my $newHash = XML::Simple::XMLin ("delme.xml");},
        slurp => sub {local @ARGV = "delme.fast"; my %newHash = map {split ' ', $_, 2} <>;}
    }
);

sub genKeys {
    my @keys;

    for my $seed (@_) {
        push @keys, "x$seed";
    }

    return @keys;
}

sub genStr {
    my ($key) = @_;

    return "Str " . ('x' x (substr ($key, 1) % 100));
}

Prints:

         Rate  YAML   XML  fast slurp
YAML   21.0/s    --   -8%  -53%  -98%
XML    22.8/s    8%    --  -50%  -98%
fast   45.2/s  115%   98%    --  -96%
slurp  1211/s 5665% 5216% 2583%    --
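
The slurp entry is essentially a hand rolled configuration reader stripped to its bare minimum. Fleshed out into something usable it might look like the following sketch (it assumes one 'key value' pair per line with keys free of whitespace; the file name app.cfg is made up for illustration):

use strict;
use warnings;

# Read a config file of "key value" lines into a hash reference.
# Keys must not contain whitespace: everything after the first run
# of whitespace on a line is the value.
sub read_config {
    my ($file) = @_;

    open my $in, '<', $file or die "Can't read $file: $!\n";

    my %config;
    while (my $line = <$in>) {
        chomp $line;
        next unless $line =~ /\S/;    # skip blank lines
        my ($key, $value) = split ' ', $line, 2;
        $config{$key} = $value;
    }

    close $in;
    return \%config;
}

# "app.cfg" is a made up name for illustration only
my $config = read_config ("app.cfg");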

True laziness is hard work

Re^2: fastest file processing Config file format
by duelafn (Parson) on Sep 29, 2009 at 13:06 UTC

    You seem to change file names between creating and testing the YAML parser (delme.yaml vs delme.cfg). Also, some of the C-based YAML parsers (for instance, YAML::Syck) are a lot faster than plain YAML. (I removed Config::Fast since I don't have it installed):

                 Rate  YAML  XML YAML_Syck slurp
    YAML       17.1/s    -- -83%      -97%  -98%
    XML         103/s  505%   --      -84%  -88%
    YAML_Syck   636/s 3629% 516%        --  -28%
    slurp       888/s 5107% 760%       40%    --
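
    The Syck entry is a drop-in swap of the loader; nothing else in the benchmark changes (a sketch, assuming the delme.yaml file from the parent node):

    use YAML::Syck qw();

    # Same call shape as YAML::LoadFile, but parsing is done by
    # libsyck (C), which accounts for the speed difference above.
    my $newHash = YAML::Syck::LoadFile ("delme.yaml");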

    Good Day,
        Dean

      The .cfg was a carry-over from previous iterations of the benchmark code where I only tested YAML. The file contents are the same as those generated for delme.yaml, so the benchmark results are unaffected.

      I picked YAML first in the expectation that it would be the slowest of the bunch, figuring that if you could meet the run-time specification with YAML you could do it with anything.


      True laziness is hard work
Re^2: fastest file processing Config file format
by Tux (Canon) on Sep 29, 2009 at 15:17 UTC

    I agree 100% with the C-based solutions :). split is only reliable if the key cannot contain spaces; CSV might then be a lot easier (and more portable), as the sketch below shows.

    I also don't think 300 is a *big* config file.
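
    A minimal sketch of the difference (the key and path are made up, and chosen so the key contains a space):

    use strict;
    use warnings;
    use Text::CSV_XS;

    # The split based format hands part of the key to the value when
    # the key contains a space ...
    my ($key, $value) = split " ", "log file /var/log/app.log", 2;
    # $key is "log", $value is "file /var/log/app.log"

    # ... while CSV can quote the field, so the key survives intact.
    my $csv = Text::CSV_XS->new ({ binary => 1 });
    $csv->parse (q{"log file",/var/log/app.log}) or die "CSV parse failed\n";
    ($key, $value) = $csv->fields;
    # $key is "log file", $value is "/var/log/app.log"

    Scaling the same benchmark up to 3000 entries and adding Syck and CSV gives: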

    use strict;
    use warnings;

    use Benchmark qw(timethese cmpthese);
    use YAML qw();
    use YAML::Syck qw();
    use XML::Simple qw();
    use Config::Fast qw();
    use Text::CSV_XS;

    # Build a 3000 entry hash of sample configuration data
    my $x = "x" x 100;
    my %bigHash = map { $_ => "Str " . substr $x, int rand 100 }
                  map { "x$_" } 0 .. 3000;

    # Write the sample data in each format under test
    YAML::DumpFile ("delme.yml", \%bigHash);

    open my $out, ">", "delme.xml" or die "Can't create delme.xml: $!\n";
    print $out XML::Simple::XMLout (\%bigHash);
    close $out;

    open $out, ">", "delme.fst" or die "Can't create delme.fst: $!\n";
    print $out "$_ $bigHash{$_}\n" for keys %bigHash;
    close $out;

    my $csv = Text::CSV_XS->new ({ binary => 1, eol => "\n" });
    open $out, ">", "delme.csv" or die "Can't create delme.csv: $!\n";
    $csv->print ($out, [ $_, $bigHash{$_} ]) for keys %bigHash;
    close $out;

    cmpthese (-3, {
        YAML  => sub { my $newHash = YAML::LoadFile ("delme.yml"); },
        Syck  => sub { my $newHash = YAML::Syck::LoadFile ("delme.yml"); },
        fast  => sub { my $newHash = Config::Fast::fastconfig ("delme.fst"); },
        XML   => sub { my $newHash = XML::Simple::XMLin ("delme.xml"); },
        slurp => sub {
            local @ARGV = "delme.fst";
            my %newHash = map { split " ", $_, 2 } <>;
            },
        csv   => sub {
            open my $fh, "<", "delme.csv";
            my %newHash;
            while (my $row = $csv->getline ($fh)) {
                $newHash{$row->[0]} = $row->[1];
                }
            },
        });

    =>

            Rate  YAML   XML  fast   csv  Syck slurp
    YAML  1.39/s    --  -10%  -54%  -97%  -98%  -99%
    XML   1.54/s   11%    --  -49%  -97%  -98%  -99%
    fast  3.02/s  117%   96%    --  -94%  -96%  -98%
    csv   47.2/s 3289% 2959% 1462%    --  -43%  -65%
    Syck  82.2/s 5804% 5228% 2622%   74%    --  -39%
    slurp  136/s 9638% 8689% 4389%  187%   65%    --

    Enjoy, Have FUN! H.Merijn

      I chose 300 for two reasons: it's a reasonable guess at what the OP might mean by 'some hundreds', and even the slowest configuration technique I tried meets the time criteria the OP gave for a configuration file of that size.

      The slurp solution wasn't intended as a reliable way to handle configuration information, but as an indicative upper limit for a Perl solution to the problem. It's interesting to note, however, that fastconfig uses the same file format and has the same potential issues as the slurp solution.


      True laziness is hard work