blindluke has asked for the wisdom of the Perl Monks concerning the following question:

Enlightened Monks!

I'm storing a few descriptions within the code. Right now, it's done in a very basic way:

my %desc;

$desc{'room'} =<<END;
some text here,
probably something
a few lines long
END

$desc{'wall'} =<<END;
another text here,
probably something
a few lines long
[sometimes with braces]

sometimes with a few
paragraphs
END

Honestly, I don't like it. I would much rather see it in the DATA section (and later in a file of its own), with the following (or similar) layout:

__DATA__
[room]
some text here,
probably something
a few lines long
[wall]
another text here,
probably something
a few lines long
[sometimes with braces]

sometimes with a few
paragraphs

Does anyone know of a Config:: module that would accept such syntax? Or a way to put something like multiple __DATA__ sections in the code, each with a name of their own?

I could parse such a section myself, with something like this:

my %desc;
my $key = 'error';

for (<DATA>) {
    if (/^\[(\w+)\]\s*$/) {
        $key = $1;
    }
    else {
        $desc{$key} .= $_;
    }
}
die if defined $desc{'error'};

... but it seems worse than the initial solution, with direct assignments to the hash keys.

Still, maybe there is a more elegant way to parse such a DATA section.

UPDATE:

Can I use split with a regexp pattern, and use a capture within the pattern, to get the hash key?

I managed to find the answer to this one. The code below produces the desired %desc hash. But the grep/split combo looks only marginally better than the loop example above.

my $data=<<ENDS;
[room]
some text here,
probably something
a few lines long
[wall]
another text here,
probably something
a few lines long
[sometimes with braces]

sometimes with a few
paragraphs
ENDS

%desc = grep {$_} (split (/\[(\w+)\]/, $data));
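A minimal sketch of why the capture works here, using only core Perl: when the split pattern contains capturing parentheses, the captured text is returned between the fields. The sample string is made up for illustration. Note the empty leading field, which is what the grep above is there to remove.

```perl
use strict;
use warnings;

# Made-up sample text with two [key] headers.
my $data = "[room]\nsome text\n[wall]\nmore text\n";

# Capturing parentheses in the pattern make split return the captured
# part of each separator in between the ordinary fields.
my @fields = split /\[(\w+)\]/, $data;

# @fields now holds: "", "room", "\nsome text\n", "wall", "\nmore text\n"
# The leading "" is the (empty) text in front of the first [key].
printf "%d fields, first field is %s\n",
    scalar(@fields), (length($fields[0]) ? 'non-empty' : 'empty');
```

Because the list starts with that empty field, it cannot be assigned to a hash directly; the first element has to be dropped one way or another.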

- Luke

Replies are listed 'Best First'.
Re: Storing multiple blocks of text in the __DATA__ section
by clueless newbie (Curate) on Jan 02, 2015 at 13:13 UTC

      Thanks, the module looks interesting.

      - Luke

Re: Storing multiple blocks of text in the __DATA__ section
by Athanasius (Archbishop) on Jan 02, 2015 at 13:40 UTC

      Now I have. Thank you for linking to the thread, I found this reply by Coruscate especially interesting.

      - Luke

Re: Storing multiple blocks of text in the __DATA__ section
by LanX (Saint) on Jan 02, 2015 at 16:11 UTC
    1. You should be aware that <<ENDS is treated like <<"ENDS", i.e. it allows variable interpolation. Better to write <<'ENDS' to avoid this.

    2. You are free to use multiple here-docs on the same line, so if the number of blocks is too small for concerns about being DRY, you can write

    use Data::Dump;

    my %desc;
    init_desc();
    dd \%desc;

    sub init_desc {
        @desc{ONE,TWO}= (<<'__ONE__',<<'__TWO__');
    one
    __ONE__
    two
    __TWO__
    }

    3. Please note that I've moved the population part into a sub init_desc(). This way you can have several such initializations hidden at the end of your code.

    4. The aforementioned solution isn't as DRY as you wanted, but your split solution actually wasn't too bad, though using grep to ignore the first line is dangerous:

    my %desc = init_desc();
    dd \%desc;

    sub init_desc {
        (undef,my %hash) =    # ignore first line
            split /^ \[ (\w+) \] \s* $/xm, <<'__ENDS__';
    [ONE]
    one
    [TWO]
    two
    __ENDS__
        return %hash;
    }

    (Potential trimming of leading and trailing "\n" is left as an exercise.)

    5. Please note that the last approach can also be used to parse a slurped __DATA__ section.
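Point 5 could be sketched roughly as follows. This is an illustration under stated assumptions, not code from the thread: the helper name read_sections is hypothetical, an in-memory filehandle stands in for DATA so the snippet is self-contained, and the trimming of leading and trailing newlines is one possible answer to the exercise above.

```perl
use strict;
use warnings;

# Hypothetical helper: slurp a filehandle and split it into sections
# headed by lines of the form [key].
sub read_sections {
    my ($fh) = @_;
    my $text = do { local $/; <$fh> };    # slurp the whole handle
    $text = '' unless defined $text;

    # Same pattern as in point 4; the leading field is discarded.
    my (undef, %hash) = split /^ \[ (\w+) \] \s* $/xm, $text;

    # Trim the newlines left around each block.
    for my $value (values %hash) {
        $value =~ s/\A\n+//;
        $value =~ s/\n+\z//;
    }
    return %hash;
}

# Sample text standing in for the __DATA__ section.
my $sample = join "\n",
    '[room]', 'some text here,', 'a few lines long',
    '[wall]', 'another text here', '[sometimes with braces]', '',
    'sometimes with a few', 'paragraphs', '';

open my $fh, '<', \$sample or die "cannot open in-memory handle: $!";
my %desc = read_sections($fh);
close $fh;

print "$_:\n$desc{$_}\n----\n" for sort keys %desc;
```

In a real script the call would be read_sections(\*DATA). A line such as [sometimes with braces] is not mistaken for a header because \w+ does not match spaces.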

    Cheers Rolf

    (addicted to the Perl Programming Language and ☆☆☆☆ :)

      Thank you for taking the time to reply, and providing all those suggestions.

      Ad 1: Noted. I expect that due to this behavior, even if I wanted interpolation, it would still be better (in terms of style) to write <<"ENDS" to make the fact (that I am aware of this) explicit.

      Ad 2&3: Excellent stuff, I was not aware of this possibility (multiple here-docs in the same line). Thank you!

      Ad 4: Very interesting. I assume that the danger associated with grep is the fact that it will remove both undef (as intended) and anything that evaluates as non-true (could pose a problem). Is there any other danger involved? Not that it would matter much, as your solution is definitely better, but I'm just curious.

      Ad 5: Slurping __DATA__ is exactly what I had in mind when asking about possible split usage.
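The danger described in Ad 4 can be illustrated with a small core-Perl sketch. The sample string, with a deliberately empty [room] section, is made up for this illustration.

```perl
use strict;
use warnings;

# Made-up sample: the [room] section is empty.
my $data = "[room][wall]\nbricks\n";

my @fields = split /\[(\w+)\]/, $data;
# @fields: "", "room", "", "wall", "\nbricks\n"

# grep { $_ } removes the leading empty field, but also the empty
# value that belongs to "room" (a value of "0" would vanish as well).
my @kept = grep { $_ } @fields;
# @kept: "room", "wall", "\nbricks\n"
# An odd number of elements: as a hash, "wall" would become the value of "room".

# Discarding only the first element keeps keys and values paired.
my (undef, %safe) = @fields;
# %safe: room => "", wall => "\nbricks\n"

printf "grep kept %d elements; safe hash has %d keys\n",
    scalar(@kept), scalar(keys %safe);
```

Whether an empty or "0" description can occur in practice depends on the data, so this may never bite, but skipping exactly one leading element avoids the question.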

      - Luke

Re: Storing multiple blocks of text in the __DATA__ section
by gnosti (Chaplain) on Jan 02, 2015 at 20:38 UTC
Re: Storing multiple blocks of text in the __DATA__ section
by graff (Chancellor) on Jan 02, 2015 at 16:44 UTC
    Lots of interesting stuff in this thread, but - oddly enough - I didn't see anyone mention the first thing that came to mind for me when I read the OP: use "paragraph mode" when reading from __DATA__, and have the various blocks of text separated by blank lines, with some simple, basic syntax that makes it easy to parse each block in a consistent way. Something like this:
    #!/usr/bin/perl
    use strict;
    use warnings;

    my %structure;
    {
        local $/ = "";    # input record separator = empty string for "paragraph mode"
        while (<DATA>) {
            s/^(.*)\n//;    # first line is key string
            $structure{$1} = $_;
        }
    }
    print "key: $_ / value:\n$structure{$_}\n----\n" for ( sort keys %structure );

    __DATA__
    first_key
    Here's some data
    to go with the first key

    key_3
    Third key gets this part

    key number 2
    This element of %structure
    has spaces in the hash key.
    (Note that in this example the final new-lines are retained in the values.)
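If the retained newlines are unwanted, one option is chomp, which in paragraph mode removes all trailing newlines from the record. The following is a sketch under assumptions: an in-memory filehandle stands in for DATA, and the sample text is made up.

```perl
use strict;
use warnings;

# Made-up sample standing in for the __DATA__ section.
my $text = "first_key\nHere's some data\n\nkey number 2\nline one\nline two\n";
open my $fh, '<', \$text or die "cannot open in-memory handle: $!";

my %structure;
{
    local $/ = "";    # paragraph mode
    while ( my $para = <$fh> ) {
        chomp $para;  # while $/ is "", chomp strips all trailing newlines
        my ( $key, $value ) = split /\n/, $para, 2;    # first line is the key
        $structure{$key} = defined $value ? $value : '';
    }
}
close $fh;

print "key: $_ / value:\n$structure{$_}\n----\n" for sort keys %structure;
```

The chomp has to happen inside the block where $/ is localized, since its behavior depends on the current value of $/.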

    UPDATED to localize the use of paragraph-mode.

      > but - oddly enough - I didn't see anyone mention ... use "paragraph mode"

      The OP wanted to allow multiple paragraphs in one section, and IMHO this isn't easily done with $/.

      E.g., using multiple newlines as in "\n\n" is a bit too error-prone, and other separators would become part of the sections and would need to be filtered out again.

      Cheers Rolf

      (addicted to the Perl Programming Language and ☆☆☆☆ :)

      update

      use Data::Dump;

      my %desc=init_data();
      dd \%desc;

      sub init_data {
          my $sep = "\n=====\n";
          local $/ = $sep;
          my %hash;
          while (<DATA>) {
              s/$sep$//;      # kill separator
              s/^(.*)\n//;    # first line is key string
              $hash{$1} = $_;
          }
          return %hash;
      }

      __DATA__
      ONE
      one
      =====
      TWO
      two

      two
      =====
      THREE
      Three

      three
        Thanks(++) - I had missed that detail in the OP. As you pointed out in your update, it shouldn't be difficult to craft a record separator that's distinctive and easy to strip out. Alternately, it might not be so bad to "encode" record-internal blank lines in some distinctive and "easily decodable" manner - e.g.:
        $/ = "";
        while (<DATA>) {
            s/^(.*)\n//;
            $key = $1;
            s/\n==(?=\n)/\n/g;
            $structure{$key} = $_;
        }
        __DATA__
        key1
        Here's a text block including blank lines
        ("encoded" as "==" in the perl script):
        ==
        and here's a part of the block that's enclosed within "blank lines"
        ==
        and here's the last part of the value for key1.

        key2
        blah blah
        etc.
        UPDATED to use the minimum necessary look-ahead, so that consecutive "blank lines" inside a record would be handled properly.
Re: Storing multiple blocks of text in the __DATA__ section
by LanX (Saint) on Jan 02, 2015 at 16:28 UTC

      I already use (and adore) Config::IniFiles, but it does not accept such simple syntax. It does, however, accept multiline values for the params, but then the config would have to look like this:

      [general]
      Room=<<EOT
      A simple multiline
      text description
      EOT
      Wall=<<EOT
      Another multiline
      wall description

      With two paragraphs.
      EOT

      In recent versions of Config::IniFiles, you can specify a default section, so the first line of the above example could be omitted by doing:

      $cfg = Config::IniFiles->new( -file => *DATA, -default => "general" );

      Still, this is the same heredoc syntax which I was trying to avoid in the first place.

      Fortunately, gnosti has found the Data::Section::Simple module that seems to do exactly what I was searching for. His recommendation, and your excellent first reply, add to the reasons why I love our Monastery. Thank you.

      - Luke

Re: Storing multiple blocks of text in the __DATA__ section
by RMGir (Prior) on Jan 02, 2015 at 13:43 UTC
    This is very low-tech, and isn't quite what you're looking for, but it would let you concentrate all the config bits in __DATA__:
    #!/bin/env perl
    use strict;
    use warnings;
    use Data::Dumper;

    my $data = join "",<DATA>;
    my $config = eval "{$data}" or die "eval failed, $@";
    print Dumper($config);

    __DATA__
    foo => "This is foo's data"
    , bar => qq{this is bar's data
    it includes a newline
    and other stuff}
    , baz => {
        bazfoo => "baz is more complex"
        , bazbar => "it contains a sub-hash"
    }

    Mike

      Thanks for the reply and your time, but this solution just moves the variable assignments from within the code to the DATA section at the end of it.

      What would be the use of such a thing? My point was never moving the text to a specific place in my code, and placing it in __DATA__ was never an end unto itself.

      The point is making the code easier to read by putting the text descriptions as far away from the code syntax as possible. That way, someone can open the file, ignore all the Perl code, and edit the descriptions as any text document.

      - Luke

Re: Storing multiple blocks of text in the __DATA__ section
by thargas (Deacon) on Jan 02, 2015 at 15:24 UTC
Re: Storing multiple blocks of text in the __DATA__ section
by RonW (Parson) on Jan 06, 2015 at 00:04 UTC

    Since you indicated an interest in ultimately moving the data to a separate file, I suggest YAML::Tiny.

    Using YAML::Tiny your data file would look like:

    room: >
      some text here,
      probably something
      a few lines long
    wall: |
      another text, here,
      this time pre-formatted
      (but must be indented)

    But I'm not sure it will handle multiple paragraphs. The pre-formatted syntax is more likely to, because it uses indentation.

    That said, your parser for your syntax might just be the best choice for your application.