monks,

I quite often find myself needing to keep chunks of text along with a module or script - not enough to warrant extra files, but enough to make using quotes and heredocs quite ugly. After reinventing my favourite simple solution (again), I found myself wanting to encapsulate the functionality in a module - but being put off by the simplicity of the implementation. I realise modules don't have to be magic, but still...

Cut to the chase - I wrote the module. I have included the start of some POD covering the general idea. It's all a bit rough'n'ready, but I would appreciate any opinions at this stage (or just tales of weird uses of __DATA__ you like to employ ;-).
I have not come across a similar solution on CPAN - but I am not really sure how such a module would be categorized (Tie::*, Text::* or Data::*?).

My meditation is this:

cheers

edit: updated POD to match code posted later


NAME

Tie::DATA - access named data segments in __DATA__ handle via the package variable %DATA

SYNOPSIS

    use Tie::DATA [[sub|scalar|regex], [sub|scalar]];

Simple Usage:

    use Tie::DATA;

    foreach (keys %DATA) {
        print "$_ = $DATA{$_}\n";
    }

    __DATA__
    __foo__
    yadda yadda yadda...
    __bar__
    ee-aye ee-aye oh
    __baz__
    woof woof

Intermediate Usage:

    use Tie::DATA(':xml'); # predefined format

    foreach (keys %DATA) {
        print "$_ = $DATA{$_}\n";
    }

    __DATA__
    <foo>yadda yadda yadda...</foo>
    <bar>
    ee-aye ee-aye oh
    </bar>
    <baz>
    woof woof
    </baz>

Custom Usage:

    use Tie::DATA (
        sub{ ... }, # parse key/values from DATA
        sub{ ... }  # process pairs
    );
    ...
    __DATA__
    ...

DESCRIPTION

Tie::DATA provides a means to break a module's or script's __DATA__ handle into named segments, accessible via the read-only package variable %DATA. Tie::DATA is not intended for configuration variables, but for medium-sized bodies of text that should be kept with the code (without being embedded in variable declarations).

%DATA's entries are created lazily; that is, they are built when %DATA is first used.

There are two stages to execution, both of which can be customized by arguments to use Tie::DATA:

parsing

By default, Tie::DATA uses a syntax similar to the __DATA__ token to separate segments. Of course, what makes a suitable separator depends on the text being stored, so several likely defaults are provided:

:ini
    [foo] baz bar etc

:xml
    <foo>baz bar etc</foo>

:define
    #define foo baz bar etc

:cdata
    <![foo[ baz bar etc ]]>

Remember that, by default, segments cannot be nested, and :xml tags cannot have attributes.

Full customization of parsing can be gained by passing either a regex or sub reference as the first argument:

    use Tie::DATA qr(<<<<<<<<(\w*?)>>>>>>>>);

    use Tie::DATA sub{split(/\s*:SEGMENT\s+(\w+)\s*/, shift);};

    use Some::Mad::Parser;
    use Tie::DATA \&Some::Mad::Parser::parse;

The subroutine reference should return a list of key/value pairs.
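
As a rough illustration of what such a parsing sub might do, here is a minimal sketch that turns raw text into a flat key/value list using a separator regex with one capture group. The name parse_segments and the sample text are made up for this example and are not part of Tie::DATA; the module's real parser may well differ.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Tie::DATA): split raw text into
# key/value pairs using a separator regex with one capture group.
sub parse_segments {
    my ($text, $sep) = @_;

    # split() with a capturing pattern returns the captured keys
    # interleaved with the text found between separators.
    my @parts = split $sep, $text;

    shift @parts;                     # drop whatever precedes the first separator
    push @parts, '' if @parts % 2;    # a trailing key with an empty body

    return @parts;
}

my $raw = "__foo__\nyadda yadda\n__bar__\nee-aye oh\n";
my %segments = parse_segments($raw, qr/^__(\w+)__\n/m);

print "$_ = $segments{$_}" for sort keys %segments;
```

Note that anything before the first separator is thrown away here; a real parser might prefer to complain about it instead.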

processing

After parsing, if a callback has been registered as the second argument, each key/value pair is passed to the callback for further processing. The callback is expected to return the actual key/value pair that will be used in %DATA.

For example, if you wanted to control how whitespace was treated for each segment individually, you might use something like:

    use Tie::DATA(':ini', 'proc_kv');

    foreach (keys %DATA) {
        print "$_ = $DATA{$_}\n";
    }

    # our processing function, checks for
    # and removes processing hints in our keys
    # (see __DATA__)
    sub proc_kv {
        my ($k, $v) = @_;
        if ($k =~ /:/) {
            my ($tag, $hint) = split(/:/, $k);
            $k = $tag;
            if ($hint eq 'nowhitespace') {
                $v = ...
            }
            else {
                $v = ...
            }
        }
        return ($k, $v);
    }

    __DATA__
    [foo:nowhitespace]
    yadda yadda yadda...
    [bar]
    ee-aye ee-aye oh
    [baz]
    woof woof

There is no reason why the processing subroutine need be in the current module:

    use My::Big::Routine;
    use Tie::DATA(':ini', 'My::Big::Routine::go');

    foreach (keys %DATA) {
        print "$_ = $DATA{$_}\n";
    }
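
To make the processing stage a little more concrete, here is a minimal sketch of how a list of parsed pairs could be run through a callback. Both process_pairs and trim_hint, and the ":trim" hint itself, are invented for this example; they only show the shape of the idea and are not the module's actual code.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the processing stage: hand each key/value
# pair to a callback and store whatever pair it returns.
sub process_pairs {
    my ($callback, @pairs) = @_;
    my %out;
    while (my ($k, $v) = splice @pairs, 0, 2) {
        my ($new_k, $new_v) = $callback->($k, $v);
        $out{$new_k} = $new_v;
    }
    return %out;
}

# Example callback (assumed hint name): a ":trim" suffix on the key
# strips leading and trailing whitespace from the value.
sub trim_hint {
    my ($k, $v) = @_;
    my ($tag, $hint) = split /:/, $k, 2;
    $v =~ s/^\s+|\s+$//g if defined $hint && $hint eq 'trim';
    return ($tag, $v);
}

my %processed = process_pairs(
    \&trim_hint,
    'foo:trim' => "  yadda yadda  \n",
    'bar'      => "ee-aye oh\n",
);

print "$_ = [$processed{$_}]\n" for sort keys %processed;
```

Because the callback returns the key as well as the value, it is free to rename or normalise keys, which is what makes the "hint in the key" trick possible.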

CAVEATS

%DATA is read-only. Any attempt to modify it after the processing stage will cause the program to croak.
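
For readers wondering how lazy creation and the read-only behaviour might fit together, here is a minimal sketch of a tied hash that builds its contents on first read and croaks on any write. The package name Lazy::ReadOnly::Hash and the loader callback are assumptions made for this example; Tie::DATA's own implementation may look quite different.

```perl
use strict;
use warnings;

package Lazy::ReadOnly::Hash;    # hypothetical name, for illustration only
use Carp qw(croak);

sub TIEHASH {
    my ($class, $loader) = @_;
    return bless { loader => $loader, data => undef }, $class;
}

# Run the loader the first time any read happens, then reuse the result.
sub _data {
    my ($self) = @_;
    $self->{data} //= { $self->{loader}->() };
    return $self->{data};
}

sub FETCH    { $_[0]->_data->{ $_[1] } }
sub EXISTS   { exists $_[0]->_data->{ $_[1] } }
sub FIRSTKEY { my $d = $_[0]->_data; my $reset = keys %$d; each %$d }
sub NEXTKEY  { each %{ $_[0]->_data } }
sub SCALAR   { scalar %{ $_[0]->_data } }

# Every mutating operation is refused.
sub STORE  { croak "hash is read-only" }
sub DELETE { croak "hash is read-only" }
sub CLEAR  { croak "hash is read-only" }

package main;

my $loads = 0;
tie my %data, 'Lazy::ReadOnly::Hash',
    sub { $loads++; return (foo => "yadda", bar => "ee-aye") };
```

Nothing is loaded until the first FETCH, EXISTS or keys call, and an assignment such as $data{foo} = 'x' dies with the croak message reported from the caller's point of view.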




time was, I could move my arms like a bird and...

In reply to getting more from __DATA__ by Ctrl-z
