monks,

I quite often find myself needing to keep chunks of text alongside a module or script - not enough to warrant extra files, but enough to make quotes and heredocs quite ugly. After reinventing my favourite simple solution (again), I found myself wanting to encapsulate the functionality in a module - but was put off by the simplicity of the implementation. I realise modules don't have to be magic, but still...

Cut to the chase - I wrote the module. I have included the start of some POD covering the general idea. It's all a bit rough'n'ready, but I would appreciate any opinions at this stage (or just tales of weird uses of __DATA__ you like to employ ;-).
I have not come across a similar solution on CPAN - but I am not really sure how such a module would be categorized (Tie::*, Text::* or Data::*?).

My meditations are these:

Is there already a CPAN solution to this?
Would the module below be something you would consider useful? If not, why not?
Any ideas for a half-decent name?
I've never written a module like this before. What is the best way to make modules that are built around import() flexible/extensible?

cheers

edit: updated POD to match code posted later

NAME

Tie::DATA - access named data segments in __DATA__ handle via the package variable %DATA

SYNOPSIS

use Tie::DATA [[sub|scalar|regex], [sub|scalar]];

Simple Usage:

    use Tie::DATA;

    foreach(keys %DATA)
    {
        print "$_ = $DATA{$_}\n";
    }

    __DATA__
    
    __foo__

    yadda yadda yadda...

    __bar__

    ee-aye ee-aye oh

    __baz__
    
    woof woof

Intermediate Usage:

    use Tie::DATA(':xml'); # predefined format

    foreach(keys %DATA)
    {
        print "$_ = $DATA{$_}\n";
    }

    __DATA__
    
    <foo>yadda yadda yadda...</foo>
    <bar>
        ee-aye ee-aye oh
    </bar>
    <baz>
        woof woof
    </baz>

Custom Usage:

    use Tie::DATA ( 
        sub{ ... },  # parse key/values from DATA 
        sub{ ... }   # process pairs
    );

    ...

    __DATA__

    ...

DESCRIPTION

Tie::DATA provides a means to break a module's or script's __DATA__ handle into named segments, accessible via the read-only package variable %DATA. Tie::DATA is not intended for configuration variables, but for medium-sized bodies of text that should be kept with the code (without being embedded in variable declarations).

%DATA's entries are created lazily; that is, __DATA__ is not parsed until the hash is first used.
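As a rough illustration of the lazy behaviour described above, here is a minimal sketch of a tied hash that only runs its loader on first access. The package name Lazy::Segments and the loader callback are assumptions made up for this example, not the actual Tie::DATA internals.

```perl
use strict;
use warnings;

package Lazy::Segments;    # hypothetical name, not the real Tie::DATA

sub TIEHASH {
    my ($class, $loader) = @_;
    # keep the loader around; nothing is parsed at tie time
    return bless { loader => $loader, data => undef }, $class;
}

# run the loader once, on the first access of any kind
sub _load {
    my ($self) = @_;
    $self->{data} //= { $self->{loader}->() };
    return $self->{data};
}

sub FETCH  { my ($self, $key) = @_; return $self->_load->{$key} }
sub EXISTS { my ($self, $key) = @_; return exists $self->_load->{$key} }

sub FIRSTKEY {
    my ($self) = @_;
    my $data = $self->_load;
    keys %$data;                     # reset the iterator
    return scalar each %$data;
}

sub NEXTKEY { my ($self) = @_; return scalar each %{ $self->_load } }

package main;

my $loads = 0;
tie my %segments, 'Lazy::Segments',
    sub { $loads++; return (foo => 'yadda', bar => 'ee-aye') };

my $loads_before = $loads;           # nothing has been parsed yet
my $foo          = $segments{foo};   # first access runs the loader
my $bar          = $segments{bar};   # served from the cached hash
```

In a real module the loader would presumably read and parse the caller's DATA handle rather than return a fixed list.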

There are two stages to execution, both of which can be customized by arguments to use Tie::DATA.

parsing

By default, Tie::DATA uses a syntax similar to the __DATA__ token to separate segments. Of course, what makes a suitable separator depends on the text being stored, so several likely defaults are provided:

:ini

    [foo]
    baz bar etc

:xml

    <foo>baz bar etc</foo>

:define

    #define foo
    baz bar etc

:cdata

    <![foo[
        baz bar etc
    ]]>

It is important to remember that, by default, segments cannot be nested, and :xml tags cannot carry attributes.
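To make the separator idea concrete, here is a minimal sketch of how the line-based formats could be parsed with split and a capturing regex. The patterns and the parse_segments name are assumptions for illustration only (the paired :xml and :cdata forms would need a different approach), and trimming the segment bodies is a choice made for this sketch.

```perl
use strict;
use warnings;

# Hypothetical separator patterns; each captures the segment name.
my %separator = (
    default => qr/^__(\w+)__[ \t]*\n/m,
    ini     => qr/^\[(\w+)\][ \t]*\n/m,
    define  => qr/^#define[ \t]+(\w+)[ \t]*\n/m,
);

sub parse_segments {
    my ($text, $format) = @_;

    # split with a capturing group returns:
    #   text before the first separator, name, body, name, body, ...
    # the -1 limit keeps an empty body after a trailing separator
    my ($leading, @pairs) = split $separator{$format}, $text, -1;

    my %segment = @pairs;

    # trim surrounding whitespace from each body (values are aliased)
    for my $body (values %segment) {
        $body =~ s/\A\s+|\s+\z//g;
    }
    return %segment;
}

my $text = "\n__foo__\n\nyadda yadda yadda...\n\n__bar__\n\nee-aye ee-aye oh\n";
my %seg  = parse_segments($text, 'default');
```

Because a flat split has no notion of depth, a separator appearing inside a segment simply starts a new segment, which is one way the no-nesting restriction could arise.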

Full customization of parsing can be gained by passing either a regex or sub reference as the first argument:

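    # a regex whose capture group yields the segment name
    use Tie::DATA qr(<<<<<<<<(\w*?)>>>>>>>>);

    # or a sub that splits the text into keys and values
    use Tie::DATA sub{split(/\s*:SEGMENT\s+(\w+)\s*/, shift);};

    # or a reference to a parser defined elsewhere
    use Some::Mad::Parser;
    use Tie::DATA \&Some::Mad::Parser::parse;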

The subroutine should return a list of key/value pairs.
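A minimal sketch of a parser sub that meets this contract, assuming it is handed the whole __DATA__ text as its first argument. One thing to watch with split and a capturing group: it also returns whatever preceded the first marker (possibly an empty string), so this sketch drops that field to keep keys and values aligned. Whether Tie::DATA does this for you is not something the POD above states.

```perl
use strict;
use warnings;

# Hypothetical parser: takes the segment text, returns (key, value, ...).
my $parser = sub {
    my ($text) = @_;

    my @fields = split /\s*:SEGMENT\s+(\w+)\s*/, $text;

    shift @fields;                     # drop text before the first marker
    push @fields, '' if @fields % 2;   # trailing marker with no body

    return @fields;
};

my %seg = $parser->(":SEGMENT foo\nyadda yadda\n:SEGMENT bar\nee-aye\n");
```

Note that the last segment keeps its trailing newline here; that kind of clean-up is a natural job for the processing stage.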

processing

After parsing, if a callback has been registered as the second argument, each key/value pair is passed to that callback for further processing. The callback is expected to return the key/value pair that will actually be stored in %DATA.

For example, if you wanted to control how whitespace was treated for each segment individually, you might use something like:

    use Tie::DATA(':ini', 'proc_kv');

    foreach(keys %DATA)
    {
        print "$_ = $DATA{$_}\n";
    }

    # our processing function, checks for 
    # and removes processing hints in our keys
    # (see __DATA__)
    sub proc_kv
    {
        my ($k, $v) = @_;

        if($k =~ /:/)
        {
            my ($tag, $hint) = split(/:/, $k);
            $k = $tag;

            if($hint eq 'nowhitespace')
            {
                $v = ...
            }
            else
            {
                $v = ...
            }
        }

        return($k,$v);
    }

    __DATA__
    
    [foo:nowhitespace]
    yadda yadda yadda...
    
    [bar]
    ee-aye ee-aye oh

    [baz]
    woof woof
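For what it's worth, here is a rough sketch of how the processing stage could apply such a callback when it is given either as a code reference or as a subroutine name. The names process_pairs and strip_hint are hypothetical, the caller's package is passed in explicitly to keep the sketch simple, and stripping all whitespace is only one guess at what a 'nowhitespace' hint could mean.

```perl
use strict;
use warnings;

# Hypothetical helper: run every parsed pair through the callback.
sub process_pairs {
    my ($callback, $caller, @pairs) = @_;

    unless (ref($callback) eq 'CODE') {
        # unqualified names are looked up in the caller's package
        my ($pkg, $sub) = $callback =~ /\A(.+)::(\w+)\z/
            ? ($1, $2)
            : ($caller, $callback);
        $callback = $pkg->can($sub)
            or die "no such subroutine: ${pkg}::$sub\n";
    }

    my %result;
    while (my ($k, $v) = splice @pairs, 0, 2) {
        my ($new_k, $new_v) = $callback->($k, $v);
        $result{$new_k} = $new_v;
    }
    return %result;
}

# Example callback: strip a ":hint" suffix from the key and act on it.
sub strip_hint {
    my ($k, $v) = @_;
    my ($tag, $hint) = split /:/, $k, 2;
    # assumption: 'nowhitespace' means "remove all whitespace"
    $v =~ s/\s+//g if defined $hint && $hint eq 'nowhitespace';
    return ($tag, $v);
}

my %data = process_pairs('strip_hint', 'main',
    'foo:nowhitespace' => 'yadda yadda',
    'bar'              => 'ee-aye oh',
);
```

Resolving the name with can() means a fully qualified name from another package works the same way as a local one.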

There is no reason why the processing subroutine need be in the current module:

    use My::Big::Routine;
    use Tie::DATA(':ini', 'My::Big::Routine::go');

    foreach(keys %DATA)
    {
        print "$_ = $DATA{$_}\n";
    }

CAVEATS

%DATA is read-only. Any attempt to modify it after the processing stage will cause the program to croak.
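The read-only behaviour can be sketched with a tied hash whose mutating methods all croak. ReadOnly::Hash is a hypothetical name used only for this example, and the error message is likewise made up.

```perl
use strict;
use warnings;

package ReadOnly::Hash;    # hypothetical name, not the real Tie::DATA
use Carp qw(croak);

sub TIEHASH {
    my ($class, %data) = @_;
    my $self = { %data };
    return bless $self, $class;
}

sub FETCH  { my ($self, $key) = @_; return $self->{$key} }
sub EXISTS { my ($self, $key) = @_; return exists $self->{$key} }

sub FIRSTKEY {
    my ($self) = @_;
    keys %$self;                     # reset the iterator
    return scalar each %$self;
}

sub NEXTKEY { my ($self) = @_; return scalar each %$self }

# every mutating operation croaks, reporting the caller's line
sub STORE  { croak "Attempt to modify read-only hash" }
sub DELETE { croak "Attempt to modify read-only hash" }
sub CLEAR  { croak "Attempt to modify read-only hash" }

package main;

tie my %data, 'ReadOnly::Hash', foo => 'yadda';

my $ok    = eval { $data{foo} = 'changed'; 1 };
my $error = $@;
```

After the failed assignment, $data{foo} still holds its original value and $error carries the croak message.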

time was, I could move my arms like a bird and...

In reply to getting more from __DATA__ by Ctrl-z
