ff has asked for the wisdom of the Perl Monks concerning the following question:

Hi Monks,
Another strange-o that I can't quite find documentation for. Though the pointer to Inline::Files may be the ticket, I want Perl to do the following on its own somehow! :-)

I have a program that requires package files containing subroutines. My main program invokes a subroutine from this package several times whose job it is to collect the stuff after <DATA> and return it to me via an array ref, something like:

package foo;
use strict;
sub build_data {
    my @ra = ();
    @ra = <DATA>;
    return (\@ra);
}
1;
__DATA__
DAY1 asas aerer
DAY1 qwrep poiu wer
DAY2 faas dfa sdf as

My problem is that only the first invocation of the subroutine reads the contents of <DATA>. Successive invocations return an empty array. My reading finds that closing a filehandle prepares it to be read again from the top. But how do you close <DATA>? And setting $. = 1; (or maybe '0'?) doesn't do much either. Hmm.

My data is structured like this because it's easy to append more "data" to this package file and thus be available to the overall program without being in a separate text file. I have since rewritten this portion such that instead of appending new data that I generate from day to day to a __DATA__ section, it gets strategically inserted four lines before the end of the .pl package file where it effectively becomes CONSTANT data to be absorbed by an array of another subroutine of package foo, and thus as an array is returnable to my overall calling program. But lazy me, it irritated me to undo my __DATA__ approach. What did I miss?

Re: reading __DATA__ more than once
by japhy (Canon) on Nov 14, 2002 at 06:49 UTC
    I'd do the following:
    {
      my $offset;
      sub build_data {
        seek DATA, ($offset ||= tell DATA), 0;
        return [ <DATA> ];
      }
    }

    _____________________________________________________
    Jeff[japhy]Pinyan: Perl, regex, and perl hacker, who'd like a job (NYC-area)
    s++=END;++y(;-P)}y js++=;shajsj<++y(p-q)}?print:??;

      Thank you! It hadn't occurred to me that the value landing in my $offset could persist through every time I entered the package to use its sub. Much less am I aware of the true workings of __DATA__ and using seek & tell to play with it. :-)

      As for "fast food", I'd say about 1% of the main program's data is accessed this way, 99% comes in traditional fat free ways. But "needs of the business" seem to require accessing this particular chunk of data in a more devious way... Are there side effects I should be aware of if I operate in this way? Besides its just being creepy?

        The value sticks around because I don't create it in the function, but in the scope surrounding the function. If you've got Perl 5.6, here's a cooler way to do it:
        {
          my $offset;
          CHECK { $offset = tell DATA }
          sub data {
            seek DATA, $offset, 0;
            [<DATA>]
          }
        }
        The CHECK block happens just after compile-time ends.

        _____________________________________________________
        Jeff[japhy]Pinyan: Perl, regex, and perl hacker, who'd like a job (NYC-area)
        s++=END;++y(;-P)}y js++=;shajsj<++y(p-q)}?print:??;
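For anyone who wants to try the idea in one file, here is a minimal sketch of the saved-offset approach described above. It assumes the script is run from a file, not via -e, and it uses a defined check in place of ||=, which is a small deviation from the code in the post. If all goes well, both calls should report the same number of lines.

```perl
#!/usr/bin/perl
use strict;
use warnings;

{
    # $offset lives in the enclosing block, so it survives between calls
    # to build_data() while staying invisible to the rest of the file.
    my $offset;

    sub build_data {
        # First call: remember where the data starts. Later calls: reuse it.
        $offset = tell DATA unless defined $offset;
        seek(DATA, $offset, 0) or die "Cannot rewind DATA: $!";
        return [ <DATA> ];
    }
}

my $first  = build_data();
my $second = build_data();

printf "first call:  %d lines\n", scalar @$first;
printf "second call: %d lines\n", scalar @$second;

__DATA__
DAY1 asas aerer
DAY1 qwrep poiu wer
DAY2 faas dfa sdf as
```

Note that the lazy capture only works if nothing else has read from DATA before the first call, which is the concern raised in the next reply.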

      This is a case for applying tye's best practices:
      {
        my $offset;
        BEGIN { $offset = tell DATA }
        sub build_data {
          seek DATA, $offset, 0;
          return [ <DATA> ];
        }
      }

      That way, even if someone else touches DATA before build_data is called the first time (aka a race condition), it still produces the correct result.

      Update: Good point - I stand corrected.

      Makeshifts last the longest.

        You cannot use BEGIN there. See my response earlier, where I do a very similar solution. I use CHECK instead (because Perl has not seen the __DATA__ marker yet, if you use BEGIN).

        _____________________________________________________
        Jeff[japhy]Pinyan: Perl, regex, and perl hacker, who'd like a job (NYC-area)
        s++=END;++y(;-P)}y js++=;shajsj<++y(p-q)}?print:??;
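The BEGIN versus CHECK difference can be seen directly with a small sketch like the one below. The variable names are made up for illustration. In the BEGIN block the DATA handle has not been opened yet, so tell is expected to return -1, its documented error value; in the CHECK block it should return a real offset. One caveat, as far as I know: CHECK blocks only run for code compiled along with the main program, so a file pulled in by require at run time would not get its CHECK block executed.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $at_begin;
my $at_check;

# BEGIN runs while the file is still being parsed, before the parser
# has reached the __DATA__ marker, so DATA is not open yet.
BEGIN {
    no warnings 'unopened';    # silence "tell() on unopened filehandle"
    $at_begin = tell DATA;
}

# CHECK runs once compilation of the main program has finished,
# by which time the parser has seen __DATA__ and opened the handle.
CHECK { $at_check = tell DATA }

print "offset seen in BEGIN: $at_begin\n";
print "offset seen in CHECK: $at_check\n";

__DATA__
DAY1 asas aerer
```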

Re: reading __DATA__ more than once
by diotalevi (Canon) on Nov 14, 2002 at 06:13 UTC

    $offset = tell DATA and seek DATA, $offset, SEEK_SET. It's just a filehandle - reset its position when you need to.

    __SIG__ use B; printf "You are here %08x\n", unpack "L!", unpack "P4", pack "L!", B::svref_2object(sub{})->OUTSIDE;
      If you jump back to the beginning, just take care not to jump too far, because otherwise you will get the source code of the script as "DATA", e.g. the following script prints its own source code...
      #! /usr/bin/perl
      seek (DATA,0,0);
      print <DATA>;
      __DATA__
      warn "Just another perl hacker\n";

      Best regards,
      perl -e "s>>*F>e=>y)\*martinF)stronat)=>print,print v8.8.8.32.11.32"
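To make the "don't jump too far" point concrete, here is a rough sketch that compares seeking to offset 0 with seeking to an offset saved by tell before any reading. It imports SEEK_SET from the core Fcntl module; the variable names are hypothetical. It assumes the script is run from a file.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Fcntl qw(SEEK_SET);

# Record where the data starts before anything has read from DATA.
my $data_start = tell DATA;

# Offset 0 is the top of the script file, not the top of the data.
seek(DATA, 0, SEEK_SET) or die "seek failed: $!";
my $from_zero = <DATA>;

# The saved offset points at the first line after the __DATA__ marker.
seek(DATA, $data_start, SEEK_SET) or die "seek failed: $!";
my $from_saved = <DATA>;

print "after seek to 0:      $from_zero";
print "after seek to offset: $from_saved";

__DATA__
DAY1 asas aerer
```

The first line printed should be the script's own first line, and the second should be the first data line.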

Re: reading __DATA__ more than once
by pg (Canon) on Nov 14, 2002 at 06:39 UTC
    DATA is a free lunch from Perl. It is a filehandle, and can be handled just like any other filehandle. However, I still think it is better practice to physically separate "data" from "process"/"logic". I would suggest using __DATA__ only as fast food for testing your programs, and not keeping it in production code.
Re: reading __DATA__ more than once
by mirod (Canon) on Nov 14, 2002 at 07:29 UTC

    You might also want to use Inline::Files if you want to use DATA as a file.

Re: reading __DATA__ more than once
by Ananda (Pilgrim) on Nov 14, 2002 at 09:56 UTC

    Just a related question ...

    Is it mandatory to have all the __DATA__ at the bottom of the script?

    If it can be placed in the middle of the script, how do you differentiate the "__DATA__ content" from the program script?

    Can someone describe/elaborate/comment on __DATA__ usage, its constraints and best practices?

    Thanks in advance

    Anandatirtha

      Basically, __DATA__ provides a quickie way to bundle data for a script right there at the bottom of the script, and make it available to you via the DATA filehandle (read with <DATA>), rather than have to (lazy-speak) put the data in a text file and make sure you always have access to that file, open it, etc. All text after a line containing '__DATA__' will be ignored in terms of executable code. __END__ functions similarly. Usually (?) this is for including test data for your script.

      In looking for online documentation about __DATA__, the best I could find was contained in a couple of paragraphs from perldata.html (using ActiveState) which in turn said "See the SelfLoader manpage for more description of __DATA__, and an example of its use." Not that said description was crystal clear... Good Luck! :-)
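As a minimal illustration of the basic usage described above, the sketch below reads everything after the __DATA__ marker line by line. The one-record-per-line layout and the %lines_for hash are assumptions made for the example, not something taken from the original poster's program.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my %lines_for;    # hypothetical structure: day label => list of records

# Everything after the __DATA__ line is data, not code, and is read
# through the DATA filehandle like any other file.
while (my $line = <DATA>) {
    chomp $line;
    next unless $line =~ /\S/;    # skip blank lines
    my ($day, @fields) = split ' ', $line;
    push @{ $lines_for{$day} }, \@fields;
}

for my $day (sort keys %lines_for) {
    printf "%s: %d record(s)\n", $day, scalar @{ $lines_for{$day} };
}

__DATA__
DAY1 asas aerer
DAY1 qwrep poiu wer
DAY2 faas dfa sdf as
```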