Popcorn Dave has asked for the wisdom of the Perl Monks concerning the following question:

I'm currently rewriting a program that I wrote for a Perl class. It pulls newspaper headlines from a number of different web sites, and my thought is to just keep the site information in a __DATA__ area at the end of the program. Each record consists of a number, a URL and a graphic. The number selects the rule with which the data from HTML::Parser will be processed.

I'm curious what the consensus is here regarding putting data in __DATA__ as opposed to building, reading and parsing a config file. Is there an advantage to one over the other?

Thanks for any input on this!

There is no emoticon for what I'm feeling now.

Replies are listed 'Best First'.
Re: config file vs. __DATA__
by MarkM (Curate) on Apr 23, 2003 at 04:36 UTC

    One especially odd factor to consider is how Perl implements __DATA__: it keeps open the file handle it used to read the script, positioned just after the __DATA__ line, and later reads from DATA continue from there. This can be quite efficient: for a single program that contains a single __DATA__ section, the beginning of the section may already be in memory, and no file needs to be open()'d to start reading from it.
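As a minimal sketch of what reading such a section looks like, the script below parses records of the kind the original question describes (a rule number, a URL and a graphic). The whitespace-separated layout and the sample rows are assumptions made for illustration; the question does not say how the fields are delimited.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Assumed record layout: rule number, URL, graphic file name,
# separated by whitespace. The sample rows below are made up.
my @sources;
while ( my $line = <DATA> ) {
    # Lines that do not look like a record (comments, blanks) are skipped.
    my ( $rule, $url, $graphic ) = $line =~ /^\s*(\d+)\s+(\S+)\s+(\S+)\s*$/
        or next;
    push @sources, { rule => $rule, url => $url, graphic => $graphic };
}

printf "rule %d: %s (%s)\n", @{$_}{qw(rule url graphic)} for @sources;

__DATA__
# rule  url                              graphic
1       http://news.example.com/         example-news.gif
2       http://daily.example.org/front   daily.gif
```

No open() call is needed: DATA is already open when the script starts running.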

    __DATA__ sections should rarely or never be used in Perl modules unless you absolutely know what you are doing, and even then, reconsider. If every module had a __DATA__ section, and few of them used it immediately, Perl scripts would quickly use up all available file descriptors on the system. Also, using __DATA__ sections may make the module less portable or harder to store in an alternative format such as PAR. I highly doubt __DATA__ sections work properly for compiled Perl.

    Personally, I prefer to store 'data' outside the program, as it allows the data to be accessed easily from multiple sources, and not just the executing (Perl) program.

Re: config file vs. __DATA__
by dws (Chancellor) on Apr 23, 2003 at 03:47 UTC
    I'm curious what the consensus is here regarding putting data in __DATA__ as opposed to building, reading and parsing a config file.

    I'll use __DATA__ for personal stuff and demos, but if there's any risk of something escaping from the laboratory, it gets a proper configuration file (or external template file, or whatever).

    __DATA__ is too deep into a file to expect casual people to look. And it's the casual people who create the biggest support burden.

Re: config file vs. __DATA__
by Abigail-II (Bishop) on Apr 23, 2003 at 07:34 UTC
    Don't just do whatever the "consensus" is among the few dozen replies, at most, that you get here. Do whatever you feel comfortable with. If __DATA__ works for you, hey, all the more power to you. Perl is about making more than one way available to you. Don't be a sheep and let others decide; decide for yourself.

    Abigail

      Abigail is right. I use __DATA__ if I can get away with it and it won't make my life difficult later. Most of the time I can't, since the scripts I write tend to be installed and reinstalled on different machines requiring different configurations, and I don't want to edit the script each time I install. In some cases you may even be able to use both methods at the same time (i.e. use a config file if it exists, or use __DATA__ if it doesn't).

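The "use both" idea can be sketched roughly as follows. The file name headlines.conf, the HEADLINES_CONF environment variable and the key = value format are all hypothetical choices for this illustration, not anything taken from the thread.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical location; a real script might take this from @ARGV instead.
my $config_file = $ENV{HEADLINES_CONF} || 'headlines.conf';

# Prefer the external file; fall back to the defaults after __DATA__.
my $fh;
if ( -r $config_file ) {
    open $fh, '<', $config_file or die "Cannot open $config_file: $!";
}
else {
    $fh = \*DATA;
}

my %config;
while ( my $line = <$fh> ) {
    # Only "key = value" lines are kept; comments and blanks are ignored.
    next unless $line =~ /^\s*(\w+)\s*=\s*(.*?)\s*$/;
    $config{$1} = $2;
}

print "$_ = $config{$_}\n" for sort keys %config;

__DATA__
# built-in defaults, used only when no config file is found
timeout = 30
output_dir = /tmp/headlines
```

Because both sources are read through the same handle variable, the parsing loop does not need to know which one it got.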
      Doesn't that generally defeat the point of asking for advice? :)

      I don't know, I've found there's something to be said for doing what others tend to do, "best practices" and all that. I think this "individualism" (doing something differently just for the sake of not being a "sheep") contributes to Perl's (partially earned) rep for being hard to collaborate on and hard to maintain.

        No, I wouldn't have said it if the poster had asked "what do you do, and why?". But the poster was asking for consensus, leaving me with the feeling that (s)he'd go with whatever got the most votes.

        Abigail

Re: config file vs. __DATA__
by vek (Prior) on Apr 23, 2003 at 05:48 UTC
    As a general rule of thumb I don't hardcode anything that could be user configurable or is likely to change on a regular basis. All program configuration is stored away in a config file. YMMV on this but this is a policy that is enforced here at work and seems to work out quite well for us to be honest.

    In my shop we have a development group and an application support group. The application support group are not programmers but are quite technically savvy and are responsible for program configuration. It's quite nice for them to be able to modify a config file if any minor tweaks are needed (e.g. a different directory, a different URL, maintaining an email distro list, etc.). That way we don't have to release a new version of the program when one of these minor tweaks is needed.

    Config files start making a lot of sense if you have quite a strict ticketing->development->QA->release process like we do here at work. If some of those configuration type items were hardcoded, it might take one or two weeks just to get a new release out.

    As I said, YMMV but config files seem to work out rather nicely for my development group.

    -- vek --

Re: config file vs. __DATA__
by zby (Vicar) on Apr 23, 2003 at 07:25 UTC
    I can't see any use for __DATA__ except for some quick hacks. Generally you need to divide the program into smaller chunks to manage its complexity better (this is not a theoretical rule but something I've learned from practice). So when a piece of logic/data can be extracted from the program so easily that you could put it in the __DATA__ area, I'd rather put it in a separate file. And you can reverse the extraction with PAR, as already mentioned.

Re: config file vs. __DATA__
by Solo (Deacon) on Apr 23, 2003 at 20:43 UTC
    Why choose?

    Config::IniFiles does most of the heavy lifting of config files for you. And you can work with your config file in __DATA__ while programming or debugging and easily switch to another file for release. See Re: My Code Is Functional...But Not Tidy :( for an example of using __DATA__ with Config::IniFiles.
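A rough sketch of that approach is shown below. It assumes Config::IniFiles is installed from CPAN (it is not a core module) and relies on its documented ability to take a glob reference such as \*DATA for the -file parameter. The section names, keys and URLs are invented for the example.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Config::IniFiles;    # from CPAN, not a core module

# During development, read the INI text embedded below. For a release,
# pass a path instead, e.g. -file => '/etc/headlines.ini' (hypothetical).
my $cfg = Config::IniFiles->new( -file => \*DATA )
    or die "Cannot parse config: @Config::IniFiles::errors";

for my $section ( $cfg->Sections ) {
    my $url  = $cfg->val( $section, 'url' );
    my $rule = $cfg->val( $section, 'rule' );
    print "$section: rule $rule, $url\n";
}

__DATA__
[example-news]
rule=1
url=http://news.example.com/

[daily]
rule=2
url=http://daily.example.org/front
```

Switching to an external file then means changing only the -file argument; the code that calls val() stays the same.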

    --Solo

    --
    Chewie... angle the rear deflector shield.

Re: config file vs. __DATA__
by Your Mother (Archbishop) on Apr 24, 2003 at 02:25 UTC
    I have used it in a couple of CGIs where non-programmers were going to maintain variables in the script. Something like:
    __DATA__
    Department Fish Gun Licensing
    Manager Trout Fishing in America
    Email trout@fish.net
    Department...
    In the situation I was in, removing the config from the program would eventually lead to problems since the "maintainers" of the script wouldn't be able to handle it (I realize an admin feature in the script would have been even easier for users and allow the config to be separated out, but that adds at least a couple hundred lines of code and the project size/scope didn't warrant that kind of time).