Tanktalus has asked for the wisdom of the Perl Monks concerning the following question:

This has to have been a problem someone has solved before ... I just can't find one on CPAN.

The concept is really quite simple. I want to be able to tell a repository object to handle a file, or a set of files. And then I want to retrieve it back - whether that's the same process or not. The repository location should be relatively opaque - today I may want it in ~/repository, tomorrow in a relational database, next week sitting on an FTP site, or maybe read-only from an HTTP site. Who knows. Much like how one might store their CGI::Sessions - today in /tmp, tomorrow in a Berkeley DB, next week in MySQL or DB2 or Oracle (well, if there were a CGI::Session::Oracle you could...). All you have to do is swap out the underlying driver, and you're using the new data store.
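
A minimal sketch of the swappable-driver idea described above, assuming a front-end class that hands all work to a driver object. The names My::Repo and My::Repo::Driver::Dir, and the store/fetch methods, are hypothetical and not taken from any CPAN module. A database or FTP driver would only need to provide the same two methods.

```perl
use strict;
use warnings;

# Hypothetical filesystem driver. Other back ends (database, FTP, ...)
# would implement the same store/fetch methods.
package My::Repo::Driver::Dir;
use File::Basename qw(basename);
use File::Copy qw(copy);
use File::Path qw(make_path);
use File::Spec;

sub new {
    my ($class, %args) = @_;
    make_path($args{root});
    return bless { root => $args{root} }, $class;
}

# Copy each file into a per-tag directory under the repository root.
sub store {
    my ($self, $tag, @files) = @_;
    my $dir = File::Spec->catdir($self->{root}, $tag);
    make_path($dir);
    for my $file (@files) {
        copy($file, File::Spec->catfile($dir, basename($file)))
            or die "Cannot store $file: $!";
    }
    return scalar @files;
}

# Copy every file stored under the tag into the destination directory
# and return the file names that were retrieved.
sub fetch {
    my ($self, $tag, $dest) = @_;
    my $dir = File::Spec->catdir($self->{root}, $tag);
    make_path($dest);
    opendir(my $dh, $dir) or die "No entry '$tag': $!";
    my @names = sort grep { -f File::Spec->catfile($dir, $_) } readdir $dh;
    closedir $dh;
    for my $name (@names) {
        copy(File::Spec->catfile($dir, $name), File::Spec->catfile($dest, $name))
            or die "Cannot fetch $name: $!";
    }
    return @names;
}

# Hypothetical front end: callers only ever talk to this class, so
# changing the data store means changing the 'driver' argument.
package My::Repo;

sub new {
    my ($class, %args) = @_;
    my $driver_class = 'My::Repo::Driver::' . delete $args{driver};
    return bless { driver => $driver_class->new(%args) }, $class;
}

sub store { my $self = shift; return $self->{driver}->store(@_) }
sub fetch { my $self = shift; return $self->{driver}->fetch(@_) }

package main;
```

A caller would write something like My::Repo->new(driver => 'Dir', root => '/tmp/repository'), then store('build-1', @tarballs) in one process and fetch('build-1', $dest_dir) in another.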

This is all probably a bit much for someone to whip up while on PM. That I'm ok with. I'm not looking for someone to send reams of code :-). I'm just trying to see if anything is already out there doing something like this. If not, then we'll have to do it ourselves, and I'll try to convince my manager to let me put it on CPAN. But, of course, if there's something we can just use, well, that's what perl and CPAN are for, isn't it?

Replies are listed 'Best First'.
Re: Generic repository of files
by BrowserUk (Patriarch) on Jan 26, 2005 at 00:07 UTC

    How about IO::All?

    It seems to cover most things:

    • IO::All 0.33: IO::All of it to Graham and Damian!
    • IO::All::DBM: DBM Support for IO::All
    • IO::All::Dir: Directory Support for IO::All
    • IO::All::File: File Support for IO::All
    • IO::All::Filesys: File System Methods Mixin for IO::All
    • IO::All::Link: Symbolic Link Support for IO::All
    • IO::All::MLDBM: MLDBM Support for IO::All
    • IO::All::Pipe: Pipe Support for IO::All
    • IO::All::STDIO: STDIO Support for IO::All
    • IO::All::Socket: Socket Support for IO::All
    • IO::All::String: String IO Support for IO::All
    • IO::All::Temp: Temporary File Support for IO::All

    Examine what is said, not who speaks.
    Silence betokens consent.
    Love the truth but pardon error.
Re: Generic repository of files
by Aristotle (Chancellor) on Jan 25, 2005 at 22:54 UTC
Re: Generic repository of files
by davido (Cardinal) on Jan 25, 2005 at 23:49 UTC

    You could possibly use the DBI module / system for a level of abstraction (and for even more abstraction, subclass Class::DBI). This may be helpful to you because of the myriad DBD drivers designed to work with DBI, which include:

    • DBD::AnyData: DBI access to XML, CSV and other formats.
    • DBD::SQLite: A single-file, self-contained RDBMS in a DBI driver.
    • DBD::mysql (and other more heavy-weight databases)

    My point is that if you start with the framework for using a database, you can keep it as generic as possible at the higher levels of abstraction, and with minimal changes, substitute any of the wide range of DBD database drivers.

    If you wish to layer DBI over flat files, you can start with DBD::AnyData. When it comes time to graduate to a real database, you can use DBD::SQLite, and when that doesn't keep you afloat you can switch to MySQL, PostgreSQL, or some other real database.
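
    As a rough sketch of that approach, assuming the files are stored whole as BLOBs: the table name "files", its columns, and the helper names open_store, put_file and get_file are all hypothetical, and the schema is SQLite/MySQL flavoured. Only the DSN passed to open_store names the driver. Note that this version reads each file into memory, so it only suits small files.

    ```perl
    use strict;
    use warnings;
    use DBI qw(:sql_types);

    # Connect via whatever DBD driver the DSN names and make sure the
    # (assumed) files table exists.
    sub open_store {
        my ($dsn, $user, $pass) = @_;
        my $dbh = DBI->connect($dsn, $user, $pass, { RaiseError => 1 });
        $dbh->do('CREATE TABLE IF NOT EXISTS files '
               . '(name VARCHAR(255) PRIMARY KEY, content BLOB)');
        return $dbh;
    }

    # Read the file in binary mode and insert it as a BLOB.
    sub put_file {
        my ($dbh, $name, $path) = @_;
        open(my $fh, '<:raw', $path) or die "Cannot read $path: $!";
        my $content = do { local $/; <$fh> };
        close $fh;
        my $sth = $dbh->prepare('INSERT INTO files (name, content) VALUES (?, ?)');
        $sth->bind_param(1, $name);
        $sth->bind_param(2, $content, SQL_BLOB);
        $sth->execute;
        return;
    }

    # Fetch the BLOB by name and write it back out to $path.
    sub get_file {
        my ($dbh, $name, $path) = @_;
        my ($content) = $dbh->selectrow_array(
            'SELECT content FROM files WHERE name = ?', undef, $name);
        defined $content or die "No file named $name";
        open(my $fh, '>:raw', $path) or die "Cannot write $path: $!";
        print {$fh} $content;
        close $fh or die "Cannot close $path: $!";
        return $path;
    }
    ```

    With DBD::SQLite installed, open_store('dbi:SQLite:dbname=repo.db', '', '') would be enough to try it out; switching to another database should mostly be a matter of changing that DSN, give or take the column types.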


    Dave

      Interesting. I can just see it now: DBD::FTP, DBD::HTTP, ... ;-)

      I'm not actually interested in parsing or dealing with the internals of these files. The files would simply be BLOBs in a database. Think of this more as systems management - the task at hand is simply to store one or more files together so that they can be retrieved together. These files could be RPMs or the like (more like tarballs in my case), but the code I'm looking at is not actually interested in the contents of those tarballs, just an easy way to copy them into multiple locations - think, for a second, of the CPAN archive as an example. We want to add a bunch of tarballs to the repository, and then later pull them out - you want 10, I want 15, and between us a few are the same. There's no point in both of us going to the originator of those few tarballs to get them; we'll all go to the repository instead. This gets even more important the more people there are: imagine being the owner of File::Spec or FindBin or other such modules and being bugged by every Perl developer to get your modules. The repository takes care of all that work.

      Our current process goes and recreates the tarball each time it's needed. I want to nix that because we can be creating the same tarball 20+ times per day. A repository would, of course, help here. We'd create each tarball once, put it in the repository, and then extract it from there each time it's needed.
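
      The build-once idea above could be sketched roughly as follows. The helper name fetch_or_build and the plain cache directory are assumptions for illustration, not part of any existing module. The builder callback is only run when the tarball is not already in the cache.

      ```perl
      use strict;
      use warnings;
      use File::Path qw(make_path);
      use File::Spec;

      # Hypothetical helper: return the cached path for $name, calling the
      # $build coderef only if the file is not already in $cache_dir.
      sub fetch_or_build {
          my ($cache_dir, $name, $build) = @_;
          make_path($cache_dir);
          my $path = File::Spec->catfile($cache_dir, $name);
          unless (-e $path) {
              # Build under a temporary name, then rename, so that other
              # processes never see a half-written tarball.
              my $tmp = "$path.tmp.$$";
              $build->($tmp);
              rename($tmp, $path) or die "Cannot publish $path: $!";
          }
          return $path;
      }
      ```

      The first caller of the day pays for the build; the other 20 or so calls just get the path back.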

        Interesting. I can just see it now: DBD::FTP, DBD::HTTP, ... ;-)

        That's actually not a bad idea. The trick is how to make FTP and HTTP interactions into a SQL-like dialect. For inspiration, take a look at DBD::Google and the more recently released DBD::iPod, or the older, and kinda funky, DBD::Chart. And actually DBDs are not really as hard to write as you might think. There is plenty of good documentation, and there are solid implementations out there which you can work from. Personally I always look at DBD::mysqlPP whenever I need a reference; the code is really clean and very easy to read.

        -stvn
Re: Generic repository of files
by borisz (Canon) on Jan 26, 2005 at 00:22 UTC
Re: Generic repository of files
by saintmike (Vicar) on Jan 26, 2005 at 01:04 UTC
    The libferris project looks promising, and allegedly has a Perl wrapper.
Re: Generic repository of files
by Anonymous Monk on Jan 26, 2005 at 18:11 UTC
    Although this is not a Perl solution, I'm willing to bet that there are Perl interfaces to it. It sounds like possibly you want version control software, such as CVS or Subversion. I'm reasonably new to version control software, but I must say, it is very nice. It sounds like you may not need to take advantage of the versioning if nobody needs to change the files (but maybe you'll find a use for it).

    I've been using Subversion, so that is what I'll explain it with, but CVS is very similar.

    Basically, you can put a directory or files under version control, at which point they are uploaded into the repository (and yes, they do call it a repository, which is why your question made me think of it). You can allow access to the repository through SSH or through HTTP, and you can use the normal SSH or HTTP restrictions on read/write permissions, authentication, and domain restrictions. The people or programs that you want to be able to get these files would request the appropriate files. This could be done in preset groups by using directories (as in, they could request a whole directory to get the files in it), or they could request any subset of them that they want.

    Anyway, that's just a quick suggestion. It sounds like it may be what you're looking for, or it may not be. I hope it is helpful, though.

    ~Fatty

Re: Generic repository of files
by Tanktalus (Canon) on Sep 16, 2005 at 14:24 UTC

    Just in case anyone finds this node via Super Search (hahahaha... sorry.), I'll let everyone know that my manager *did* allow me to put Cache::Repository on CPAN. And it's working marvelously.

    What I've seen so far is that it has: cut down on the time to create all our packages (since 90% of our tarballs and associated files are common and would otherwise be rebuilt each time), and cut down on development time when our requirements changed on nearly zero notice (the associated files just up and changed - we only had to change the "insert to repository" code, the "retrieve from repository" code was unchanged).

    Currently, I only have a filesystem driver - but anyone who thinks this may be useful with other backends is more than welcome to develop other backends for it ;-)

    Just a reminder/clarification: this is designed to work with files. Not file contents. That's the significant difference between this and, say, the rest of the caching modules mentioned in this thread. Some of my files are hundreds of MB in size - I don't really want to be copying that into memory to copy it back out somewhere else. I use enough memory as it is ;-)
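
    To illustrate the files-not-contents point, here is a small sketch using the core File::Copy module, which copies through a buffer instead of slurping the whole file. The helper name copy_large and the 1 MB buffer size are assumptions for illustration; this is not claimed to be how Cache::Repository does it internally.

    ```perl
    use strict;
    use warnings;
    use File::Copy qw(copy);

    # Hypothetical helper: copy a (possibly huge) file using a 1 MB buffer,
    # so memory use stays flat regardless of file size.
    sub copy_large {
        my ($src, $dst) = @_;
        copy($src, $dst, 1024 * 1024)
            or die "Copy $src -> $dst failed: $!";
        return -s $dst;    # size of the copy, in bytes
    }
    ```

    The optional third argument to copy is the buffer size in bytes; leaving it out lets File::Copy pick one.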