Bod has asked for the wisdom of the Perl Monks concerning the following question:

We have a process that calls an API that returns JSON. The data doesn't change frequently and the JSON payload is reasonably large and has to be accessed in chunks.

Because we are calling it more frequently, I thought it would be a good idea to cache it to a text file and use that if it is under a few hours old.

The module I'm using is internal, so I have complete control over where I save the cached data.

But, over dinner, whilst tucking into roast pork and sprouts (one of my favourite vegetables), I pondered how to approach this if it was a module I intended to publish on CPAN. This module will never be published, but I can imagine plenty of reasons for CPAN modules to need to write files just for their own use.

Is there a standard way to handle this?
Can we be sure that the module will have write permissions on the directory where it resides?
Is there an environment variable common to all environments for a temporary directory that can be used for this sort of thing?

Replies are listed 'Best First'.
Re: Where to save module data
by hippo (Archbishop) on Oct 08, 2024 at 21:40 UTC
    I thought it would be a good idea to cache it to a text file and use that if it is under a few hours old.

    It certainly would be a good idea. However, if you'll permit me to chip in with the results of my experience, I caution against writing such a caching system yourself. I have done this more than once and have also inherited code where others have done this more than once. In every single case the caching system has drawbacks - one or two are serious, all are irritating.

    In Perl we are fortunate indeed to have a very high-quality but generic and flexible cache module available to use in the shape of CHI and it would be hard to praise it too highly. Do try it out in your (apparently fairly simple) use case. You should find it easy to use and very efficient and hopefully that will encourage you to use it for other, more involved, cache scenarios.
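For readers who have not met CHI before, here is a minimal sketch of the use case described in the question, assuming CHI is installed from CPAN. The cache directory, the namespace, the key name, and the `fetch_from_api` stand-in are all hypothetical; only `CHI->new`, `get`, and `set` come from CHI itself.

```perl
use strict;
use warnings;
use CHI;
use File::Spec;

# Hypothetical cache location; substitute a directory the process can write to.
my $root_dir = File::Spec->catdir( File::Spec->tmpdir, 'my-module-cache' );

my $cache = CHI->new(
    driver     => 'File',
    root_dir   => $root_dir,
    namespace  => 'api_payload',    # hypothetical namespace
    expires_in => '3 hours',        # entries older than this are treated as missing
);

# Hypothetical stand-in for the real API call.
sub fetch_from_api { return '{"example":"payload"}' }

# get() returns undef when the key is absent or has expired.
my $json = $cache->get('payload');
if ( !defined $json ) {
    $json = fetch_from_api();
    $cache->set( 'payload', $json );
}

print "Payload is ", length($json), " bytes\n";
```

Expiry, file naming, and serialisation are left to CHI here, which is the part that tends to go wrong in hand-rolled caches.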

    Is there a standard way to handle this?

    To quote AST: The nice thing about standards is that you have so many of them to choose from.


    🦛

      if you'll permit me to chip in with the results of my experience...

      I am always eager to hear hippo experience nuggets!

      In Perl we are fortunate indeed to have a very high-quality but generic and flexible cache module available to use in the shape of CHI and it would be hard to praise it too highly.

      Thank you hippo - until now, this module has been unknown to me.

      Briefly looking at it, I can immediately see its value and will try it out with a simple use case...

        CHI does look like the way to go. But its File driver defaults to /tmp/chi-driver-file (or OS equivalent), and it sounds to me like your data would more appropriately be in /var/cache/your-module-name.

        Filesystem Hierarchy Standard:
        /var/cache is intended for cached data from applications. Such data is locally generated as a result of time-consuming I/O or calculation. The application must be able to regenerate or restore the data. Unlike /var/spool, the cached files can be deleted without data loss. The data must remain valid between invocations of the application and rebooting the system. Files located under /var/cache may be expired in an application specific manner, by the system administrator, or both. The application must always be able to recover from manual deletion of these files (generally because of a disk space shortage). No other requirements are made on the data format of the cache directories.
        You'd want to specify a location for Windows, too. A little googling didn't find any close equivalent to /var/cache for non-FHS systems, so if /var/cache doesn't exist, you'd likely just want to use something under File::Spec->tmpdir or let CHI default.
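A minimal sketch of that fallback, using only core modules. The helper name `cache_root` is hypothetical, and the writability check is an added assumption, since /var/cache is often writable only by root.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper: choose a base directory for a module's cache.
# Prefers /var/cache when it exists and is writable by the current
# user, otherwise falls back to the system temporary directory.
sub cache_root {
    my ($module_name) = @_;
    my $var_cache = '/var/cache';
    my $base = ( -d $var_cache && -w $var_cache )
        ? $var_cache
        : File::Spec->tmpdir;
    return File::Spec->catdir( $base, $module_name );
}

my $dir = cache_root('your-module-name');
print "Cache directory: $dir\n";
```

The sketch only computes a path; it does not create the directory.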
        --
        A math joke: r = | |csc(θ)|+|sec(θ)| |-| |csc(θ)|-|sec(θ)| |

        I just thought I'd add a comment on CHI. I hadn't ever heard of it
        either and it seems like a valuable module to know about if you need to
        store/cache data. Happily, I report that it installs without drama on
        cygwinPerl; lots of dependencies though. IMHO however, this is generally
        a good thing. It shows that the author has thought carefully about
        her/his code and researched thoroughly.

        Oct 19, 2024 at 22:30 UTC
        Examine what is said, not who speaks.
        Love the truth but pardon error.
        Silence betokens consent.
        In the absence of evidence, opinion is indistinguishable from prejudice.
Re: Where to save module data (2 updates)
by choroba (Cardinal) on Oct 08, 2024 at 19:26 UTC
    File::Spec->tmpdir or Path::Tiny->tempdir?

    Update: Or probably not as you want to access it from different runs of the program.

    Update 2: The module can take the directory as a parameter. Each user will decide where to store the data for their application if they want to use it.
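A minimal sketch of the "directory as a parameter" idea, using only core modules. The package name, the `cache_dir` argument, and the `cache_file` method are hypothetical, and defaulting to File::Spec->tmpdir when no directory is given is an assumption.

```perl
use strict;
use warnings;
use File::Spec;

package My::Cached::Client;    # hypothetical module name

sub new {
    my ( $class, %args ) = @_;
    my $self = bless {
        # The caller decides where data lives; tmpdir is only a fallback.
        cache_dir => $args{cache_dir} // File::Spec->tmpdir,
    }, $class;
    return $self;
}

# Build the full path of a cache file inside the configured directory.
sub cache_file {
    my ( $self, $name ) = @_;
    return File::Spec->catfile( $self->{cache_dir}, $name );
}

package main;

my $client = My::Cached::Client->new( cache_dir => '/path/chosen/by/caller' );
print $client->cache_file('api_data.json'), "\n";
```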

    map{substr$_->[0],$_->[1]||0,1}[\*||{},3],[[]],[ref qr-1,-,-1],[{}],[sub{}^*ARGV,3]
      And File::Temp uses File::Spec if you want to use that to create a temporary file.
Re: Where to save module data
by etj (Priest) on Oct 08, 2024 at 19:47 UTC
    This may be a gap in the market:
    • permanently-available files are best handled with File::ShareDir::Install and are read-only
    • ephemeral ones with File::Temp owned by app-process owner; need not survive a reboot
    Your situation is probably closer to ephemeral, both in terms of lifespan, and appropriate file-ownership. But my assumption is you'd want to have the files survive a reboot? This situation is like a file-based, age-limited memoisation. You might consider using Redis also.
      But my assumption is you'd want to have the files survive a reboot?

      For my current purpose...no! It is on a webserver which pretty much never gets rebooted.

      But, of course, for the hypothetical CPAN module, surviving a reboot would be preferable so it could be used anywhere Perl can exist.

Re: Where to save module data
by stevieb (Canon) on Oct 09, 2024 at 15:25 UTC

    If the module is called under a service user, or it's ok for each user to have their own copy, I use the user's home directory for this. Here's an incomplete/pseudo example:

    use File::HomeDir;

    my $home_dir  = File::HomeDir->my_home;
    my $data_file = "$home_dir/api_data.json";

    if (! -e $data_file || time() - (stat($data_file))[9] > 600) {
        # File doesn't exist or is older than 10 minutes
        # ...get/update the file
    }

    # ...do stuff with file

    The module won't have access to the file/directory, but the user calling it will.

    Update: This doesn't just work on Unix/MacOS. It also works on Windows. I use it in berrybrew.

Re: Where to save module data
by Danny (Chaplain) on Oct 08, 2024 at 22:31 UTC
    Is there any way for your API to tell you if something about the data has changed? For example, the size or the timestamp? That would seem more optimal. If not, since you say you are grabbing it in chunks, could you just grab, say, the chunk at some offset X near the end and compare it to the last analogous chunk you saved from the previous ping?
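A minimal sketch of the chunk comparison, using the core Digest::MD5 module. The helper name `chunk_changed` and the sample JSON strings are hypothetical; how the chunk is fetched from the API is not shown.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

# Returns true when the freshly fetched chunk differs from the one
# whose digest was remembered from the previous check.
sub chunk_changed {
    my ( $previous_digest, $chunk ) = @_;
    return 1 unless defined $previous_digest;    # nothing saved yet
    return md5_hex($chunk) ne $previous_digest;
}

my $saved_digest = md5_hex('{"page":9,"items":[1,2,3]}');    # from the last run
my $fresh_chunk  = '{"page":9,"items":[1,2,3,4]}';           # hypothetical new fetch

if ( chunk_changed( $saved_digest, $fresh_chunk ) ) {
    print "Chunk differs: refresh the whole cache\n";
}
else {
    print "Chunk unchanged: keep using the cached copy\n";
}
```

Storing a digest rather than the chunk itself keeps the saved state small. A matching chunk only suggests the rest of the payload is unchanged; it does not prove it.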
Re: Where to save module data
by Intrepid (Curate) on Oct 18, 2024 at 20:49 UTC

    Bod wrote:

    But, over dinner, whilst tucking into roast pork and sprouts (one of my favourite vegetables), I pondered how to approach this if it was a module I intended to publish on CPAN. This module will never be published, but I can imagine plenty of reasons for CPAN modules to need to write files just for their own use.

    Is there a standard way to handle this?
    Can we be sure that the module will have write permissions on the directory where it resides?
    Is there an environment variable common to all environments for a temporary directory that can be used for this sort of thing?

    Despite the fact that I hate and loathe sprouts, I am going to reply ;-)

    By the time I post this reply you will undoubtedly have received several good suggestions, Bod. What I wanted to add is that there is a standard for such things. It goes by the name XDG Base Directory Specification, already cited by ysth in Re^3: Where to save module data. All flavors of Linux (probably) adhere to it, and the Perl module File::HomeDir employs it, as well as providing equivalents in the Windows world.

    The specification linked to above will answer several of your questions. Note that, with File::HomeDir, only two of the methods provided will actually create directories for you (given the appropriate args). The other methods return undef if the cited directory does not already exist.
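A minimal sketch of how the specification's cache directory can be resolved by hand, using only core modules rather than File::HomeDir. The helper name `xdg_cache_dir` is hypothetical, and falling back to File::Spec->tmpdir when HOME is unset is an assumption that goes beyond the specification.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper: resolve a per-user cache directory following the
# XDG Base Directory Specification. $XDG_CACHE_HOME is used when it is
# set to an absolute path; otherwise the default is $HOME/.cache.
sub xdg_cache_dir {
    my ($app_name) = @_;
    my $base = $ENV{XDG_CACHE_HOME};
    unless ( defined $base && File::Spec->file_name_is_absolute($base) ) {
        # Assumption: use the temp directory if HOME is not set at all.
        my $home = $ENV{HOME} // File::Spec->tmpdir;
        $base = File::Spec->catdir( $home, '.cache' );
    }
    return File::Spec->catdir( $base, $app_name );
}

print xdg_cache_dir('your-module-name'), "\n";
```

Like the File::HomeDir methods mentioned above, this only returns a path; creating the directory is left to the caller.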

    Oct 18, 2024 at 19:37 UTC
    The open palm of desire
    Wants everything, it wants everything
    It wants soil as soft as summer
    And the strength to push like spring
    Paul Simon -> Further to Fly
      Despite the fact that I hate and loathe sprouts, I am going to reply ;-)

      When I was young, I too hated sprouts...
      The older I get, the more I like them...to the point that I have become so old that they are one of my favourite vegetables, along with artichokes!

      What I wanted to add is that there is a standard for such things

      Thank you...

      That is precisely what my sprout-inspired curiosity was looking for 👍

Re: Where to save module data
by sectokia (Friar) on Oct 14, 2024 at 02:24 UTC

    Never assume the data is 'too big' to just keep in memory. Always give the user the option of not writing anything.

    The 'standard way' (i.e. what most people do) is to write to the system's temp folder, where the OS can clean it up later. So I would probably use File::Temp.
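A minimal sketch combining both points, using the core File::Temp module. The helper name `store_payload`, the `write_file` option, and the file name template are hypothetical.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

my %memory_cache;    # in-memory copy, always kept

# Hypothetical helper: keep the payload in memory and only write a
# temporary file when the caller explicitly asks for one.
sub store_payload {
    my ( $key, $json, %opt ) = @_;
    $memory_cache{$key} = $json;
    return undef unless $opt{write_file};

    # TMPDIR => 1 places the file in the system temporary directory.
    my ( $fh, $filename ) =
        tempfile( 'payload-XXXXXX', TMPDIR => 1, SUFFIX => '.json' );
    print {$fh} $json;
    close $fh or die "Cannot close $filename: $!";
    return $filename;
}

store_payload( 'api', '{"example":"payload"}' );    # memory only
my $file = store_payload( 'api', '{"example":"payload"}', write_file => 1 );
print "Wrote $file\n";
```

Because tempfile() generates a random name, a later run of the program will not find the file unless the name is recorded somewhere, so this suits a cache that lives only as long as one process.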

Re: Where to save module data
by harangzsolt33 (Deacon) on Oct 08, 2024 at 21:07 UTC