in reply to Re: Caching files
in thread Caching files

I'm on Linux, and a cron process periodically writes the time window of validity of some forecast data for a number of geographical tiles.
Each tile has a corresponding JSON file, written by the cron process, containing (among other things) that time window.
The data generated by the process are 4D arrays saved to ASCII files with a regular structure, so value(i,j,k,t) can be read with direct access: a function takes (i,j,k,t) as input, computes the byte offset where value(i,j,k,t) starts, and reads it with seek().
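For illustration, a hypothetical sketch of such a reader (it assumes 0-based indices, fixed-width records of $reclen bytes each, and t varying slowest; the real layout may differ):

# Hypothetical sketch: every value occupies $reclen bytes (number plus
# separator), so the byte offset of value(i,j,k,t) follows directly
# from the array dimensions ($ni, $nj, $nk).
sub read_value {
    my ($fh, $i, $j, $k, $t, $ni, $nj, $nk, $reclen) = @_;
    my $offset = ((($t * $nk + $k) * $nj + $j) * $ni + $i) * $reclen;
    seek $fh, $offset, 0 or die "seek failed: $!";
    read($fh, my $buf, $reclen) == $reclen or die "short read: $!";
    return 0 + $buf;    # numify the fixed-width ASCII field
}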
At the moment, the process consists of finding the tile in which a given point falls and then reading that tile's JSON when making the query.
I wonder if there is a way to preload all the JSON files into a hash, and then update the entries when the files change after each cron run.
Below are the relevant parts of the code, which hopefully make the situation clear. In practice, I'd like to cache sub _get_tile_info().


use strict;
use warnings;
use POSIX qw(floor);              # floor() is used in _find_cell
use File::Slurp qw(read_file);    # assumed source of read_file() in _get_tile_info

get_data(Coords => [[9.19, 45.46]]);    # example: one [lon, lat] pair

#-------------
sub get_data {

    my %args = @_;

    my $tiles_and_ids = _get_tile_and_ids(%args);

    foreach my $point (@$tiles_and_ids) {
        my $data = _extract_ts(Point=>$point,WS2D=>1,WD2D=>1,TEMP2D=>1);
    }

}

#----------------------
sub _get_tile_and_ids {

    my %args = @_;

    my @coords = @{$args{Coords} || []};

    my @results;
    foreach my $pair (@coords) {

        my ($xx,$yy) = @$pair;
        my ($status,$tile,$ii,$jj) = _find_tile_and_ids(X=>$xx,Y=>$yy);
        push @results,[$tile,$ii,$jj];    # keep each point's tile and ids together

    }

    return \@results;

}

#-----------------------
sub _find_tile_and_ids {

    my %args = @_;

    my $x = $args{X};
    my $y = $args{Y};

    # Find tile (elided): determines $tile, its lower-left corner
    # ($xll_tile, $yll_tile), and %info with the grid step dxy

    ....

    my ($icell,$jcell) = _find_cell(X=>$x,Y=>$y,Xmin=>$xll_tile,Ymin=>$yll_tile,Dxy=>$info{dxy});

    return('',$tile,$icell,$jcell);

}

#----------------------
sub _find_cell {

    my %args = @_;

    my $x = $args{X};
    my $y = $args{Y};
    my $xmin = $args{Xmin};
    my $ymin = $args{Ymin};
    my $dxy = $args{Dxy};

    # 1-based cell indices within the tile
    my $ii = floor(($x - $xmin) / $dxy) + 1;
    my $jj = floor(($y - $ymin) / $dxy) + 1;

    return ($ii,$jj);

}

#----------------
sub _extract_ts {

    my %args = @_;
    my $point = $args{Point} || die;

    my ($tile,$ii,$jj) = @$point;    # unpack [$tile, $ii, $jj] from _get_tile_and_ids
    my %file = (
        WS2D => "$tile/ws3d.dat",
        TEMP2D => "$tile/temp3d.dat",
    );

    my $tile_info = _get_tile_info(Tile=>$tile);

    ....

}

#-------------------
sub _get_tile_info {

    my %args = @_;
    my $tile = $args{Tile};

    my $json_file = "$tile/info.json";
    my $tile_info = read_file($json_file);

    return $tile_info;

}

Re^3: Caching files
by choroba (Cardinal) on Jan 24, 2020 at 16:49 UTC
    OK, we're still missing some of the details, but let's have some fun.

    I created a Makefile like this:
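    A minimal sketch of what such a Makefile might look like, given the two targets used below and the file layout the Perl program expects (the file names, grid size and values are assumptions; recipe lines must start with real tabs):

    SHELL := /bin/bash

    simulate_cron:
    	# create a 10x10 grid of JSON files, then keep mutating them;
    	# mv makes each update atomic (see note 1 below)
    	for x in {1..10}; do for y in {1..10}; do \
    	    echo "[0, 0, $$RANDOM]" > new.json && mv new.json $$x-$$y.json; \
    	done; done
    	while :; do \
    	    x=$$(( RANDOM % 10 + 1 )); y=$$(( RANDOM % 10 + 1 )); \
    	    echo "[0, 0, $$RANDOM]" > new.json && mv new.json $$x-$$y.json; \
    	done

    query:
    	time perl query.pl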

    Now, you can run

    make simulate_cron
    to generate the input data and start modifying them randomly.

    Then, run

    make query
    in a different terminal. The Perl program is the following:
    #!/usr/bin/perl
    use warnings;
    use strict;
    use feature qw{ say };

    use Cpanel::JSON::XS qw{ decode_json };

    my %cache;
    for (1 .. 1000) {
        my @queries = map [ map int 1 + rand 10, 1, 2 ], 1 .. 50;
        for my $query (@queries) {
            my ($x, $y) = @$query;
            # delete $cache{$x}{$y};  # <- Uncomment to simulate no cache.
            my $value;
            if (exists $cache{$x}{$y}
                && (stat "$x-$y.json")[9] == $cache{$x}{$y}{last}
            ) {
                $value = $cache{$x}{$y}{value};
            } else {
                open my $in, '<', "$x-$y.json" or die $!;
                $cache{$x}{$y}{last} = (stat $in)[9];
                $value = $cache{$x}{$y}{value}
                       = decode_json(do { local $/; <$in> })->[2];
            }
            say "$x, $y: $value";
        }
    }

    With the delete line uncommented, it takes about 0.400s to terminate. With the line commented, it runs under 0.100s, i.e. slightly more than 4 times faster.

    Notes:

    1. The simulation uses mv to create the JSON files so that each change is atomic (a minimal sketch of the pattern follows these notes). If we wrote to the file directly instead, we could get occasional errors when reading it.
    2. We store the modification time before we read the value. There's a race condition: the value may change after we retrieve the modification time but before we read the value. That doesn't break the code, though: we still return the correct value; we just might read it from the file once more next time.
    3. I guess the cron process doesn't change all the files all the time, so the real benefit of this kind of cache might be much smaller in your real environment.
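
    The atomic-update pattern from note 1, as a hypothetical Perl helper (the name is made up): write the new content to a temporary file on the same filesystem, then rename() it over the target, so a reader sees either the complete old file or the complete new one.

    use File::Temp qw(tempfile);

    # Hypothetical helper: atomically replace $path with $json_text.
    sub write_json_atomically {
        my ($path, $json_text) = @_;

        # Temp file in the same directory, hence the same filesystem,
        # which is what makes rename() atomic on POSIX systems.
        my ($fh, $tmp) = tempfile('updateXXXX', DIR => '.', UNLINK => 0);
        print {$fh} $json_text;
        close $fh or die "close failed: $!";

        rename $tmp, $path or die "rename failed: $!";
    }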

      Thanks. The JSON update would be inside a module that prepares the data for an API implemented with Mojolicious.
      Is there a way I can preserve this cache for subsequent and independent queries to the API? Would it be possible to specify that %cache is a global variable in the application?
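      One way this is commonly done (a hedged sketch; the package and sub names are hypothetical, not from this thread): a Mojolicious application is a persistent process, so a file-scoped lexical %cache in the module survives across requests. Note that under a preforking server such as hypnotoad, each worker process keeps its own independent copy of the cache.

      package MyApp::TileInfo;    # hypothetical module used by the API
      use strict;
      use warnings;
      use Cpanel::JSON::XS qw{ decode_json };

      # File-scoped lexical: created once per worker process and reused
      # by every request that the process serves.
      my %cache;

      sub get_tile_info {
          my ($tile) = @_;
          my $json_file = "$tile/info.json";
          my $mtime     = (stat $json_file)[9];

          # Refresh the entry only when the cron process rewrote the file.
          if (!$cache{$tile} || $cache{$tile}{mtime} != $mtime) {
              open my $in, '<', $json_file or die $!;
              $cache{$tile} = {
                  mtime => $mtime,
                  info  => decode_json(do { local $/; <$in> }),
              };
          }
          return $cache{$tile}{info};
      }

      1;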