Is it possible to localize the stat/lstat cache?

bounsy has asked for the wisdom of the Perl Monks concerning the following question:

Re: Is it possible to localize the stat/lstat cache? by fishmonger (Chaplain) on Apr 17, 2015 at 15:22 UTC
Have you looked at File::stat?
Re^2: Is it possible to localize the stat/lstat cache? by bounsy (Acolyte) on Apr 17, 2015 at 17:34 UTC
I hadn't taken a look at File::stat before. It looks like the version for Perl 5.12 has many of the features I'd be interested in, but unfortunately my code needs to work on 5.10 or higher. 5.10 doesn't support overloading of -X and I'd have to create my own cando function (or equivalent), so the extra effort involved in an attempt to use a standard module that doesn't do what I need it to do is probably more than it's worth. I will remember it for later, though, once I can start assuming a newer Perl version.
Re: Is it possible to localize the stat/lstat cache? by jeffa (Bishop) on Apr 17, 2015 at 17:53 UTC
Why not store the results based on the files themselves?
`use strict;
use warnings;

my @files = glob('./*');
my %stat = map {
    $_ => {
        r => (-r $_),
        w => (-w $_),
        x => (-x $_),
        s => (-s $_),
    }
} @files;

# print out all sizes, as an example
print $stat{$_}{s}, $/ for @files;`
Now you can call stat or lstat and still have a lookup table for cached values that you can always overwrite if you wish.
jeffa
L-LL-L--L-LL-L--L-LL-L--
-R--R-RR-R--R-RR-R--R-RR
B--B--B--B--B--B--B--B--
H---H---H---H---H---H---
(the triplet paradiddle with high-hat)
Re^2: Is it possible to localize the stat/lstat cache? by Aldebaran (Curate) on Apr 18, 2015 at 06:10 UTC
Hi jeff, When I come to this site with some spare time, I try to work through some script that stretches my game a little bit. I had to add print statements to figure out your syntax but wanted to ask for clarification.
$ perl stat1.pl
files are ./causes2.txt ./fears1.pl ./fears1.pl~ ./fears2.txt ./stat1.pl ./stat1.pl~ ./template_stuff
240
282
242
63
396
362
4096
subroutine says this is your hash:
key: ./stat1.pl, value: HASH(0xa1519ac)
key: ./causes2.txt, value: HASH(0xa0fe7ec)
key: ./fears1.pl, value: HASH(0xa118598)
key: ./fears1.pl~, value: HASH(0xa117ddc)
key: ./fears2.txt, value: HASH(0xa12c59c)
key: ./stat1.pl~, value: HASH(0xa17581c)
key: ./template_stuff, value: HASH(0xa22a8d4)
$
Q1) Why are directories always 4096 on my linux machine, regardless of whatever is in it?
I really couldn't understand the map and resulting hash until I saw that the values were themselves hash references. I'm not suggesting that I added to your script in any way to improve it; rather it is simply more verbose:
`$ cat stat1.pl
use strict;
use warnings;
use 5.010;
use lib "template_stuff";
use utils1 qw(print_hash);

my @files = glob('./*');
my %stat = map {
    $_ => {
        r => (-r $_),
        w => (-w $_),
        x => (-x $_),
        s => (-s $_),
    }
} @files;

say "files are @files";
# print out all sizes, as an example
print $stat{$_}{s}, $/ for @files;
my $hashref = \%stat;
print_hash ( $hashref );
$`
Q2) Do I have it correct that the stat hash has an array reference as its value, where it references a hash with the letters for filetests as keys and their stat'ed values for any given file as values?
Q3) How would I enumerate them, that is, display all their values for a directory?
Thanks for your interesting post and comment,
Re^3: Is it possible to localize the stat/lstat cache? by afoken (Chancellor) on Apr 18, 2015 at 06:47 UTC
`subroutine says this is your hash: key: ./stat1.pl, value: HASH(0xa1519ac)`
Use Data::Dumper or similar to dump the hash content.
Q1) Why are directories always 4096 on my linux machine, regardless of whatever is in it?
They aren't. Directories on ext2/3/4 filesystems have a minimal size, 1 block, which is 4096 bytes on typical large filesystems. Smaller filesystems may use block sizes of 1024 or 2048. Directories filled with many files grow larger than one block. Removing the files will NOT make the directory shrink. Other filesystems may give completely different results. Unless you are writing low-level code to check, repair, or backup filesystems, it is best to completely ignore any size value for anything but plain files.
`my %stat = map { $_ => { r => (-r $_), w => (-w $_), x => (-x $_), s => (-s $_), } } @files;`
Note that this code is not as efficient as it may seem. It hides four `(l)stat` calls per file, and so it may cause race conditions. To really reduce the number of `(l)stat` calls, use one explicit `(l)stat` and the special file handle `_` instead of `$_`:
`my %stat = map {
    lstat($_) or die "Can't lstat $_: $!";
    $_ => {
        r => (-r _),
        w => (-w _),
        x => (-x _),
        s => (-s _),
    }
} @files;`
fishmonger gave a much better hint: File::stat's `stat` and `lstat` functions both return an object that could be stored in the hash, allowing you to run all tests that you need without storing each test's result in the `%stat` hash:
`use v5.12;
use File::stat 1.02 qw( stat lstat );
# ...
my %stat = map { $_ => lstat($_) } @files;
# ...
for my $fn (@files) {
    say $fn,' is ',(-d $stat{$fn} ? 'a directory' : -x $stat{$fn} ? 'executable' : 'not executable');
    say $fn,' has a size of ',$stat{$fn}->size(),' bytes, uses ',$stat{$fn}->blocks(),' "blocks" of 512 bytes, the filesystem uses a block size of ',$stat{$fn}->blksize(),' bytes';
}`
Update: Note that `stat` and `lstat` often return `st_blocks` for the historic block size of 512, even if the filesystem uses a different block size. This conforms to POSIX:
The unit for the st_blocks member of the `stat` structure is not defined within IEEE Std 1003.1-2001. In some implementations it is 512 bytes. It may differ on a file system basis. There is no correlation between values of the `st_blocks` and `st_blksize`, and the `f_bsize` (from `<sys/statvfs.h>`) structure members. Traditionally, some implementations defined the multiplier for `st_blocks` in `<sys/param.h>` as the symbol `DEV_BSIZE`.
Alexander
--
Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)
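The Data::Dumper suggestion and the question about enumerating every stored value could be sketched roughly as follows. This is only an illustration, assuming the same `r`/`w`/`x`/`s` layout as the `%stat` hash above; the `(false)` placeholder and the sorted output are choices made for this sketch, not something from the thread.

```perl
use strict;
use warnings;
use Data::Dumper;

# Sorted keys make the dump easier to compare between runs.
$Data::Dumper::Sortkeys = 1;

my @files = glob('./*');

# One lstat per file; the -X tests then reuse the `_` buffer.
my %stat = map {
    my $file = $_;
    lstat($file) or die "Can't lstat $file: $!";
    ( $file => { r => (-r _), w => (-w _), x => (-x _), s => (-s _) } );
} @files;

# Dump the whole nested structure instead of seeing HASH(0x...).
print Dumper(\%stat);

# Or walk both levels by hand: file names first, then test letters.
for my $file (sort keys %stat) {
    for my $test (sort keys %{ $stat{$file} }) {
        my $value = $stat{$file}{$test};
        # Failed tests come back as an empty string or undef.
        $value = '(false)' unless defined $value && length $value;
        print "$file -$test: $value\n";
    }
}
```

Each value in `%stat` is a hash reference, so the inner loop dereferences it with `%{ ... }` before asking for its keys.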
Re^4: Is it possible to localize the stat/lstat cache? by Aldebaran (Curate) on Apr 18, 2015 at 23:22 UTC
Re^5: Is it possible to localize the stat/lstat cache? by afoken (Chancellor) on Apr 19, 2015 at 00:44 UTC
Re^5: Is it possible to localize the stat/lstat cache? by AnomalousMonk (Archbishop) on Apr 19, 2015 at 00:32 UTC
Re^2: Is it possible to localize the stat/lstat cache? by Anonymous Monk on Apr 20, 2015 at 12:19 UTC
I could do that, but we're talking about a very large number of files, so memory starts to become an issue in that case.
Re: Is it possible to localize the stat/lstat cache? by mr_mischief (Monsignor) on Apr 17, 2015 at 17:59 UTC
I like your version with `%Stat` (I wouldn't capitalize it in my style, but do what fits in your program's style of course). A single call to stat though will return your mode, size, atime, mtime, ctime, and more. It's not from the current user's point of view like the -X functions, and it's not directly yes/no answers either. It may be handy to get all that information at once. What I'd like to know is why you have an issue with building `%Stat` this way. Do you really want to localize the magical `_` rather than put its results in the hash? What are you wanting that to buy you?
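For illustration, a single call that captures every field might look like the sketch below. The field order is the 13-element list documented for the core `stat`/`lstat`; `stat_info` is a hypothetical helper name, and `'.'` is used only because it always exists.

```perl
use strict;
use warnings;

# Hypothetical helper: one lstat call, all 13 fields kept under
# descriptive names (order as documented in perldoc -f stat).
sub stat_info {
    my ($path) = @_;
    my @st = lstat($path) or return;
    my %info;
    @info{qw(dev ino mode nlink uid gid rdev size
             atime mtime ctime blksize blocks)} = @st;
    return \%info;
}

my $info = stat_info('.') or die "Can't lstat .: $!";

# The permission bits are the low 12 bits of the mode field.
printf "mode %04o, %d link(s), modified %s\n",
    $info->{mode} & 07777,
    $info->{nlink},
    scalar localtime $info->{mtime};
```

Because the result lives in an ordinary lexical hash, nothing a later `stat` does to `_` can disturb it.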
Re^2: Is it possible to localize the stat/lstat cache? by Anonymous Monk on Apr 20, 2015 at 12:34 UTC
It turns out that the use of %Stat is very similar to what File::stat does under the covers, but my version only does those tests that I know I will need (saving a few cycles of processor time, most likely). In my case (since I know in advance which tests I need to worry about), this is sufficient. I don't have an issue with building %Stat this way. It actually is (almost) the best solution to my need. I like to know alternatives, though, and realized that there was no stat equivalent to `local $_;` or `local FileHandle;` in Perl. This seems like an oversight to me, as it should be a best practice to localize any global variables that you use in a function (especially one that is part of a module) so that you don't have to worry about side effects for the caller. As for my comment above about %Stat being almost the best solution, I tweaked my solution to drop the hash entirely (which avoids construction of the hash object, looking up the key in the hash, etc.). I ended up using one variable per test that I needed. Since these variables were declared outside the function and set right after the call to stat/lstat, they had a one-time construction cost plus the assignment per file and then multiple uses. In my specific case, this was the most efficient and simple solution.
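A rough sketch of that one-variable-per-test arrangement follows. The actual code was not posted, so the variable names, the `examine` helper, and the particular tests chosen here are all assumptions made for illustration.

```perl
use strict;
use warnings;

# Declared once, outside the function; only reassigned per file.
my ($is_link, $is_dir, $size);

# Hypothetical helper: copy the answers out of the `_` buffer
# immediately after lstat, before any other call can replace it.
sub examine {
    my ($path) = @_;
    lstat($path) or return 0;
    $is_link = -l _;
    $is_dir  = -d _;
    $size    = -s _;
    return 1;
}

if (examine('.')) {
    print ". is ", ($is_dir ? "a directory" : "not a directory"), "\n";
    print ". is ", ($is_link ? "a symlink" : "not a symlink"), "\n";
}
```

After `examine` returns, the caller reads plain scalars, so it no longer matters whether a nested function runs its own `stat`.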
Re^3: Is it possible to localize the stat/lstat cache? by bounsy (Acolyte) on Apr 20, 2015 at 12:40 UTC
Ooops. Wasn't logged in when I said that.
Re: Is it possible to localize the stat/lstat cache? by AnomalousMonk (Archbishop) on Apr 17, 2015 at 15:28 UTC
The actual calls to these other functions are uncommon in frequency ..., but there are many places in the main function that might need to call them.
If the other functions are infrequently called, what difference does it really make how many places exist in the program from which the calls might be made? To give an extreme example, if half the statements in a program are die statements, but it can be shown that all these statements are in dead code branches, what cumulative effect do these `die` statements have? None. Why worry about this? Or do I simply misunderstand the problem?
Give a man a fish: `<%-(-(-(-<`
Re^2: Is it possible to localize the stat/lstat cache? by bounsy (Acolyte) on Apr 17, 2015 at 15:42 UTC
The problem is that the function needs to continue after these calls to the other functions. They are exception cases (in the sense that they are rare), but they aren't branches that don't need to return and continue from there. For example (where ... could be any set of tests, including one or more `-X _` tests, and each function call could potentially need to use stat/lstat in it):
`if (...) { function1(); }
if (...) { function2(); }
if (...) { function3(); }
# etc.`
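The effect being described can be reproduced with a small sketch like the one below. Here `function1` is a stand-in that stats an unrelated temporary file; the temp file and the variable names are assumptions made for the demonstration.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# A plain temporary file for the callee to stat (illustrative only).
my ($fh, $tmp) = tempfile(UNLINK => 1);

# Stand-in for any callee that happens to stat something else.
sub function1 {
    my @ignored = stat($tmp);
    return;
}

lstat('.') or die "Can't lstat .: $!";
my $before = (-d _) ? 1 : 0;    # `_` describes '.', a directory

function1();                    # replaces the shared `_` buffer

my $after = (-d _) ? 1 : 0;     # `_` now describes the temp file

lstat('.') or die "Can't lstat .: $!";
my $restored = (-d _) ? 1 : 0;  # a fresh lstat puts '.' back

print "before: $before, after: $after, restored: $restored\n";
```

Since there is only one `_` buffer per interpreter, the caller either has to repeat the `lstat` after each such call or copy the answers it needs into its own variables beforehand.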
Re: Is it possible to localize the stat/lstat cache? by bounsy (Acolyte) on Apr 17, 2015 at 17:51 UTC
I ended up just saving values in a hash, like what I thought I was going to do. In the future, I'll probably use File::stat.