Re: Is it possible to localize the stat/lstat cache?
by fishmonger (Chaplain) on Apr 17, 2015 at 15:22 UTC
I hadn't looked at File::stat before. The version that ships with Perl 5.12 has many of the features I'd be interested in, but unfortunately my code needs to work on 5.10 or higher. On 5.10, File::stat doesn't support overloading of the -X operators, and I'd have to write my own cando function (or equivalent). The extra effort of adapting a standard module that doesn't do what I need it to do is probably more than it's worth. I will remember it for later, though, once I can assume a newer Perl version.
Re: Is it possible to localize the stat/lstat cache?
by jeffa (Bishop) on Apr 17, 2015 at 17:53 UTC
use strict;
use warnings;
my @files = glob('./*');
my %stat = map {
$_ => {
r => (-r $_),
w => (-w $_),
x => (-x $_),
s => (-s $_),
}
} @files;
# print out all sizes, as an example
print $stat{$_}{s}, $/ for @files;
Now you can call stat or lstat and still have a lookup table for cached values that you can always overwrite if you wish.
jeffa
L-LL-L--L-LL-L--L-LL-L--
-R--R-RR-R--R-RR-R--R-RR
B--B--B--B--B--B--B--B--
H---H---H---H---H---H---
(the triplet paradiddle with high-hat)
$ perl stat1.pl
files are ./causes2.txt ./fears1.pl ./fears1.pl~ ./fears2.txt ./stat1.pl ./stat1.pl~ ./template_stuff
240
282
242
63
396
362
4096
subroutine says this is your hash:
key: ./stat1.pl, value: HASH(0xa1519ac)
key: ./causes2.txt, value: HASH(0xa0fe7ec)
key: ./fears1.pl, value: HASH(0xa118598)
key: ./fears1.pl~, value: HASH(0xa117ddc)
key: ./fears2.txt, value: HASH(0xa12c59c)
key: ./stat1.pl~, value: HASH(0xa17581c)
key: ./template_stuff, value: HASH(0xa22a8d4)
$
Q1) Why are directories always 4096 on my Linux machine, regardless of what is in them?
I really couldn't understand the map and the resulting hash until I saw that the values were themselves hash references. I'm not suggesting that my additions improve your script in any way; it is simply more verbose:
$ cat stat1.pl
use strict;
use warnings;
use 5.010;
use lib "template_stuff";
use utils1 qw(print_hash);
my @files = glob('./*');
my %stat = map {
$_ => {
r => (-r $_),
w => (-w $_),
x => (-x $_),
s => (-s $_),
}
} @files;
say "files are @files";
# print out all sizes, as an example
print $stat{$_}{s}, $/ for @files;
my $hashref = \%stat;
print_hash ( $hashref );
$
Q2) Do I have it correct that each value in the %stat hash is a hash reference, pointing to a hash with the file-test letters as keys and the results of those tests for the given file as values?
Q3) How would I enumerate them, that is, display all their values for a directory?
Thanks for your interesting post and comment,
subroutine says this is your hash:
key: ./stat1.pl, value: HASH(0xa1519ac)
Use Data::Dumper or similar to dump the hash content.
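For Q3, here is a minimal sketch of both approaches. It assumes the same file name => { test letter => result } layout as above, and the build_stat_table name is only for illustration. Data::Dumper prints the whole nested structure. The two-level loop prints each test result per file, dereferencing the inner hash reference explicitly.

```perl
use strict;
use warnings;
use Data::Dumper;

# Same layout as %stat above: file name => { test letter => test result }.
sub build_stat_table {
    my @files = @_;
    my %table = map {
        $_ => {
            r => (-r $_),
            w => (-w $_),
            x => (-x $_),
            s => (-s $_),
        }
    } @files;
    return \%table;
}

my $stat = build_stat_table(glob('./*'));

# Option 1: let Data::Dumper walk the nested hashes for you.
$Data::Dumper::Sortkeys = 1;    # print keys in a stable, sorted order
print Dumper($stat);

# Option 2: walk both levels yourself; each value is a hash reference.
for my $file (sort keys %$stat) {
    my $tests = $stat->{$file};
    for my $test (sort keys %$tests) {
        my $result = $tests->{$test};
        printf "%s -%s: %s\n", $file, $test,
            defined $result ? "'$result'" : 'undef';
    }
}
```

False file tests show up as empty strings (''), not 0, which is why the loop quotes each value.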
Q1) Why are directories always 4096 on my Linux machine, regardless of what is in them?
They aren't. Directories on ext2/3/4 filesystems have a minimum size of one block, which is 4096 bytes on typical large filesystems. Smaller filesystems may use block sizes of 1024 or 2048 bytes. Directories filled with many files grow larger than one block, and removing the files will NOT make the directory shrink. Other filesystems may give completely different results. Unless you are writing low-level code to check, repair, or back up filesystems, it is best to ignore the size value completely for anything but plain files.
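Following that advice, here is a small sketch that reports sizes only for plain files. The plain_file_size helper is hypothetical. It makes one lstat call, then uses -f _ to decide whether the size is meaningful at all.

```perl
use strict;
use warnings;

# Size in bytes for plain files only; undef for directories, symlinks,
# devices, or paths that can't be lstat'ed.
sub plain_file_size {
    my ($path) = @_;
    my @st = lstat($path) or return;   # one system call, also fills _
    return unless -f _;                # only trust sizes of plain files
    return $st[7];                     # st_size: 0 for an empty file
}

for my $path (glob('./*')) {
    my $size = plain_file_size($path);
    printf "%-30s %s\n", $path,
        defined $size ? "$size bytes" : '(size not meaningful)';
}
```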
my %stat = map {
$_ => {
r => (-r $_),
w => (-w $_),
x => (-x $_),
s => (-s $_),
}
} @files;
Note that this code is not as efficient as it may seem. It hides four stat calls per file (one per file test). Because the file can change between those calls, it may also cause race conditions. To really reduce the number of (l)stat calls, use one explicit (l)stat and the special filehandle _ instead of $_:
my %stat = map {
lstat($_) or die "Can't lstat $_: $!";
$_ => {
r => (-r _),
w => (-w _),
x => (-x _),
s => (-s _),
}
} @files;
fishmonger gave a much better hint: File::stat's stat and lstat functions both return an object that can be stored in the hash. That lets you run all the tests you need without storing each test's result in the %stat hash:
use v5.12;
use File::stat 1.02 qw( stat lstat );
# ...
my %stat = map { $_ => lstat($_) } @files;
# ...
for my $fn (@files) {
say $fn,' is ',(-d $stat{$fn} ? 'a directory' : -x $stat{$fn} ? 'executable' : 'not executable');
say $fn,' has a size of ',$stat{$fn}->size(),' bytes, uses ',$stat{$fn}->blocks(),' "blocks" of 512 bytes, the filesystem uses a block size of ',$stat{$fn}->blksize(),' bytes';
}
Update: Note that stat and lstat often report st_blocks in units of the historic 512-byte block, even if the filesystem uses a different block size. This conforms to POSIX:
The unit for the st_blocks member of the stat structure is not defined within IEEE Std 1003.1-2001. In some implementations it is 512 bytes. It may differ on a file system basis. There is no correlation between values of the st_blocks and st_blksize, and the f_bsize (from <sys/statvfs.h>) structure members.
Traditionally, some implementations defined the multiplier for st_blocks in <sys/param.h> as the symbol DEV_BSIZE.
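To compare a file's apparent size with the space actually allocated to it, you can multiply blocks by a unit size. This sketch assumes that unit is 512 bytes. That holds on Linux and most BSDs, but the POSIX text above does not guarantee it. The size_and_allocation name is made up. The File::stat object interface used here does not need -X overloading, so it also works on 5.10.

```perl
use strict;
use warnings;
use File::stat ();   # object interface only; CORE::stat/lstat stay untouched

# ASSUMPTION: st_blocks is counted in 512-byte units. That is the common
# Linux/BSD convention, but POSIX leaves the unit unspecified.
use constant ASSUMED_BLOCK_UNIT => 512;

# Returns (apparent size, allocated bytes) for $path, or an empty list.
sub size_and_allocation {
    my ($path) = @_;
    my $st = File::stat::lstat($path) or return;
    my $blocks = $st->blocks;
    # Some platforms don't report st_blocks at all (an empty string).
    my $allocated = (defined $blocks && $blocks ne '')
        ? $blocks * ASSUMED_BLOCK_UNIT
        : undef;
    return ($st->size, $allocated);
}

for my $path (glob('./*')) {
    my ($size, $allocated) = size_and_allocation($path) or next;
    printf "%-30s size %d bytes, allocated %s\n", $path, $size,
        defined $allocated ? "$allocated bytes" : 'unknown';
}
```

Sparse files can show far fewer allocated bytes than their size, and small files usually show more.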
Alexander
--
Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)
I could do that, but we're talking about a very large number of files, so memory starts to become an issue in that case.
Re: Is it possible to localize the stat/lstat cache?
by mr_mischief (Monsignor) on Apr 17, 2015 at 17:59 UTC
I like your version with %Stat (I wouldn't capitalize it in my style, but do what fits your program's style, of course).
A single call to stat, though, returns your mode, size, atime, mtime, ctime, and more. Unlike the -X functions, it doesn't answer from the current user's point of view, and it doesn't give direct yes/no answers either. Still, it may be handy to get all that information at once.
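Here is a minimal sketch of that approach: one stat in list context, decoded with the S_IS* helpers from the core Fcntl module. The describe function is hypothetical. The permission bits it prints are the raw mode, not an answer to "can the current user read this?".

```perl
use strict;
use warnings;
use Fcntl qw(:mode);   # S_ISDIR, S_ISREG, S_IMODE, ...

# One stat call returns raw inode data; pick out mode, size and mtime.
sub describe {
    my ($path) = @_;
    my ($mode, $size, $mtime) = (stat $path)[2, 7, 9] or return;
    my $type = S_ISDIR($mode) ? 'directory'
             : S_ISREG($mode) ? 'plain file'
             :                  'other';
    return sprintf '%s: %s, permissions %04o, %d bytes, modified %s',
        $path, $type, S_IMODE($mode), $size, scalar localtime $mtime;
}

print describe($_), "\n" for glob('./*');
```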
What I'd like to know is why you have an issue with building %Stat this way. Do you really want to localize the magical _ rather than put its results in the hash? What are you wanting that to buy you?
It turns out that the use of %Stat is very similar to what File::stat does under the covers, but my version only does those tests that I know I will need (saving a few cycles of processor time, most likely). In my case (since I know in advance which tests I need to worry about), this is sufficient.
I don't have an issue with building %Stat this way. It actually is (almost) the best solution to my need. I like to know alternatives, though, and realized that there was no stat equivalent to local $_; or local *FileHandle; in Perl. This seems like an oversight to me, as it should be a best practice to localize any global variables that you use in a function (especially one that is part of a module) so that you don't have to worry about side effects for the caller.
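Here is a short sketch of the side effect being described. The names helper_that_stats and demonstrate_clobber are made up. The helper only runs -e on a different path, but that is enough to replace the cached result that the caller's -f _ relies on. Run as-is, it should print "-f _ before helper: 1, after helper: 0".

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical helper: any stat, lstat, or -X test on another path
# overwrites the single global stat cache that _ reads from.
sub helper_that_stats {
    my ($other) = @_;
    return -e $other;
}

# Returns whether -f _ was true before and after calling the helper.
sub demonstrate_clobber {
    my $dir  = tempdir(CLEANUP => 1);
    my $file = "$dir/data.txt";
    open my $fh, '>', $file or die "Can't create $file: $!";
    close $fh;

    lstat($file) or die "Can't lstat $file: $!";
    my $before = -f _ ? 1 : 0;     # _ still describes $file
    helper_that_stats($dir);       # caller's cached result is replaced
    my $after  = -f _ ? 1 : 0;     # _ now describes $dir
    return ($before, $after);
}

my ($before, $after) = demonstrate_clobber();
print "-f _ before helper: $before, after helper: $after\n";
```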
As for my comment above about %Stat being almost the best solution: I tweaked my solution to drop the hash entirely, which avoids constructing the hash, looking up keys, etc. I ended up using one variable per test that I needed. These variables were declared outside the function and set right after the call to stat/lstat. They therefore had a one-time construction cost, plus one assignment per file, followed by multiple uses. In my specific case, this was the most efficient and simplest solution.
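Here is a sketch of that one-variable-per-test idea, using hypothetical names (the actual code was not posted). The important detail is that the results are copied out of _ immediately after the stat call. Later helpers that stat other files therefore cannot change them.

```perl
use strict;
use warnings;

# Hypothetical names: one variable per test actually needed, declared once
# outside the per-file routine and overwritten for every file.
my ($is_readable, $is_writable, $is_executable, $size);

sub load_tests {
    my ($path) = @_;
    stat($path) or return 0;   # one system call; stat follows symlinks like -r $path
    # Copy results out of _ right away, before any helper can stat something else.
    $is_readable   = -r _ ? 1 : 0;
    $is_writable   = -w _ ? 1 : 0;
    $is_executable = -x _ ? 1 : 0;
    $size          = (-s _) || 0;   # -s is false (not 0) for empty files
    return 1;
}

for my $path (glob('./*')) {
    load_tests($path) or next;
    # From here on, helpers may stat other files freely; the copies are unaffected.
    print "$path: r=$is_readable w=$is_writable x=$is_executable size=$size\n";
}
```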
Re: Is it possible to localize the stat/lstat cache?
by AnomalousMonk (Archbishop) on Apr 17, 2015 at 15:28 UTC
The actual calls to these other functions are uncommon in frequency ..., but there are many places in the main function that might need to call them.
If the other functions are infrequently called, what difference does it really make how many places exist in the program from which the calls might be made? To give an extreme example, if half the statements in a program are die statements, but it can be shown that all these statements are in dead code branches, what cumulative effect do these die statements have? None.
Why worry about this? Or do I simply misunderstand the problem?
Give a man a fish: <%-(-(-(-<
The problem is that the function needs to continue after these calls to the other functions. They are exceptional cases (in the sense that they are rare), but they aren't dead-end branches: execution has to return from them and continue from there.
For example (where ... could be any set of tests, including one or more -X _ tests, and each function call could potentially need to use stat/lstat in it):
if (...) { function1(); }
if (...) { function2(); }
if (...) { function3(); }
# etc.
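Since the calls are rare, one workaround for this pattern is to re-run lstat on the original path right after any helper that might stat something else. The following -X _ tests then see the right file again. This is a sketch with hypothetical helper names, not the original code. The extra system call is only paid on the rare branch.

```perl
use strict;
use warnings;
use File::Basename qw(dirname);

# Hypothetical stand-in for function1() and friends: it stats another path,
# which overwrites the _ cache belonging to the caller.
sub note_parent_dir {
    my ($path) = @_;
    my $parent = dirname($path);
    return -d $parent;
}

# Labels for $path. The rare branch calls a helper that may stat, so the
# caller re-runs lstat afterwards before any further -X _ tests.
sub classify {
    my ($path) = @_;
    lstat($path) or return;
    my @labels;

    if (-z _) {                          # rare case: empty file
        push @labels, 'empty';
        note_parent_dir($path);          # _ now describes the parent directory
        lstat($path) or return @labels;  # re-prime _ for $path
    }
    push @labels, 'dir'  if -d _;
    push @labels, 'file' if -f _;
    return @labels;
}

print "$_: @{[ classify($_) ]}\n" for glob('./*');
```

Without the second lstat, an empty file would be labeled 'dir' here, because _ would still hold the parent directory's data.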
Re: Is it possible to localize the stat/lstat cache?
by bounsy (Acolyte) on Apr 17, 2015 at 17:51 UTC
I ended up just saving values in a hash, as I had originally planned. In the future, I'll probably use File::stat.