in reply to Re: Is it possible to localize the stat/lstat cache?
in thread Is it possible to localize the stat/lstat cache?

Hi jeff,

When I come to this site with some spare time, I try to work through some script that stretches my game a little bit. I had to add print statements to figure out your syntax but wanted to ask for clarification.

$ perl stat1.pl
files are ./causes2.txt ./fears1.pl ./fears1.pl~ ./fears2.txt ./stat1.pl ./stat1.pl~ ./template_stuff
240
282
242
63
396
362
4096
subroutine says this is your hash:
key: ./stat1.pl, value: HASH(0xa1519ac)
key: ./causes2.txt, value: HASH(0xa0fe7ec)
key: ./fears1.pl, value: HASH(0xa118598)
key: ./fears1.pl~, value: HASH(0xa117ddc)
key: ./fears2.txt, value: HASH(0xa12c59c)
key: ./stat1.pl~, value: HASH(0xa17581c)
key: ./template_stuff, value: HASH(0xa22a8d4)
$

Q1) Why are directories always 4096 on my linux machine, regardless of whatever is in it?

I really couldn't understand the map and resulting hash until I saw that the values were themselves hash references. I'm not suggesting that I added to your script in any way to improve it; rather it is simply more verbose:

$ cat stat1.pl
use strict;
use warnings;
use 5.010;
use lib "template_stuff";
use utils1 qw(print_hash);

my @files = glob('./*');
my %stat = map {
    $_ => {
        r => (-r $_),
        w => (-w $_),
        x => (-x $_),
        s => (-s $_),
    }
} @files;
say "files are @files";

# print out all sizes, as an example
print $stat{$_}{s}, $/ for @files;

my $hashref = \%stat;
print_hash ( $hashref );
$

Q2) Do I have it correct that the stat hash has an array reference as its value, where it references a hash with the letters for filetests as keys and their stat'ed values for any given file as values?

Q3) How would I enumerate them, that is, display all their values for a directory?

Thanks for your interesting post and comment,

Re^3: Is it possible to localize the stat/lstat cache?
by afoken (Chancellor) on Apr 18, 2015 at 06:47 UTC
    subroutine says this is your hash: key: ./stat1.pl, value: HASH(0xa1519ac)

    Use Data::Dumper or similar to dump the hash content.
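
    For Q3, a nested loop is a minimal alternative to Data::Dumper. The sketch below assumes the same %stat layout as the quoted map (file name => hash reference of test results); the helper name format_stat is made up for illustration and is not part of the original script.

```perl
use strict;
use warnings;

# Hypothetical helper: render one "file: test=value ..." line per entry.
sub format_stat {
    my ($stat) = @_;
    my @lines;
    for my $file (sort keys %$stat) {
        my $tests = $stat->{$file};
        # Failed file tests return '' (or undef), so show those as 0.
        my @pairs = map { "$_=" . ($tests->{$_} || 0) } sort keys %$tests;
        push @lines, "$file: @pairs";
    }
    return @lines;
}

my @files = glob('./*');
my %stat  = map {
    my $f = $_;
    ( $f => { r => (-r $f), w => (-w $f), x => (-x $f), s => (-s $f) } )
} @files;

print "$_\n" for format_stat(\%stat);
```

    Sorting the keys only makes the output stable between runs; hash order in Perl is otherwise not predictable.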

    Q1) Why are directories always 4096 on my linux machine, regardless of whatever is in it?

    They aren't. Directories on ext2/3/4 filesystems have a minimal size, 1 block, which is 4096 bytes on typical large filesystems. Smaller filesystems may use block sizes of 1024 or 2048. Directories filled with many files grow larger than one block. Removing the files will NOT make the directory shrink. Other filesystems may give completely different results. Unless you are writing low-level code to check, repair, or back up filesystems, it is best to completely ignore any size value for anything but plain files.

    my %stat = map {
        $_ => {
            r => (-r $_),
            w => (-w $_),
            x => (-x $_),
            s => (-s $_),
        }
    } @files;

    Note that this code is not as efficient as it may seem. It hides four (l)stat calls per file, and so it may cause race conditions. To really reduce the number of (l)stat calls, use one explicit (l)stat and the special file handle _ instead of $_:

    my %stat = map {
        lstat($_) or die "Can't lstat $_: $!";
        $_ => {
            r => (-r _),
            w => (-w _),
            x => (-x _),
            s => (-s _),
        }
    } @files;

    fishmonger gave a much better hint: File::stat's stat and lstat functions both return an object that can be stored in the hash, allowing you to run all the tests you need without storing each test's result in the %stat hash:

    use v5.12;
    use File::stat 1.02 qw( stat lstat );
    # ...
    my %stat = map { $_ => lstat($_) } @files;
    # ...
    for my $fn (@files) {
        say $fn,' is ',(-d $stat{$fn} ? 'a directory' : -x $stat{$fn} ? 'executable' : 'not executable');
        say $fn,' has a size of ',$stat{$fn}->size(),' bytes, uses ',$stat{$fn}->blocks(),' "blocks" of 512 bytes, the filesystem uses a block size of ',$stat{$fn}->blksize(),' bytes';
    }

    Update: Note that stat and lstat often return st_blocks for the historic block size of 512, even if the filesystem uses a different block size. This conforms to POSIX:

    The unit for the st_blocks member of the stat structure is not defined within IEEE Std 1003.1-2001. In some implementations it is 512 bytes. It may differ on a file system basis. There is no correlation between values of the st_blocks and st_blksize, and the f_bsize (from <sys/statvfs.h>) structure members.

    Traditionally, some implementations defined the multiplier for st_blocks in <sys/param.h> as the symbol DEV_BSIZE.

    Alexander

    --
    Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)

      Thanks for your comments and scripts, Alexander. This might be a sheer banality to those with greater experience and understanding than me, but this material is right where I can tread the learning curve. Data::Dumper truly makes this version pretty (abridged for length):

      $ ./stat3.pl
      $VAR1 = {
        './stat1.pl' => {
          'w' => 1,
          'r' => 1,
          'x' => '',
          's' => 393
        },
        './causes2.txt' => {
          'w' => 1,
          'r' => 1,
          'x' => '',
          's' => 299
        },
        ...
        },
        './stat3.pl' => {
          'w' => 1,
          'r' => 1,
          'x' => 1,
          's' => 293
        },
        './template_stuff' => {
          'w' => 1,
          'r' => 1,
          'x' => 1,
          's' => 4096
        },
      };
      $ cat stat3.pl
      #!/usr/bin/perl -w
      use strict;
      use v5.12;
      use Data::Dumper;

      my @files = glob('./*');
      my %stat = map {
          lstat($_) or die "Can't lstat $_: $!";
          $_ => {
              r => ( -r _ ),
              w => ( -w _ ),
              x => ( -x _ ),
              s => ( -s _ ),
          }
      } @files;
      my $hashref = \%stat;
      print Dumper($hashref);
      $

      This other version shows the same material but with blocks used and the (abridged) output from Dumper:

      $ ./stat2.pl
      ...
      ./stat3.pl is executable
      ./stat3.pl has a size of 293 bytes, uses 8 "blocks" of 512 bytes, the filesystem uses a block size of 4096 bytes
      ./template_stuff is a directory
      ./template_stuff has a size of 4096 bytes, uses 8 "blocks" of 512 bytes, the filesystem uses a block size of 4096 bytes
      $VAR1 = {
        ...
        './stat1.pl' => bless( [ 2049, 404418, 33204, 1, 1000, 1000, 0, 393, 1429336542, 1429336472, 1429336472, 4096, 8 ], 'File::stat' ),
        ...
        './template_stuff' => bless( [ 2049, 533854, 16893, 5, 1000, 1000, 0, 4096, 1429385812, 1429348668, 1429348668, 4096, 8 ], 'File::stat' ),

      This shows that even the small files take up 8 blocks in 2 different ways. I've been scratching my head to figure out all these fields, and they appear to be the equivalent of struct stat from stat(2):

      struct stat {
          dev_t     st_dev;     /* ID of device containing file */
          ino_t     st_ino;     /* inode number */
          mode_t    st_mode;    /* protection */
          nlink_t   st_nlink;   /* number of hard links */
          uid_t     st_uid;     /* user ID of owner */
          gid_t     st_gid;     /* group ID of owner */
          dev_t     st_rdev;    /* device ID (if special file) */
          off_t     st_size;    /* total size, in bytes */
          blksize_t st_blksize; /* blocksize for file system I/O */
          blkcnt_t  st_blocks;  /* number of 512B blocks allocated */
          time_t    st_atime;   /* time of last access */
          time_t    st_mtime;   /* time of last modification */
          time_t    st_ctime;   /* time of last status change */
      };
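
      One detail worth noting when matching the dumped arrays to this struct: the arrays follow the order of the list returned by Perl's built-in stat (see perldoc -f stat), where the three timestamps come before blksize and blocks. The sketch below labels the 13 values from the ./stat1.pl entry; @FIELDS and label_stat are illustrative names, not part of File::stat.

```perl
use strict;
use warnings;

# Field order of the list returned by Perl's built-in stat/lstat
# (see perldoc -f stat). The timestamps come before blksize and blocks,
# unlike in the C struct.
my @FIELDS = qw(dev ino mode nlink uid gid rdev size
                atime mtime ctime blksize blocks);

# Hypothetical helper: pair each raw value with its field name.
sub label_stat {
    my @values = @_;
    my %labeled;
    @labeled{@FIELDS} = @values;
    return \%labeled;
}

# Values copied from the Dumper output for ./stat1.pl above.
my $st = label_stat(2049, 404418, 33204, 1, 1000, 1000, 0, 393,
                    1429336542, 1429336472, 1429336472, 4096, 8);

printf "%-8s %s\n", $_, $st->{$_} for @FIELDS;

# The mode is easier to read in octal: file type bits plus permissions.
printf "mode in octal: %o\n", $st->{mode};
```

      In everyday code the File::stat accessors ($st->size, $st->mtime and so on) make this bookkeeping unnecessary; the table is only useful for reading raw dumps.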

      The script:

      #!/usr/bin/perl -w
      use strict;
      use warnings;
      use v5.12;
      use File::stat 1.02 qw( stat lstat );
      use Data::Dumper;

      my @files = glob('./*');
      say "files are @files";
      my %stat = map { $_ => lstat($_) } @files;

      # print out all sizes, as an example
      for my $fn (@files) {
          say $fn, ' is ',
            (   -d $stat{$fn} ? 'a directory'
              : -x $stat{$fn} ? 'executable'
              :                 'not executable' );
          say $fn, ' has a size of ', $stat{$fn}->size(),
            ' bytes, uses ', $stat{$fn}->blocks(),
            ' "blocks" of 512 bytes, the filesystem uses a block size of ',
            $stat{$fn}->blksize(), ' bytes';
      }
      my $hashref = \%stat;
      print Dumper($hashref);

      Q1) Why does Data::Dumper bless this? I understand just enough about "bless" to be completely-miffed by it, much like in its religious context.

      As to which is "better," that would clearly depend on the user's needs. Maybe the user doesn't want certain information in a large hash. For me, it was a worthwhile exercise both ways.

        This shows that even the small files take up 8 blocks

        On a Linux ext2/3/4 filesystem, the actual block size is typically 4096 bytes. But struct stat (and so the return value of Perl's (l)stat) reports the block count for an imaginary (legacy) block size of 512 bytes. A block is counted as used even if only one byte is actually used, so Linux must report at least 8 "stat blocks" of 512 bytes each for an actual block of 4096 bytes. On a very small ext2/3/4 filesystem, you may see a block size of 1024 bytes; Linux will then report only 2 "stat blocks" for a file of one byte.
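
        The arithmetic can be sketched as follows. This is a simplified model that assumes a file occupies whole filesystem blocks and nothing else; real filesystems can deviate (sparse files, inline data, extra metadata blocks). The function name stat_blocks is made up for this example.

```perl
use strict;
use warnings;
use POSIX qw(ceil);

# Simplified model: round the size up to whole filesystem blocks,
# then express the result in 512-byte "stat blocks".
sub stat_blocks {
    my ($size, $fs_block_size) = @_;
    my $fs_blocks = ceil($size / $fs_block_size);
    return $fs_blocks * $fs_block_size / 512;
}

# [ file size in bytes, filesystem block size in bytes ]
for my $case ([1, 4096], [293, 4096], [4097, 4096], [1, 1024]) {
    my ($size, $bs) = @$case;
    printf "%5d bytes, block size %4d: %d stat blocks\n",
        $size, $bs, stat_blocks($size, $bs);
}
```

        Under this model the 293-byte stat3.pl needs one 4096-byte block, which matches the 8 "blocks" in the output quoted earlier.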

        Do not worry about block sizes and allocated blocks. Unless you write filesystem-specific tools, you do not need this information.

        Why does Data::Dumper bless this? I understand just enough about "bless" to be completely-miffed by it, much like in its religious context.

        bless creates Perl objects (data structures with associated code). File::stat uses objects to allow tests based on the saved results from the stat and lstat built-in functions. Additionally, implementing File::stat as returning objects allows it to overload some standard operations, notably the -X functions.

        Data::Dumper does not bless anything. It dumps data in a format that can be evaluated by perl. bless in the output of Data::Dumper tells you that Data::Dumper has encountered an object. As File::stat makes no attempts to hide its inner workings, Data::Dumper reports File::stat objects as blessed array references.

        Other objects may appear as blessed hash references (very common), but they may also be blessed scalar references (common with XS code, e.g. DBI), blessed glob references (file handles, mostly with IO::Handle and derived classes), or other blessed references.
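
        To see which kind of reference sits under an object, Scalar::Util offers blessed (the class name, or undef for unblessed data) and reftype (the underlying type). A minimal sketch follows; the class name My::Demo is invented for the example, and '.' is used only because it always exists.

```perl
use strict;
use warnings;
use File::stat qw(lstat);
use Scalar::Util qw(blessed reftype);

# A File::stat object: a blessed array reference.
my $st = lstat('.') or die "Can't lstat '.': $!";

# A hand-made object: a hash reference blessed into a made-up class.
my $obj = bless { name => 'example' }, 'My::Demo';

# The last item is a plain array reference, which has no class.
for my $thing ($st, $obj, [1, 2, 3]) {
    printf "class: %-10s underlying type: %s\n",
        blessed($thing) // '(none)', reftype($thing);
}
```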

        Alexander

        Q1) Why does Data::Dumper bless this?

        The File::stat::lstat() function returns a File::stat object: a bless-ed array reference. (The lstat built-in does not do this.) Data::Dumper must produce a string that, if compiled or eval-ed, would be able to reproduce this object. Hence: an array reference bless-ed into 'File::stat'.
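
        That round trip can be tried directly. The sketch below is an illustration, not something from the original thread: it sets $Data::Dumper::Terse so the dump is a bare expression, evaluates it, and checks that the copy still behaves like a File::stat object.

```perl
use strict;
use warnings;
use File::stat qw(lstat);
use Data::Dumper;

my $orig = lstat('.') or die "Can't lstat '.': $!";

# Terse drops the "$VAR1 = " prefix, so the dump is a plain expression
# whose value eval returns directly.
local $Data::Dumper::Terse = 1;
my $dump = Dumper($orig);

my $copy = eval $dump;
die "eval failed: $@" if $@;

print $dump;
print "copy is a ", ref($copy), " with size ", $copy->size, "\n";
```

        Evaluating dumped text is only reasonable for data you produced yourself; never eval a dump from an untrusted source.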


        Give a man a fish:  <%-(-(-(-<