daggett has asked for the wisdom of the Perl Monks concerning the following question:

I need to search through a large directory structure of web pages and their subsidiary files, saved from Firefox, most often just to find the most recently updated page from its timestamp. I would prefer to use Perl's stat() function rather than the ls command I have been using:

james@tibrogargan:~/devel/find/testdirs$ find . -type f -exec ls -l --time-style=+%Y%m%d {} \;
-rw-rw-r-- 1 james james 7 20221013 ./d2/d2_3/f2_3_1
-rw-rw-r-- 1 james james 13 20221013 ./d2/d2_2/f2_2_2
-rw-rw-r-- 1 james james 8 20221013 ./d2/d2_2/f2_2_1
-rw-rw-r-- 1 james james 13 20221013 ./d2/f2_2
-rw-rw-r-- 1 james james 0 20221012 ./d1/d3_3/f1
-rw-rw-r-- 1 james james 0 20221012 ./d1/d1_1/f1
-rw-rw-r-- 1 james james 0 20221012 ./d1/d1_1/f3
-rw-rw-r-- 1 james james 0 20221012 ./d1/d1_1/f2
-rw-rw-r-- 1 james james 0 20221012 ./d1/d1_2/f1
-rw-rw-r-- 1 james james 0 20221012 ./d1/d1_2/f3
-rw-rw-r-- 1 james james 0 20221012 ./d1/d1_2/f2

I want to transform the above in many ways, including placing the file basename at the start, etc., etc., roughly as follows:

f2_3_1 7 20221013 ./d2/d2_3/ <directoryInodeNumber> ...

The program I have written as a first step towards accomplishing this (less many comments and debugging statements) is:

#!/usr/bin/perl

use Cwd qw(cwd);

my $wDirectory = cwd;
my ($dev, $ino, $mode, $nlink, $uid, $gid, $rdev, $size, $atime, $mtime, $ctime, $blksize, $blocks)
    = (0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0);

open(README, "find $wDirectory -type f |") or die "Can't run program: $!\n";

while(<README>) {
    $output = $_;
    ($dev, $ino, $mode, $nlink, $uid, $gid, $rdev, $size, $atime, $mtime, $ctime, $blksize, $blocks)
        = stat("$output");
    print("stats ");
    print ("dev: $dev, ino: $ino, mode: $mode, nlink: $nlink, uid: $uid, gid: $gid, rdev: $rdev, size: $size, atime: $atime, mtime: $mtime, ctime: $ctime, blksize: $blksize, blocks: $blocks");
    print("]\n");
}

close(README);

But when I execute the program I get no data whatsoever from the stat() function:

james@tibrogargan:~/devel/find/testdirs$ ~/devel/perl/findLsiCleaned.pl
stats dev: , ino: , mode: , nlink: , uid: , gid: , rdev: , size: , atime: , mtime: , ctime: , blksize: , blocks: ]
stats dev: , ino: , mode: , nlink: , uid: , gid: , rdev: , size: , atime: , mtime: , ctime: , blksize: , blocks: ]
stats dev: , ino: , mode: , nlink: , uid: , gid: , rdev: , size: , atime: , mtime: , ctime: , blksize: , blocks: ]
stats dev: , ino: , mode: , nlink: , uid: , gid: , rdev: , size: , atime: , mtime: , ctime: , blksize: , blocks: ]
stats dev: , ino: , mode: , nlink: , uid: , gid: , rdev: , size: , atime: , mtime: , ctime: , blksize: , blocks: ]
stats dev: , ino: , mode: , nlink: , uid: , gid: , rdev: , size: , atime: , mtime: , ctime: , blksize: , blocks: ]
stats dev: , ino: , mode: , nlink: , uid: , gid: , rdev: , size: , atime: , mtime: , ctime: , blksize: , blocks: ]
stats dev: , ino: , mode: , nlink: , uid: , gid: , rdev: , size: , atime: , mtime: , ctime: , blksize: , blocks: ]
stats dev: , ino: , mode: , nlink: , uid: , gid: , rdev: , size: , atime: , mtime: , ctime: , blksize: , blocks: ]
stats dev: , ino: , mode: , nlink: , uid: , gid: , rdev: , size: , atime: , mtime: , ctime: , blksize: , blocks: ]
stats dev: , ino: , mode: , nlink: , uid: , gid: , rdev: , size: , atime: , mtime: , ctime: , blksize: , blocks: ]

Can anyone see why my script does not work?

Thank you for your attention.

Re: stat function used with linux find gives me no data
by Fletch (Bishop) on Oct 13, 2022 at 03:27 UTC

    You don't chomp the newlines from the filenames you read back from find, so my guess is that stat is failing because you call it with "/home/blah/foo\n" (note the trailing newline), and you don't notice because you don't check the return value from stat. Also, you might look at File::Find or File::Find::Rule (Path::Tiny might be of related interest) rather than shelling out to find.

    The cake is a lie.
    The cake is a lie.
    The cake is a lie.
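A minimal sketch of the fix described above, keeping the pipe from find: each line is chomped before it reaches stat, and a failed stat is reported rather than silently ignored. The helper name stat_line and the list-form pipe open are assumptions made for illustration; they are not part of the original script.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: strip the trailing newline that find leaves on
# each line, then stat the path. On failure it warns and returns an
# empty list instead of a list of undefined fields.
sub stat_line {
    my ($line) = @_;
    chomp(my $path = $line);
    my @st = stat($path)
        or do { warn "Failed to stat '$path': $!\n"; return };
    return ($path, @st);
}

# The list form of a pipe open passes the arguments to find directly,
# without going through a shell.
my $dir = shift // '.';
open(my $find, '-|', 'find', $dir, '-type', 'f')
    or die "Can't run find: $!\n";

while (my $line = <$find>) {
    my ($path, @st) = stat_line($line) or next;
    print "$path size: $st[7], mtime: $st[9]\n";
}

close($find) or warn "find exited with status $?\n";
```

Filenames that themselves contain a newline would still be split across lines by this approach, which is one more reason the module-based suggestions are worth a look.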

      Thank you. At first I tried:

      $output = chomp($_);

      That didn't work. So I found, from the online documentation, how to properly use chomp(). I tried:

      $output = $_; chomp($output);

      ... and it worked!

      Thank you, also, for those other suggestions. I will also give File::Find, File::Find::Rule and Path::Tiny a try soon.
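For anyone wondering why the first attempt failed: chomp trims its argument in place and returns the number of characters it removed, so the assignment stored a count rather than a filename. A short sketch, using a hypothetical sample string:

```perl
use strict;
use warnings;

my $line = "some-file\n";      # hypothetical line as read from find

# chomp returns the number of characters removed, not the trimmed string.
my $output = chomp($line);

print "output: '$output'\n";   # the count of removed characters
print "line:   '$line'\n";     # the variable itself was trimmed in place
```

With that first attempt, stat was being asked about a file named after the count, which is why it still returned nothing.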

        Well done for consulting the docs - they really are rather good.

        You might also want to know that a simple assignment can be chomped as well so you could write this as chomp ($output = $_); instead. Full demo script:

        #!/usr/bin/env perl
        use strict;
        use warnings;

        $_ = "foo\n";
        print "in: '$_'\n";
        chomp (my $output = $_);
        print "out: '$output'\n";

        I concur with others in warning against shelling out to find for your original task, not least because of the vagaries of the arguments and output formats of such a utility across platforms. Stick with the modules which come with Perl or are on CPAN for better portability.


        🦛

        from the online documentation

        In case you didn't know, you can also find out about any perl function with perldoc -f chomp from the command line, or perldoc MODULENAME for any installed perl module. On Unix, you can often see them as manual pages as well: man MODULENAME, but that depends on your distro.

Re: stat function used with linux find gives me no data
by NERDVANA (Priest) on Oct 13, 2022 at 03:27 UTC
    There's a newline "\n" on the end of the string you read, and you need to "chomp" it before passing to stat.

    Meanwhile, most of us working on this sort of problem would use something more convenient like File::Find, Path::Tiny->visit, Path::Class::Dir->recurse, or at the very least, File::stat.
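As a rough sketch of the File::Find plus File::stat route applied to the original goal (finding the most recently modified file), assuming only regular files count and symlinks are not followed. The helper name newest_file is hypothetical.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use File::Find qw(find);
use File::stat;          # stat/lstat now return objects with accessors
use Fcntl qw(:mode);     # S_ISREG

# Hypothetical helper: return (path, mtime) of the most recently
# modified regular file under $dir, or an empty list if there is none.
sub newest_file {
    my ($dir) = @_;
    my ($newest, $newest_mtime);

    find({
        no_chdir => 1,               # $_ holds the full path
        wanted   => sub {
            my $st = lstat($_) or return;      # skip entries we cannot stat
            return unless S_ISREG($st->mode);  # regular files only
            if (!defined $newest_mtime || $st->mtime > $newest_mtime) {
                ($newest, $newest_mtime) = ($_, $st->mtime);
            }
        },
    }, $dir);

    return defined $newest ? ($newest, $newest_mtime) : ();
}

my ($path, $mtime) = newest_file(shift // '.');
print defined $path
    ? "$path (mtime $mtime)\n"
    : "no regular files found\n";
```

Because no external command is involved, there is no line-oriented output to parse and therefore nothing to chomp.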

Re: stat function used with linux find gives me no data
by haukex (Archbishop) on Oct 13, 2022 at 09:28 UTC

    You don't need to shell out to find, as Perl has the core module File::Find - here's a starting point using core modules only. File::stat, User::pwent, and User::grent give nice objects with accessors instead of the long lists of return values, and Time::Piece has nice extra date/time functionality.

    #!/usr/bin/env perl
    use warnings;
    use strict;
    use File::Find 'find';
    use Fcntl qw/:mode/;
    use Cwd 'cwd';
    use File::stat;
    use Time::Piece;
    use User::pwent;
    use User::grent;

    my $dir = shift || cwd;

    find({ wanted => sub {
        # filename (basename) is in $_,
        # current directory is in $File::Find::dir,
        # full name is in $File::Find::name
        my $st = lstat($_) or do {
            warn "failed to stat $File::Find::name: $!"; return };
        return unless -f $st;  # only regular files
        print $File::Find::name,
            " dev=", $st->dev,
            ", ino=", $st->ino,
            ", mode=", $st->mode,
            sprintf(" (perms=%#03o)", S_IMODE($st->mode)),
            ", nlink=", $st->nlink,
            ", uid=", $st->uid, " (", getpwuid($st->uid)->name, ")",
            ", gid=", $st->gid, " (", getgrgid($st->gid)->name, ")",
            ", rdev=", $st->rdev,
            ", size=", $st->size,
            ", atime=", $st->atime,
            ", mtime=", $st->mtime,
            " (", gmtime($st->mtime)->datetime, "Z)",
            ", ctime=", $st->ctime,
            ", blksz=", $st->blksize,
            ", blocks=", $st->blocks,
            "\n";
    } }, $dir);

Re: stat function used with linux find gives me no data
by kcott (Archbishop) on Oct 13, 2022 at 13:14 UTC

    G'day daggett,

    I noticed your location -- hope you're not inundated with current floods.

    You should start all of your Perl code with the strict and warnings pragmata. You've removed debugging statements, so I've no idea what you tried. When printing variables, add markers so that you can see characters, such as tabs and newlines, that may be difficult to notice. Compare these two:

    $ perl -E 'my $var = "123\n"; say $var;'
    123

    $ perl -E 'my $var = "123\n"; say "|$var|";'
    |123
    |

    There are a number of functions (e.g. stat & localtime) which return many values; you rarely want all of them. Capture everything then just pull out what you need. I suspect you may be getting a "can't see the wood for the trees" situation with the code you posted.
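One way to pull out just the needed fields is a list slice on the return value of stat. A minimal sketch, assuming only the inode, size and mtime are of interest; the helper name ino_size_mtime is hypothetical, and the current directory is used merely as a path that is sure to exist.

```perl
use strict;
use warnings;
use POSIX qw(strftime);

# Hypothetical helper: return only inode, size and mtime for a path.
# Indices into stat's 13-element list: 1 = ino, 7 = size, 9 = mtime.
sub ino_size_mtime {
    my ($path) = @_;
    my @st = stat($path) or die "Failed to stat '$path': $!\n";
    return @st[1, 7, 9];
}

my ($ino, $size, $mtime) = ino_size_mtime('.');

# Format the modification time as YYYYMMDD in local time.
my $date = strftime('%Y%m%d', localtime($mtime));

print "ino: $ino, size: $size, date: $date\n";
```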

    Given your description, here's how I might have tackled this.

    #!/usr/bin/env perl

    use strict;
    use warnings;

    use constant {
        INODE => 1,
        FSIZE => 7,
        MTIME => 9,
    };

    use Cwd 'cwd';
    use File::Find;
    use Time::Piece;

    my $wDirectory = cwd();

    find(\&wanted, $wDirectory);
    transform($wDirectory);
    display();

    {
        my ($data_for, $ignore_re, $display_fmt);

        BEGIN {
            $ignore_re = qr{(?mx: \. (?: pl | swp \z ) )};
            $display_fmt = "%-6s %3d %6d %-10s %17d [%d]\n";
        }

        sub wanted {
            return unless -f $File::Find::name;
            return if $_ =~ $ignore_re;

            $data_for->{$File::Find::name} = {
                dir => $File::Find::dir,
                file => $_,
                stat => [ stat _ ],
            };

            return;
        }

        sub transform {
            my ($cwd) = @_;

            for my $path (keys %$data_for) {
                my $data = $data_for->{$path};

                my $t = localtime($data->{stat}[MTIME]);
                my $date = $t->ymd();
                $date =~ y/-//d;

                my $rel_dir = join substr($data->{dir}, length $cwd), qw{. /};

                @$data{qw{date rel_dir inode size mtime}}
                    = ($date, $rel_dir, @{$data->{stat}}[INODE, FSIZE, MTIME]);
            }

            return;
        }

        sub display {
            my @display_fields = qw{file size date rel_dir inode mtime};

            printf $display_fmt, @{$data_for->{$_}}{@display_fields}
                for sort {
                    $data_for->{$a}{mtime} <=> $data_for->{$b}{mtime}
                } keys %$data_for;

            return;
        }
    }

    Output:

    f1       0 20221013 ./d1/d3_3/ 10977524094816598 [1665639046]
    f1       0 20221013 ./d1/d1_1/ 10133099164684663 [1665639074]
    f2       0 20221013 ./d1/d1_1/  9570149211263361 [1665639080]
    f3       0 20221013 ./d1/d1_1/  9851624187974024 [1665639088]
    f1       0 20221013 ./d1/d1_2/  9007199257842058 [1665639107]
    f2       0 20221013 ./d1/d1_2/  9851624187974028 [1665639115]
    f3       0 20221013 ./d1/d1_2/ 10696049118105998 [1665639120]
    f2_2     0 20221013 ./d2/      10414574141395314 [1665639154]
    f2_3_1   0 20221013 ./d2/d2_3/ 10414574141424462 [1665639207]
    f2_2_2   0 20221013 ./d2/d2_2/  9851624187973976 [1665639223]
    f2_2_1   0 20221013 ./d2/d2_2/  8725724281131669 [1665639238]

    Notes:

    • My code did not use a shell command; however, if I did, I would have used a lexical filehandle and the 3-argument form of open, and let the autodie pragma handle I/O exceptions:
      use autodie;
      ...
      open my $pipe, '-|', $command;
      while (<$pipe>) {
          ...
      }
      See "Opening a filehandle into a command" and autodie.
    • I quickly knocked up some test data. All files have zero size. I added a "[mtime]" field so you can see the chronological ordering; you may not want this but, for demo purposes, YYYYMMDD is not useful.
    • $display_fmt is appropriate for my output; you'll probably want something different.
    • &wanted -- this filters the files wanted and collects just the base data; adapt to your requirements. See File::Find.
    • &transform -- this just handles the transformations. You were pretty vague about what you wanted; modify as needed. See Time::Piece.
    • &display -- this is only concerned with the output; change to suit your needs. See printf and sprintf.
    • Overall, note the separation of functions into three discrete subroutines. You can change the filtering criteria without it affecting the other two routines; you can use a different date format in &transform without needing to change &display; and so on. Consider using this technique in all of your code.

    — Ken

Re: stat function used with linux find gives me no data
by Anonymous Monk on Oct 13, 2022 at 17:15 UTC

    In general, if you want to know why a Perl built-in is not doing what you want, you should check whether it succeeded. In the case of stat(), this would look something like:

    my ( ... ) = stat( $output )
        or die "Failed to stat $output: $!";
    

    In this case the output would have been something like

    Failed to stat some-file
    : No such file or directory at ... line ...
    
Re: stat function used with linux find gives me no data
by Anonymous Monk on Oct 13, 2022 at 03:55 UTC
    More Perl-ish to use https://perldoc.perl.org/functions/readdir ?
      More Perl-ish to use https://perldoc.perl.org/functions/readdir ?

      I don't see any need for reinventing File::Find here on such a low level.