readdir() on a sysopen() handle?

perlhuhn has asked for the wisdom of the Perl Monks concerning the following question:

Re: readdir() on a sysopen() handle? by afoken (Chancellor) on Aug 20, 2017 at 13:05 UTC
Looking through several Linux man pages, it looks like you normally should use opendir, readdir or scandir, and closedir from C. Those functions are specified by POSIX and are portable. But glibc also offers fdopendir, which converts a plain integer file descriptor into a `DIR *` (fdopendir was standardized in POSIX.1-2008). So in C, something like this should work:

```c
/* UNTESTED! */
#include <dirent.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

DIR * opendir_nofollow(const char * pathname)
{
    int fd = open(pathname, O_RDONLY | O_DIRECTORY | O_NOFOLLOW);
    if (fd == -1) {
        return NULL;
    }
    DIR * d = fdopendir(fd);
    if (d == NULL) {
        int saved_errno = errno;
        close(fd);              /* don't leak the descriptor */
        errno = saved_errno;
    }
    return d;
}
```

Converting that to a Perl directory handle will very likely require a little bit of XS code. Perhaps Inline::C might be helpful. You definitely want to have a look at the Perl sources, the part that implements the opendir function, to see how to correctly create a directory handle.

Alexander

-- 
Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)
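For anyone wanting to try this idea end to end, here is a minimal self-contained sketch, assuming a POSIX.1-2008 system such as Linux with glibc. It wraps the same open()/fdopendir() pair and uses it in a hypothetical helper, `count_entries_nofollow`, that counts directory entries. Once fdopendir() succeeds, the `DIR *` owns the descriptor, so closedir() closes it.

```c
#define _POSIX_C_SOURCE 200809L
#include <dirent.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Open a directory for readdir() without following a symlink in the
   last path component. Returns NULL (errno set) on failure. */
DIR *opendir_nofollow(const char *pathname)
{
    int fd = open(pathname, O_RDONLY | O_DIRECTORY | O_NOFOLLOW);
    if (fd == -1)
        return NULL;
    DIR *d = fdopendir(fd);          /* on success, d owns fd */
    if (d == NULL) {
        int saved = errno;
        close(fd);                   /* don't leak the descriptor */
        errno = saved;
    }
    return d;
}

/* Hypothetical helper: number of entries other than "." and "..",
   or -1 if the directory could not be opened. */
long count_entries_nofollow(const char *pathname)
{
    DIR *d = opendir_nofollow(pathname);
    if (d == NULL)
        return -1;
    long n = 0;
    struct dirent *de;
    while ((de = readdir(d)) != NULL)
        if (strcmp(de->d_name, ".") != 0 && strcmp(de->d_name, "..") != 0)
            n++;
    closedir(d);                     /* also closes the descriptor */
    return n;
}
```

Opening a symlink to a directory this way fails, but which errno you get (ELOOP or ENOTDIR) can vary between systems, so callers should not rely on one specific value.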
Re^2: readdir() on a sysopen() handle? by haukex (Archbishop) on Aug 20, 2017 at 14:25 UTC
As you wrote this I was actually fiddling with exactly that :-) The relevant Perl source appears to be pp_open_dir in pp_sys.c, which uses the `IoDIRP` macro, which apparently accesses the `DIR * xiou_dirp` slot of `struct xpvio`, but I can't seem to find any more documentation on it.

Disclaimer: I am not an XS expert, so I can't guarantee that the following is entirely correct! I got some of this from the Inline::C::Cookbook, a bit of research in perlapi, and a bit of fiddling... `myreaddir` does all of the work of opening and reading the directory in C, returning a Perl list, while `_xs_myfdopendir` with the Perl wrapper `myfdopendir` attempts to be a custom opendir.

```perl
use warnings;
use strict;
use Inline C => <<'END_OF_C';
#include <fcntl.h>
#include <dirent.h>
void myreaddir(SV* sv_dirn) {
    Inline_Stack_Vars;
    Inline_Stack_Reset;
    int fd = open( SvPVx(sv_dirn, PL_na), O_RDONLY|O_DIRECTORY|O_NOFOLLOW);
    if (fd<0) Inline_Stack_Return(0);
    DIR* dir = fdopendir(fd);
    if (dir==NULL) { close(fd); Inline_Stack_Return(0); }
    struct dirent *dp;
    while ( (dp=readdir(dir)) != NULL )
        Inline_Stack_Push(sv_2mortal( newSVpvf("%s", dp->d_name) ));
    if( closedir(dir)!=0 ) Inline_Stack_Return(0);
    Inline_Stack_Done;
}
int _xs_myfdopendir(SV* sv_dirn, SV* sv_hnd) {
    int fd = open( SvPVx(sv_dirn, PL_na), O_RDONLY|O_DIRECTORY|O_NOFOLLOW);
    if (fd<0) return 0;
    DIR* dir = fdopendir(fd);
    if (dir==NULL) { close(fd); return 0; }
    IoDIRP(sv_2io(sv_hnd)) = dir;
    return 1;
}
END_OF_C
use Symbol qw/geniosym/;
use File::Spec;
sub myfdopendir {
    return unless _xs_myfdopendir( $_[0]//File::Spec->curdir, my $dh=geniosym );
    return $dh;
}
use Data::Dump;
my @x = myreaddir('/tmp') or die $!;
dd @x;
my $dh = myfdopendir('/tmp') or die $!;
dd readdir $dh;
closedir $dh or die $!;
```

Update: Here are a couple of Perl modules that use XS to read directories; in particular, the first one's `readdir_hashref` looks like it could be modified fairly simply: ReadDir, IO-Dirent, PerlIO-Util
Re^3: readdir() on a sysopen() handle? by ikegami (Patriarch) on Aug 21, 2017 at 04:20 UTC
`or die $!` is wrong in `my @x = myreaddir('/tmp') or die $!;` because an empty list doesn't necessarily denote an error.
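The same distinction exists one level down in C: readdir(3) returns NULL both at the end of the directory and on error, and POSIX says to set errno to 0 before the call to tell the two apart. A minimal sketch, with `count_dir_entries` as a hypothetical helper name:

```c
#define _POSIX_C_SOURCE 200809L
#include <dirent.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: count the entries of an open directory stream,
   excluding "." and "..". Returns -1 only if readdir() reported an
   error; an empty directory returns 0, which is a valid answer. */
long count_dir_entries(DIR *d)
{
    long n = 0;
    for (;;) {
        errno = 0;                  /* readdir() leaves errno alone at the end */
        struct dirent *de = readdir(d);
        if (de == NULL)
            return errno == 0 ? n : -1;   /* NULL + errno 0: no more entries */
        if (strcmp(de->d_name, ".") != 0 && strcmp(de->d_name, "..") != 0)
            n++;
    }
}
```

An empty directory yields 0 here, which is a successful result rather than a failure, mirroring ikegami's point about the empty list.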
Re^4: readdir() on a sysopen() handle? by haukex (Archbishop) on Aug 21, 2017 at 08:43 UTC
Re^5: readdir() on a sysopen() handle? by ikegami (Patriarch) on Aug 21, 2017 at 19:04 UTC
Re: readdir() on a sysopen() handle? by haukex (Archbishop) on Aug 20, 2017 at 11:16 UTC
I am not an expert on the underlying C API, but I looked into this a bit out of curiosity, and I haven't yet been able to find any examples showing whether it is even possible to readdir(3) a directory opened with open(2) instead of opendir(3). On *NIX systems, the Perl API mirrors the C API closely, and if it's not possible with C, Perl isn't going to be able to do this either, at least not natively. Perhaps there are some modules that use XS and can access other APIs provided by the OS, like openat(2) and related functions.

One reference I found was an older version of the DJGPP manual, which explicitly says (edited for brevity): "You can open directories using `open`, but there is limited support for POSIX file operations on directories. The principal reason for allowing `open` to open directories is to support changing directories using `fchdir`. If you wish to read the contents of a directory, use the `opendir` and `readdir` functions instead." This seems to be exactly what your "workaround" is doing. There's also the file chdir-safer.c from gnulib, which appears to use the `fchdir` technique in the function `chdir_no_follow`.
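The fchdir(2) technique can be sketched in C roughly like this, assuming POSIX.1-2008: open the directory with O_NOFOLLOW, fchdir() into that descriptor, then opendir(".") so the stream refers to the object that was actually opened rather than to whatever the path names by then. `opendir_nofollow_fchdir` is a hypothetical name, and because it temporarily changes the process's working directory, it is not thread-safe.

```c
#define _POSIX_C_SOURCE 200809L
#include <dirent.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: open `path` for readdir() without following a
   symlink in its last component, using open(2) + fchdir(2) + opendir(".")
   instead of fdopendir(3). The working directory is changed temporarily,
   so this is not safe while other threads rely on relative paths. */
DIR *opendir_nofollow_fchdir(const char *path)
{
    DIR *d = NULL;
    int err = 0;
    int here = open(".", O_RDONLY | O_DIRECTORY);   /* remember the cwd */
    if (here == -1)
        return NULL;

    int fd = open(path, O_RDONLY | O_DIRECTORY | O_NOFOLLOW);
    if (fd == -1) {
        err = errno;                    /* e.g. a final symlink was refused */
    } else {
        if (fchdir(fd) == -1) {
            err = errno;
        } else {
            d = opendir(".");           /* "." is now the directory fd refers to */
            if (d == NULL)
                err = errno;
            if (fchdir(here) == -1 && d != NULL) {   /* go back */
                err = errno;
                closedir(d);
                d = NULL;
            }
        }
        close(fd);                      /* the DIR* has its own descriptor */
    }
    close(here);
    if (d == NULL)
        errno = err;
    return d;
}
```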
Re^2: readdir() on a sysopen() handle? by perlhuhn (Novice) on Aug 20, 2017 at 12:03 UTC
Thanks for the reference to the opendir manpage. It mentions fdopendir(3) which would do what I need but it doesn't seem to be supported by Perl.
Re: readdir() on a sysopen() handle? by Laurent_R (Canon) on Aug 20, 2017 at 09:52 UTC
Maybe you can use `opendir` and then filter out the symbolic links when reading the directory with `readdir`.
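In C, such a filter might look roughly like this: read the entries with readdir(3) and examine each one relative to the open directory with fstatat(2) and AT_SYMLINK_NOFOLLOW (the lstat() equivalent), skipping symlinks. This is a sketch assuming POSIX.1-2008, and `count_non_symlinks` is a hypothetical name. The check is only a snapshot, so on its own it does not prevent a later swap between checking and opening.

```c
#define _POSIX_C_SOURCE 200809L
#include <dirent.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: count the entries of `path` (excluding "." and
   "..") that are not symbolic links, or return -1 if the directory
   cannot be opened. An entry can still be replaced by a symlink after
   fstatat() has looked at it. */
long count_non_symlinks(const char *path)
{
    DIR *d = opendir(path);
    if (d == NULL)
        return -1;
    int dfd = dirfd(d);              /* look up names relative to this dir */
    if (dfd == -1) {
        closedir(d);
        return -1;
    }
    long n = 0;
    struct dirent *de;
    while ((de = readdir(d)) != NULL) {
        if (strcmp(de->d_name, ".") == 0 || strcmp(de->d_name, "..") == 0)
            continue;
        struct stat st;
        /* AT_SYMLINK_NOFOLLOW: describe the link itself, like lstat() */
        if (fstatat(dfd, de->d_name, &st, AT_SYMLINK_NOFOLLOW) == -1)
            continue;                /* entry disappeared meanwhile */
        if (!S_ISLNK(st.st_mode))
            n++;
    }
    closedir(d);
    return n;
}
```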
Re^2: readdir() on a sysopen() handle? by perlhuhn (Novice) on Aug 20, 2017 at 10:49 UTC
Such a filter would have to use stat() to determine whether an entry is a directory before opening it. The problem is that an entry might change from a directory to a symbolic link between the stat() and the open(). O_NOFOLLOW prevents such race conditions.
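To see what O_NOFOLLOW does and does not cover, here is a small sketch assuming Linux or another POSIX.1-2008 system. The symlink check happens inside the open(2) call itself, so there is no window between testing and opening, but it only applies to the last path component: symlinks in earlier components are still followed. `is_real_dir_nofollow` is a hypothetical helper.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper:
    1  path is a directory reached without a symlink in its LAST component
    0  last component is a symlink, or not a directory at all
   -1  any other error (e.g. it does not exist)
   Symlinks in earlier components of path are still followed. */
int is_real_dir_nofollow(const char *path)
{
    int fd = open(path, O_RDONLY | O_DIRECTORY | O_NOFOLLOW);
    if (fd == -1) {
        /* A final symlink is reported as ELOOP or ENOTDIR depending on
           the system (some BSDs have used EMLINK); a regular file gives
           ENOTDIR because of O_DIRECTORY. */
        if (errno == ELOOP || errno == ENOTDIR || errno == EMLINK)
            return 0;
        return -1;
    }
    close(fd);
    return 1;
}
```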
Re^3: readdir() on a sysopen() handle? by shmem (Chancellor) on Aug 20, 2017 at 14:37 UTC
> The problem is that an entry might change from a directory to a symbolic link between the stat() and the open().

Wouldn't a second stat() after the open tell? Well duh, the underlying file could just switch back from symlink to directory between the open() and the second stat(), e.g. something that emulates a directory via a maliciously loaded file system module doing sinister things. Just curious: what problem are you trying to solve?

Correct me if I am wrong, but after getting a handle to something, even if the something is renamed, deleted, and symlinked back, the handle still refers to the original structure being accessed:

```perl
my $path = '/tmp/open';
-d $path and die "remove $path first\n";
mkdir $path;
for (qw(foo bar quux)) {
    open my $fh, '>', "$path/$_";
}
mkdir "$path/baz";
for (qw(blorf blorfldyick)) {
    open my $fh, '>', "$path/baz/$_";
}
opendir my $dh1, $path;
while (readdir $dh1) {
    next if /^\.\.?$/;
    print "read(dh1): $path/$_\n";
    if (-d "$path/$_") {
        opendir my $dh2, "$path/$_" or die;
        # emulate external change directory to symlink
        rename "$path/$_", "$path/fie";
        symlink "$path/fie", "$path/$_" or die;
        # end emulate
        if (-l "$path/$_") {
            print "bogus change to $path/$_:\n";
            print " $path/$_ points to ", readlink "$path/$_", "\n";
        }
        while (my $e = readdir $dh2) {
            next if $e =~ /^\.\.?$/;
            print "read(dh2): $e\n";
        }
    }
}
__END__
read(dh1): /tmp/open/foo
read(dh1): /tmp/open/quux
read(dh1): /tmp/open/baz
bogus change to /tmp/open/baz:
 /tmp/open/baz points to /tmp/open/fie
read(dh2): blorf
read(dh2): blorfldyick
read(dh1): /tmp/open/bar
```

Side note which might resolve this XY Problem (if so): -d on a symlink returns true up to v5.25.10, so -d resolves symlinks, which it shouldn't do. IMHO this is a bug.

Apropos race condition: I can't think of anything which would resolve that, other than a system call like `openif()` into which the expected type is passed as an argument.

perl -le'print map{pack c,($-++?1:13)+ord}split//,ESEL'
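As a C counterpart to the Perl script above: the race-free form of a "second stat()" is fstat(2) on the descriptor that was actually opened, which describes the object behind the handle no matter what later happens to the name. This is a sketch assuming POSIX.1-2008; `open_dir_identity` and `emulate_swap` are hypothetical helpers, the latter mimicking the rename-and-symlink step from the script.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdio.h>      /* rename() */
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: open `path` as a directory without following a
   final symlink and record what was actually opened. fstat() examines
   the descriptor, not the name, so a later rename or symlink swap cannot
   change its answer. Returns the open fd, or -1 on error. */
int open_dir_identity(const char *path, struct stat *st)
{
    int fd = open(path, O_RDONLY | O_DIRECTORY | O_NOFOLLOW);
    if (fd == -1)
        return -1;
    if (fstat(fd, st) == -1) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Hypothetical helper emulating the external change: move the directory
   aside and put a symlink with the old name in its place. */
int emulate_swap(const char *name, const char *aside)
{
    if (rename(name, aside) == -1)
        return -1;
    return symlink(aside, name);
}
```

Comparing `st_dev`/`st_ino` from fstat() with an lstat() of the path can detect that a swap happened, but that comparison is itself just another snapshot; only operations performed through the descriptor are immune to the swap.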
Re^4: readdir() on a sysopen() handle? by perlhuhn (Novice) on Aug 21, 2017 at 15:28 UTC
Re^3: readdir() on a sysopen() handle? by Laurent_R (Canon) on Aug 20, 2017 at 17:32 UTC
> The problem is that an entry might change from a directory to a symbolic link between the stat() and the open(). O_NOFOLLOW prevents such race conditions.

I fail to see why or how `O_NOFOLLOW` would prevent this from happening if you were thinking about doing a `sysopen` followed by a `readdir`, or anything more or less equivalent with a different system call. Maybe you should explain more precisely what you're really trying to do.
Re^4: readdir() on a sysopen() handle? by Anonymous Monk on Aug 20, 2017 at 18:10 UTC
Re: readdir() on a sysopen() handle? by Marshall (Canon) on Aug 21, 2017 at 01:12 UTC
I am confused about what application behavior you are trying to prevent, and what exactly your application is.

A file system is like a continually evolving biological organism. There can be incestuous liaisons between family groups (symlinks). The directory structure maps "textual names" to "structures of bits", which are called "files". When you do something like a readdir(), you get an imperfect snapshot of the "family tree" of textual names. It is entirely possible to get a filename from a readdir() that can't be opened, because some other process has deleted that name in the meantime.

Depending upon the O/S and the type of file, it is possible to read a directory (which produces textual names), open a file (which yields a filehandle that is independent of the textual name), and continue to use that file after the textual name has been deleted from the directory. In that situation, one or many programs continue to use the "file" although no new program can open it, because its "textual name" no longer exists. If you get to a file and actually open it via a symlink, that file stays open for use even if the symlink (the textual representation) is deleted.

I like the first post by Laurent_R. If you don't want to follow a directory symlink, don't open the directory if it is one. I guess you can check whether that directory name is still not a symlink once you have opened it, but all sorts of strange things can still happen. It would be helpful if you explained a bit more about what your application does and how it handles failed directory or file "opens".
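The point about files outliving their names can be shown in a few lines of POSIX C: after unlink(2) removes the only name, the data is still reachable through the descriptor that was opened earlier. `write_unlink_read` is a hypothetical helper for illustration only.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: create `path`, write `msg`, delete the name, then
   read the data back through the still-open descriptor into `buf`.
   Returns the number of bytes read back, or -1 on error. */
ssize_t write_unlink_read(const char *path, const char *msg,
                          char *buf, size_t bufsz)
{
    int fd = open(path, O_RDWR | O_CREAT | O_EXCL, 0600);
    if (fd == -1)
        return -1;
    size_t len = strlen(msg);
    ssize_t got = -1;
    if (write(fd, msg, len) == (ssize_t)len
        && unlink(path) == 0                /* the name is gone...          */
        && lseek(fd, 0, SEEK_SET) == 0)     /* ...but the file still exists */
        got = read(fd, buf, bufsz);
    close(fd);   /* no name and no descriptor left: the system may free it */
    return got;
}
```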
Re^2: readdir() on a sysopen() handle? by perlhuhn (Novice) on Aug 21, 2017 at 15:33 UTC
The purpose of the program is to write data files into a specific subdirectory of the users' home directories, e.g. /home/username/datadir/datafile.timestamp.txt. datadir is writable only by the program and readable by the user. But since it's inside the user's home directory, the user could rename it and replace it with a symlink, or re-create it and put a symlink with the data file's name inside. Of course, the obvious solution is to change the filesystem layout, but that is currently not an option. So the program needs to open the directory and the data file with O_NOFOLLOW to avoid writing to the wrong places. The desired behavior when encountering a symlink is to refuse to write the data and produce a warning message. This case is rare enough that it's not too much hassle.

The readdir() part is just a minor issue and might get removed in the future, but it feels a bit clumsy right now. And since fdopendir() is part of POSIX.1-2008, one might hope to find it in a current Perl version.

Anyway, thanks for all your replies. I guess I'll put up with the chdir() solution.
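For the layout described here, one way to avoid path-based races without fdopendir() is to open the data directory once with O_DIRECTORY | O_NOFOLLOW and create the file relative to that descriptor with openat(2). This sketch assumes POSIX.1-2008; the function name, the 0644 mode, and the use of O_EXCL (reasonable for timestamped file names, but an assumption) are illustrative choices. It prints a warning and refuses to write when it meets a symlink. O_NOFOLLOW only guards the last component of the directory path, so the components above datadir must not be writable by the user.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: create dirpath/name and write `data` to it.
   Refuses (with a warning on stderr) if the last component of dirpath
   or the file name itself is a symlink. Earlier components of dirpath
   ARE still followed. Returns 0 on success, -1 on any failure. */
int write_datafile_nofollow(const char *dirpath, const char *name,
                            const char *data)
{
    int dfd = open(dirpath, O_RDONLY | O_DIRECTORY | O_NOFOLLOW);
    if (dfd == -1) {
        fprintf(stderr, "warning: not writing to %s: %s\n",
                dirpath, strerror(errno));
        return -1;
    }

    /* O_EXCL makes creation fail if anything, including a planted
       (even dangling) symlink, already uses this name. */
    int fd = openat(dfd, name, O_WRONLY | O_CREAT | O_EXCL | O_NOFOLLOW, 0644);
    int open_errno = errno;
    close(dfd);                        /* fd does not depend on dfd */
    if (fd == -1) {
        fprintf(stderr, "warning: not writing %s/%s: %s\n",
                dirpath, name, strerror(open_errno));
        return -1;
    }

    size_t len = strlen(data);
    int ok = write(fd, data, len) == (ssize_t)len;
    if (close(fd) == -1)
        ok = 0;
    if (!ok) {
        fprintf(stderr, "warning: short or failed write to %s/%s\n",
                dirpath, name);
        return -1;
    }
    return 0;
}
```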