Re: Re: Re: Unexpected file test (-f) result on Windows
by BrowserUk (Patriarch) on Sep 19, 2003 at 15:20 UTC
Update: FWIW, this "undocumented feature" is deep within the OS: using '<' in the argument to the FindFirstFile/FindNextFile APIs is also treated the same way as '*'. It appears that this is the only undocumented character that exhibits this behaviour. Weirdness indeed.
What can I say? Sorry, but I completely misread the sense of your consternation. You're right! It appears that CMD is treating '<' in the same way as '*':
P:\test>dir "<.bak"
Volume in drive P is Winnt
Volume Serial Number is D822-5AE5
Directory of P:\test
27/08/03 00:47 608 286744.pl8.bak
28/08/03 20:20 801 287272.pl8.bak
31/08/03 12:42 729 287900-2.pl8.bak
31/08/03 09:58 1,056 287900.pl8.bak
05/09/03 15:37 2,015 289016-2.pl8.bak
05/09/03 09:56 1,092 289106.pl8.bak
05/09/03 17:26 1,165 289250.pl8.bak
08/09/03 10:49 742 b-sort.pl8.bak
...
I'm really surprised that I have never encountered this behaviour before...but then, I would never have thought to look for it:)
It is weird, as far as I am aware, completely undocumented, and very annoying. I cannot even begin to fathom how and why this would have been done!
Examine what is said, not who speaks.
"Efficiency is intelligent laziness." -David Dunham
"When I'm working on a problem, I never think about beauty. I think only how to solve the problem. But when I have finished, if the solution is not beautiful, I know it is wrong." -Richard Buckminster Fuller
If I understand your problem, I can solve it! Of course, the same can be said for you.
open FILE, ">testfile.log" or die "Unable to create file: $!";
close FILE;
if (! -f "*estfile.log") {
    print STDERR "Not a file *.\n";
}
if (! -f "|testfile.log") {
    print STDERR "Not a file |.\n";
}
if (! -f "?estfile.log") {
    print STDERR "Not a file ?.\n";
}
if (! -f "<testfile.log") {
    print STDERR "Not a file <.\n";
}
if (! -f ">>testfile.log") {
    print STDERR "Not a file >>.\n";
}
exit;
__END__
Not a file *.
Not a file |.
Not a file ?.
Not a file >>.
What I don't understand is why this is effectively performing a directory search in order to test one filename! Does Windows not have the equivalent of the Unix stat() system call? The -f operator should just be calling the perl stat function. Does anyone with a better understanding of Win32 Perl than I understand what is happening here?
(Should I fall back to matching the filename with a regex?)
Cheers, -- Dave :-)
$q=[split+qr,,,q,~swmi,.$,],+s.$.Em~w^,,.,s,.,$&&$$q[pos],eg,print
Tracking this stuff through the perl sources is a nightmare.
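- The -f is mapped to a function in opcode.h. I first took that to be Perl_pp_ftis (for FileTestIs (a file) perhaps?), though pp_ftis is in fact the -e test; -f is Perl_pp_ftfile, which makes the same my_stat() call and then checks S_ISREG on the result, so the trail below is the same.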
- Which is mapped to lib\core\embed.h:#define pp_ftis Perl_pp_ftis
- Which is implemented in terms of my_stat() in pp_sys.c
PP(pp_ftis)
{
    I32 result = my_stat();
    dSP;
    if (result < 0)
        RETPUSHUNDEF;
    RETPUSHYES;
}
- Which is mapped to embed.h:3209:#define my_stat() Perl_my_stat(aTHX)
- Perl_my_stat() is implemented in doio.c in terms of PerlLIO_fstat()
- Which is mapped to iperlsys.h:734:#define PerlLIO_fstat(fd, buf) Fstat((fd), (buf))
- Which gets mapped to lib\core\dosish.h:133:#define Fstat(fd,bufptr) fstat((fd),(bufptr))
- Which is a c-runtime api.
However, fstat() takes a file descriptor (fd), which implies an open file handle... Should have noticed that earlier. Track back to where the calls are not defined in terms of a file descriptor and we're back at doio.c. Sure enough, follow the other path in Perl_my_stat() and we find it calls PerlLIO_stat()
- Which maps to lib\core\iperlsys.h:739:#define PerlLIO_stat(name, buf) Stat((name), (buf))
- Which is mapped to dosish.h:142:# define Stat(fname,bufptr) stat((fname),(bufptr))
- Then you have to move over to the C-runtime headers, which I don't have for MSVC++ (as used by ActiveState), so I can't bore you with those details, but suffice it to say that stat() ends up getting mapped to _stat64(), as AS builds with large file support, which means they need the version of stat that can handle file sizes larger than 32 bits.
Disassembling the code in MSVCRT.dll:_stat64, we find this
The salient point here is the call to EXT:KERNEL32.DLL!FindFirstFileA.
You asked: ...why this is effectively performing a directory search in order to test one filename!..... And the answer goes something like this.
When you pass a filespec to stat() or one of its variants in order to ask the OS for information about the file, you need to get an OS 'handle' (an inode, in Unix terms) to that file. You can get one of these by various means, but if you aren't interested in opening the file, then the OS gives you a call to obtain that handle. In Win32 this is FindFirstFile(); I think the equivalent under Unix is opendir, or maybe it gets translated directly into (ioctl?) calls to the underlying filesystem.
Anyway, the point is that under normal circumstances, calling FindFirstFile() with a non-wildcard filespec returns a structure containing (almost) all the information required to satisfy the stat() call.
Whilst you may not think of stat() as doing a "directory search", it has to have the filesystem search to find the information to fulfil the stat() call (on Unix that information is in the inode), and so a search is being done; it's just that the API name doesn't reflect this.
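In Perl terms, the file test boils down to a stat() on the name followed by a look at the mode bits. A minimal sketch of that idea, assuming the core Fcntl module for S_ISREG; the helper name is_plain_file is made up for illustration and is not anything in the perl sources:

```perl
use strict;
use warnings;
use Fcntl qw( :mode );    # exports S_ISREG and friends

# Hypothetical helper: roughly what -f does, spelled out.
# Returns undef if stat() fails, otherwise 1 or 0 for "is a plain file".
sub is_plain_file {
    my( $name )= @_;
    my @st= stat( $name )    # empty list on failure
        or return;
    return S_ISREG( $st[2] ) ? 1 : 0;    # $st[2] is the mode field
}

# Report on any names given on the command line.
for my $name ( @ARGV ) {
    my $result= is_plain_file( $name );
    print "$name: ",
        !defined $result ? "stat failed: $!"
        : $result        ? "plain file"
        :                  "not a plain file",
        "\n";
}
```

Whatever name is handed to stat() is handed on to the OS unchanged, which is why any wildcard handling down there shows up in the result.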
The fact that your filespec contains some adornments used for determining the filemode is a Perl thing and not the OS's problem.
That these conflict with an undocumented feature within the OS is ...erm.. unfortunate! I guess that LW and MS came to the same conclusion that '<' & '>' are good candidates for using as meta characters as most sane people are unlikely to embed these in their filenames as they would conflict with their use as redirection metacharacters by CLIs.
In the final analysis, you would have to strip Perl's two-arg open metacharacters from the filespec before you passed it to stat(), whichever OS you are on!
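A minimal sketch of that stripping step, assuming only the common mode prefixes of two-arg open ('<', '>', '>>' and their '+' variants) need handling. The helper name split_open_spec is hypothetical, pipe forms are simply refused, and the dup forms ('>&', '<&') are not covered:

```perl
use strict;
use warnings;

# Hypothetical helper: split a two-arg-open style spec into (mode, filename).
# Returns an empty list for pipe forms ("|cmd", "cmd|"), which name no file.
sub split_open_spec {
    my( $spec )= @_;
    $spec =~ s/^\s+//;      # two-arg open ignores surrounding whitespace
    $spec =~ s/\s+\z//;
    return if $spec =~ /^\|/ || $spec =~ /\|\z/;
    my( $mode, $file )= $spec =~ m#^(\+?(?:<|>>?))?\s*(.*)\z#s;
    $mode= '<' unless defined $mode;    # no prefix means open for reading
    return( $mode, $file );
}

for my $spec ( '>>testfile.log', '<testfile.log', 'testfile.log' ) {
    my( $mode, $file )= split_open_spec( $spec )
        or next;
    print "$spec: mode '$mode', file '$file', ",
        ( ( -f $file ) ? "is" : "is not" ), " a plain file\n";
}
```

The file test is then applied to the bare name only, so the mode characters never reach the OS.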
I see the same results using CMD.exe, but not using 4NT.exe shell. So it's not part of FindFirst/FindNext OS calls, but something in the shell.
Hmm, maybe 4NT does its own wildcard matching. But I note that the driver-level calls (seen through filemon) show only one call to DIRECTORY when I do dir <.exe, even though *.exe is what gets passed to the DIRECTORY call. Yet when doing dir *.exe, there are several calls to DIRECTORY before the last returns NO MORE FILES.
That's interesting. If 4NT is avoiding this behaviour, then it means that they must be doing something to avoid it, which also means that they must have recognised the bug at some point.
The bug is definitely there in the OS, viz:
#! perl -slw
use strict;
use Win32::API::Prototype;
ApiLink( 'Kernel32', 'HANDLE FindFirstFile( LPCTSTR lpFileName, LPWIN32_FIND_DATA lpFindFileData )' );
my $FIND_DATA;
my $fHandle = FindFirstFile( $_ . 'junk', $FIND_DATA = chr(0)x300 )
    and print "'$_': '", substr( $FIND_DATA, 44 )
    or warn $^E
    for '?', '*', '<', '!'; # map chr, 0..255
__END__
P:\test>FFtest
'?': '
'*': 'junk
'<': 'junk
'!': '
# Uncomment the comment to see how I verified that only '*' and '<' exhibit the behaviour.
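Given that result, one defensive option is to refuse any name containing such characters before it ever reaches the file test. A minimal sketch: the helper name safe_is_file is hypothetical, and the character class is an assumption that goes beyond what was tested here, adding '?', '>', '"' and '|' on the grounds that Windows does not allow them in file names anyway:

```perl
use strict;
use warnings;

# Hypothetical helper: a -f test that rejects names containing '*' or '<'
# (seen matching as wildcards above) and, as an assumption, the other
# characters Windows reserves in file names: '?', '>', '"' and '|'.
sub safe_is_file {
    my( $name )= @_;
    return 0 if $name =~ /[*?<>"|]/;
    return( ( -f $name ) ? 1 : 0 );
}

for my $name ( @ARGV ) {
    print "$name: ", ( safe_is_file( $name ) ? "file" : "not a file" ), "\n";
}
```

On a Unix filesystem these characters are legal in file names, so a check like this is stricter than it needs to be there; that may or may not matter for a portable module.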
Maybe scrutiny of the 4NT website, or maybe even a question to their support people would be worthwhile in discovering the history of this.
Browser, I don't have a wintel box handy right now; can you try this:
C:
cd c:\
dir "<c:\test\.bak"
My gut feeling tells me that what you are really doing is echoing a list of .bak extension filenames to dir, just like the shorthand for dir *.bak (dir .bak) does in many versions of DOS.
-Waswas
P:\>dir "<P:\test\.bak"
The filename, directory name, or volume label syntax is incorrect.
As this shows, '<' is acting exactly the same way as '*'
P:\test>dir "<.<"
Volume in drive P is Winnt
Volume Serial Number is D822-5AE5
Directory of P:\test
19/09/03 13:29 <DIR> .
19/09/03 13:29 <DIR> ..
20/04/03 19:48 1,193 237671.pl8
20/04/03 19:28 729 242776.pl8
20/04/03 19:33 624 243366.pl8
21/04/03 03:13 456 250383.pl8
20/04/03 19:56 1,574 250495.pl8
...
P:\test>dir "2<.bak"
Volume in drive P is Winnt
Volume Serial Number is D822-5AE5
Directory of P:\test
27/08/03 00:47 608 286744.pl8.bak
28/08/03 20:20 801 287272.pl8.bak
31/08/03 12:42 729 287900-2.pl8.bak
31/08/03 09:58 1,056 287900.pl8.bak
05/09/03 15:37 2,015 289016-2.pl8.bak
05/09/03 09:56 1,092 289106.pl8.bak
05/09/03 17:26 1,165 289250.pl8.bak
7 File(s) 7,466 bytes
978,853,888 bytes free
This behaviour is completely undocumented in both the latest XP CMD docs and the online API docs for FindFirstFile/FindNextFile, which are also affected.
Even a fairly extensive google has failed to turn up any trace of it as either a bug or a feature, which amazes me even more. Could be that DaveH has found a completely new bug that affects Win32 going all the way back to something like 1995 and has gone undiscovered? Seems unlikely but...
Re: Re: Re: Unexpected file test (-f) result on Windows
by waswas-fng (Curate) on Sep 19, 2003 at 16:27 UTC
Why not just untaint the filename you are getting if you consider << >> < > tainted input?
-Waswas
Ah, thinking around the problem: that's more like it!
What would you recommend I use to untaint the data in a portable way? I would expect to miss a lot of cases if I used a regular expression to do the job; that was why I tried to use a file test in the first place. My intention is not to be "too" restrictive in how my module is used (though the specification is not that clearly defined yet i.e. I'm making it up as I go ;-)).
Cheers, -- Dave :-)
$q=[split+qr,,,q,~swmi,.$,],+s.$.Em~w^,,.,s,.,$&&$$q[pos],eg,print
# Parens give list context, so the captured name (not just true/false) is assigned.
my( $filename )= $param =~ m#^(\w[-.\w]*)\z#
    or die "Invalid file name ($param).\n";
which allows for plenty of choice in naming the file but doesn't allow anything unsafe to be used. If this is a situation where you want to allow full paths and don't have any worries about the use of "..", then you can do:
my( $filepath )= $param =~ m#^((?:/?[.\w][-.\w]*)+)\z#
    or die "Invalid file path ($param).\n";
Or be more portable by using File::Spec to split the $param into components and untaint each:
#!/usr/bin/perl -w
use strict;
use File::Spec::Functions qw( splitpath splitdir catdir catpath );

for my $path ( @ARGV ) {
    eval {
        warn "($path) => (", untaintPath( $path ), ")\n";
        1;
    } or
        warn "$@\n";
}
exit( 0 );

sub untaintPath {
    my( $param )= @_;
    my( $vol, $dirs, $file )= splitpath( $param );
    ## my( $clean )= $file =~ m#^(\w[-.\w]*)\z#
    my( $clean )= $file =~ m#^(\w[-.\w]*|)\z#
        or die "Invalid file name ($file) in path ($param).\n";
    $file= $clean;
    my @dirs= splitdir( $dirs );
    for my $dir ( @dirs ) {
        ##( $clean )= $dir =~ m#^([.\w][-.\w]*|)\z#
        ( $clean )= $dir =~ m#^(\w[-\w]*|)\z#
            or die "Invalid dir name ($dir) in path ($param).\n";
        $dir= $clean;
    }
    $dirs= catdir( @dirs );
    if( "" eq $dirs && "" eq $file ) {
        die "Empty dir/file in path ($param).\n";
    }
    return catpath( $vol, $dirs, $file );
}
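To see what the single-file-name pattern from the top of this post lets through, here is a small check run against the names from the original test. This is only a sketch; clean_name is a hypothetical wrapper, not part of the code above:

```perl
use strict;
use warnings;

# Hypothetical wrapper around the single-file-name pattern.
# Returns the untainted name, or undef when the name is rejected.
sub clean_name {
    my( $param )= @_;
    my( $clean )= $param =~ m#^(\w[-.\w]*)\z#;
    return $clean;
}

for my $name ( 'testfile.log', '<testfile.log', '>>testfile.log',
               '*estfile.log', '../etc/passwd' ) {
    my $clean= clean_name( $name );
    printf "%-16s %s\n", $name, defined $clean ? "accepted" : "rejected";
}
```

Only the first name is accepted; the others fail because they do not start with a word character, so the mode and wildcard characters never get as far as a file test.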
*shrug*
- tye
Since you are just worried about open's syntax, which is already standardized across platforms, write your regex and cases for that...
-Waswas