my test data really does contain EVERY conceivable character
That's swell. So you need to cope with the residue
of processes that create files whose names consist
of randomly selected byte values? (Why stop at 128?) I
hope that your execution path doesn't include directories or
programs with such liberal names -- that could be a very
awkward system to work with...
Given such an environment, I'd be inclined to focus on a
means to rename files that have troublesome characters in their
names -- e.g. using readdir, locate each data file whose
name would match /[^-\w\$\@\%\#.,:+~=]/, invent a new name
for each such file using only sensible characters, perhaps
create a suitable table that documents the original "name"
and the newly-assigned name, and rename the file
before you do anything else with it.
Creating and assigning distinct, usable names is easy.
Don't even start to worry about how to run a shell command
on file names that contain control characters and whatnot.
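For what it's worth, here is a rough sketch of that rename pass. The sub names (rename_troublesome, write_table) and the "renamed_NNNN.dat" naming scheme are just things I made up for illustration, and it assumes you have write permission on the directory. The original names go into the table hex-encoded, since they might contain newlines or tabs.

```perl
use strict;
use warnings;

# Hypothetical helper: rename every plain file in $dir whose name contains
# a character outside the "sensible" set. Returns a hash ref that maps
# each newly assigned name to the original name.
sub rename_troublesome {
    my ($dir) = @_;
    opendir my $dh, $dir or die "Cannot open $dir: $!";
    my @names = grep { $_ ne '.' && $_ ne '..' } readdir $dh;
    closedir $dh;

    my %original_for;    # new name => original name
    my $n = 0;
    for my $old (@names) {
        next unless $old =~ /[^-\w\$\@\%\#.,:+~=]/;    # name is fine as is
        next unless -f "$dir/$old";                    # plain files only
        my $new;
        # Invented naming scheme; skip any name that is already taken.
        do { $new = sprintf 'renamed_%04d.dat', ++$n } while -e "$dir/$new";
        rename "$dir/$old", "$dir/$new"
            or die "Cannot rename a file in $dir: $!";
        $original_for{$new} = $old;
    }
    return \%original_for;
}

# Hypothetical helper: write the table as "new name <TAB> hex of old name",
# one line per renamed file.
sub write_table {
    my ($table_file, $map) = @_;
    open my $out, '>', $table_file or die "Cannot write $table_file: $!";
    for my $new (sort keys %$map) {
        printf {$out} "%s\t%s\n", $new, unpack('H*', $map->{$new});
    }
    close $out or die "Cannot close $table_file: $!";
}
```

You'd call it as something like `write_table("renamed.tab", rename_troublesome($dir))` before touching the data, and use `pack('H*', ...)` on the second column if you ever need the original name back.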
Update:
If renaming nasty files is not an option (due to permissions
or politics), you could use the symlink() function instead,
creating a nice name for accessing the file without altering
the original directory entry. The symlink wouldn't need to
be in the same place as the file (it could even be in /tmp, and
last only as long as needed to run system("long | pipe | command")).
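A minimal sketch of the symlink idea, assuming a system where symlink() is supported. The sub name with_safe_alias and the alias name "input.dat" are my own inventions; File::Temp's tempdir() supplies a private scratch directory so the alias doesn't collide with anything.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use File::Spec;

# Hypothetical helper: give $path a plain-looking alias in a private
# temporary directory, call $code->($alias), then remove the alias.
# The original directory entry is never modified.
sub with_safe_alias {
    my ($path, $code) = @_;
    my $dir   = tempdir(CLEANUP => 1);    # scratch dir, removed at exit
    my $alias = "$dir/input.dat";         # invented, shell-friendly name
    # Use an absolute target so the link works from the scratch directory.
    symlink File::Spec->rel2abs($path), $alias
        or die "Cannot create symlink: $!";
    my $result = $code->($alias);
    unlink $alias;                        # the link lasts only this long
    return $result;
}
```

Then something like `with_safe_alias($nasty, sub { system("long < $_[0] | pipe | command") })` lets the shell see only the alias. That still assumes your temp directory path itself has no odd characters in it.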