Re: What's the best way to avoid name collisions when creating files?
by jeffa (Bishop) on May 02, 2005 at 20:29 UTC
|
| [reply] |
|
|
I concur. :)
You want a truly unique identifier? "Universally Unique Identifier" pretty much describes it.
| [reply] |
|
|
Although Data::UUID looks like an interesting solution, one big annoyance for me is it doesn't come standard in my Perl distribution (Fedora Core 1).
| [reply] |
Re: What's the best way to avoid name collisions when creating files?
by suaveant (Parson) on May 02, 2005 at 20:00 UTC
|
Well... there is always the old date, time and pid combination... that works fine as long as the script doesn't handle multiple messages in a loop. Of course, if it does handle messages in a loop, it is easy to remember the last name you used and increment a counter if the new one is the same.
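A minimal sketch of that counter idea, assuming a single process handing out names in a loop. The name `make_namer` and the `YYYYMMDD-HHMMSS.pid[.counter]` layout are made up for illustration, and the optional `$now` argument is only there to make the behaviour easy to check:

```perl
use strict;
use warnings;
use POSIX qw(strftime);

# Build a name generator that remembers the last base name it produced.
sub make_namer {
    my ($last_base, $counter) = ('', 0);
    return sub {
        my ($now) = @_;
        $now = time unless defined $now;
        my $base = strftime('%Y%m%d-%H%M%S', localtime($now)) . ".$$";
        if ($base eq $last_base) {
            $counter++;                              # same second again: bump the counter
        }
        else {
            ($last_base, $counter) = ($base, 0);     # new second: start over
        }
        return $counter ? "$base.$counter" : $base;
    };
}

my $next_name = make_namer();
print $next_name->(), "\n" for 1 .. 3;
```

This only protects one process against itself; two processes cannot share a pid at the same moment, but a recycled pid within the same second is still possible in theory.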
There is always file locking with something like flock...
And I believe you can also use sysopen to create files if they aren't there and error if they are, but not 100% sure on that... something with O_CREAT and O_EXCL maybe...
- Ant
- Some of my
best work - (1 2 3)
| [reply] |
|
|
C:\t>set DIRCMD=/b
C:\t>dir newfile
File Not Found
C:\t>perl -MFcntl -e "sysopen F, 'newfile', O_EXCL | O_CREAT or die"
C:\t>dir newfile
newfile
C:\t>perl -MFcntl -e "sysopen F, 'newfile', O_EXCL | O_CREAT or die"
Died at -e line 1.
| [reply] [d/l] |
|
|
There's a caveat to flock though: it doesn't reliably work across the network. If you know that you're always going to use the local file system, then great. However, if you move to a NAS, flock may stop working.
I ran into this with DBD::CSV. DBD::CSV will use flock under the hood to ensure that it has exclusive access to the file it is reading/writing. However, if the file is on a NAS (Network Attached Storage) and accessed with NFS, then DBD::CSV will fail to open the file.
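For the local-filesystem case, a rough sketch of what taking an exclusive lock looks like. The helper name `with_lock` and the idea of a separate lock file are my own assumptions here, not something DBD::CSV does in this form:

```perl
use strict;
use warnings;
use Fcntl qw(:flock);

# Run $code while holding an exclusive lock on $lockfile.
# Returns the (scalar) result of $code; dies if the lock cannot be taken.
sub with_lock {
    my ($lockfile, $code) = @_;
    open my $fh, '>>', $lockfile or die "Can't open $lockfile: $!";
    flock $fh, LOCK_EX           or die "Can't lock $lockfile: $!";
    my $result = $code->();
    close $fh                    or die "Can't close $lockfile: $!";   # closing releases the lock
    return $result;
}
```

Something like `with_lock('/var/spool/archive/.lock', sub { ... })` would then serialize writers on one machine. On NFS the same call may fail or, worse, succeed without excluding anybody, which is exactly the caveat above.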
| [reply] [d/l] [select] |
|
|
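my $filename = "$ENV{HOME}/tmp/temp.$$";   # Perl does not expand "~" in paths
while (-e $filename) {
    # append a random uppercase letter (A-Z) until the name is free
    $filename .= '.' . chr(int(rand(26)) + 65);
}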
though this, of course, presumes that you don't have to worry too much about race conditions.
| [reply] [d/l] |
Re: What's the best way to avoid name collisions when creating files?
by scmason (Monk) on May 02, 2005 at 21:19 UTC
|
My first instinct would be to grab the md5 sum of the message (including headers). You should be pretty safe there. As mentioned above, no matter what method you use, you should always try to detect filename collisions and perhaps alter the name based on that. Most programs tend to change filename to filename-2 in the case of a collision. | [reply] |
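A small sketch of the md5-plus-suffix idea, assuming the message is already a byte string. The sub name `digest_name` is invented for the example, and the `%$taken` hash merely stands in for "does a file with this name already exist?":

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

# Name a message after the MD5 of its full text (headers included).
# On a clash, fall back to name-2, name-3, ... as described above.
sub digest_name {
    my ($message, $taken) = @_;
    my $base = md5_hex($message);
    my $name = $base;
    my $n    = 1;
    $name = $base . '-' . ++$n while $taken->{$name};
    $taken->{$name} = 1;
    return $name;
}
```

Note that two byte-identical messages produce the same digest, so the clash handling matters in practice (think of a mail delivered twice), not just in theory.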
|
|
| [reply] |
|
|
One of MD5's nice properties is that it's ridiculously fast.
One benchmark I've seen was something like 90 megabytes per second on a quite modest machine. MD4 is a little faster at around 100 megabytes per second, but most of the more cryptographically secure digests are slower, many MUCH slower.
So much so that the usual advice is to keep using MD5 for non-sensitive stuff, despite recommendations to start moving away from it for signatures.
| [reply] |
Re: What's the best way to avoid name collisions when creating files?
by Fletch (Bishop) on May 02, 2005 at 20:55 UTC
|
| [reply] |
Re: What's the best way to avoid name collisions when creating files?
by Cody Pendant (Prior) on May 03, 2005 at 04:58 UTC
|
I just use random strings, with this sub I found by searching PerlMonks:
sub rndlc{local$"=''; "@{[map{chr(97+int rand 26)} 1 .. shift]}" };
my $filename = rndlc(10);
You can always attach the random string at the end of a date-time string to get more human-friendly filenames.
Of course there's no guarantee you won't get the same string twice, but the odds are 26^10 to 1 against it...
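To put rough numbers on that: ten lowercase letters give 26**10 possible strings, and the chance of any clash among many files grows with the square of the file count (the usual birthday approximation). A quick sketch, assuming rand is uniform; the sub names are made up:

```perl
use strict;
use warnings;

# Number of distinct names: $alphabet symbols, $length characters each.
sub name_space {
    my ($alphabet, $length) = @_;
    return $alphabet ** $length;
}

# Birthday-style estimate of the chance that at least two of $count
# uniformly random names coincide: about count*(count-1) / (2*space).
# Only meaningful while the result is much smaller than 1.
sub collision_estimate {
    my ($count, $space) = @_;
    return $count * ($count - 1) / (2 * $space);
}

my $space = name_space(26, 10);
printf "%.0f possible names\n", $space;
printf "%d files: about %.2e chance of any clash\n",
    $_, collision_estimate($_, $space) for 1_000, 1_000_000;
```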
($_='kkvvttuu bbooppuuiiffss qqffssmm iibbddllffss')
=~y~b-v~a-z~s; print
| [reply] [d/l] |
|
|
You are assuming that the random number generator's results are evenly distributed. You might like to check the quality of rand's results using Statistics::ChiSquare.
| [reply] [d/l] |
Re: What's the best way to avoid name collisions when creating files?
by inman (Curate) on May 03, 2005 at 09:23 UTC
|
How about creating one directory per day with sub-directories (File::Path) that reflect the structure of the information that you are trying to archive (mailbox name etc.). You should be able to add the e-mails to this structure with much less chance of collision (using File::Temp to make sure). When your daily archiving task finishes, the directory can be zipped (Archive::Zip) for long term storage.
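Roughly what I have in mind, as a sketch only. The sub name `archive_message`, the `YYYY-MM-DD/mailbox` layout and the `.eml` suffix are assumptions for illustration, and the Archive::Zip step is left out:

```perl
use strict;
use warnings;
use File::Path qw(mkpath);
use File::Spec;
use File::Temp qw(tempfile);
use POSIX qw(strftime);

# Store one message under ROOT/YYYY-MM-DD/MAILBOX/ and return its path.
# File::Temp picks a name that does not exist yet; UNLINK => 0 keeps the file.
sub archive_message {
    my ($root, $mailbox, $message, $when) = @_;
    $when = time unless defined $when;
    my $day = strftime('%Y-%m-%d', localtime($when));
    my $dir = File::Spec->catdir($root, $day, $mailbox);
    mkpath($dir) unless -d $dir;
    my ($fh, $path) = tempfile('msgXXXXXX', DIR => $dir, SUFFIX => '.eml', UNLINK => 0);
    print {$fh} $message;
    close $fh or die "Can't close $path: $!";
    return $path;
}
```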
Although this isn't substantially different from the previous suggestions, the extra structure will benefit you in the long term. At some point, someone will want to retrieve an e-mail from the archive. Storing by date allows you to restrict the scope of the text searching that you do later. | [reply] |
Re: What's the best way to avoid name collisions when creating files?
by philiph (Acolyte) on May 04, 2005 at 12:41 UTC
|
Thanks for all the excellent suggestions. Right now I'm using filenames of the format <unix time>.<pid>.<counter>. I'm opening the files in a loop with O_CREAT and O_EXCL and if the initial open fails I increment the counter and keep trying until it works. I think that will be more than adequate for my needs. | [reply] |
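For anyone finding this later, the loop looks roughly like this. It is a simplified sketch rather than my exact code; `open_unique` and the retry cap are made up for the example:

```perl
use strict;
use warnings;
use Fcntl qw(O_WRONLY O_CREAT O_EXCL);
use Errno qw(EEXIST);

# Create a new file named <unix time>.<pid>.<counter> in $dir.
# O_CREAT|O_EXCL makes the create fail if the name exists, so the
# existence check and the create happen in a single step.
sub open_unique {
    my ($dir) = @_;
    my $stamp = time . ".$$";
    for my $counter (0 .. 9_999) {              # arbitrary cap on retries
        my $path = "$dir/$stamp.$counter";
        if (sysopen my $fh, $path, O_WRONLY | O_CREAT | O_EXCL) {
            return ($fh, $path);
        }
        die "Can't create $path: $!" unless $! == EEXIST;   # only "already exists" is retried
    }
    die "No free name for $dir/$stamp";
}
```

Usage is just `my ($fh, $path) = open_unique('/some/archive/dir');` followed by writing the message to `$fh`.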
|
|
And it is nice to have useful info in the filename. I use date, time and pid for my maildir stuff, and it lets me easily identify message dates and times... it has actually been useful a couple of times for things like spam stats.
- Ant
- Some of my
best work - (1 2 3)
| [reply] |
Re: What's the best way to avoid name collisions when creating files?
by wizkid (Initiate) on May 02, 2005 at 22:39 UTC
|
File::Temp is not an option, for it tries to delete the file as soon as you don't need it anymore.
I truly believe you will get along fine with a combination of time and pid. | [reply] |
|
|
$tmp = File::Temp->new( UNLINK => 0, SUFFIX => '.dat' );
not to mention several other options, like
$unopened_file = mktemp( $template );
| [reply] [d/l] [select] |
|
|
| [reply] |