Re^3: Duplicates in Directories

Well, for a given operating system (and possibly file system), either they come in alphabetical order or they don't. If they do on the OP's system, then presumably the OP can rely on that feature (although the upper or lower case of the file names can cause some issues). And, BTW, I've just checked on the three different systems available to me (*nix, VMS and Windows): glob returned the names of the files in the directory in alphabetical order on all three of them, so, yes, it is a rather common feature.

Then, of course, as you rightly said, if they don't come in alphabetical order, or if there is any doubt, it is just as easy to use the Perl sort facility, and it will only take a fraction of a second with 10,000 files.
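
If the order cannot be trusted, an explicit sort removes the doubt. Here is a minimal sketch, assuming the names are already in an array (the sample names are made up for illustration). The second sort folds case, which is one possible way to deal with the upper/lower case issue mentioned above:

```perl
use strict;
use warnings;

# Hypothetical file names, deliberately out of order and in mixed case.
my @names = ('b.txt', 'A.txt', 'c.txt', 'a.pl', 'B.pl');

# Plain string sort: uppercase letters sort before lowercase ones.
my @by_codepoint = sort @names;

# Case-insensitive sort, with the plain comparison as a tie-breaker.
my @folded = sort { lc $a cmp lc $b or $a cmp $b } @names;

print "plain:  @by_codepoint\n";
print "folded: @folded\n";
```

Which of the two orders is the right one depends on how the duplicates are supposed to be detected, so the choice here is only an illustration.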

The idea of sorting data to get better performance (avoiding lookups) is sometimes very efficient. I do it quite often in a slightly different context, to compare pairs of very large files that would not fit in a hash: I sort both files on the comparison key (using the *nix sort utility), and then read both files in parallel in my Perl program to detect records missing from either file, or differences in the attributes of records having the same comparison key.
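
To make the parallel read more concrete, here is a rough sketch of the idea, not my production code. The "key;attributes" record layout, the sample data and the next_record helper are all assumptions made for the example, and in-memory file handles stand in for the two large sorted files:

```perl
use strict;
use warnings;

# Hypothetical sample data: "key;attributes" records, already sorted on key
# (in real use, the *nix sort utility would have produced these files).
my $old = "k1;red\nk2;green\nk4;blue\n";
my $new = "k1;red\nk3;white\nk4;black\n";

# In-memory file handles stand in for the two large sorted files.
open my $fh_a, '<', \$old or die "Cannot open first input: $!";
open my $fh_b, '<', \$new or die "Cannot open second input: $!";

# Read one record; return (key, attributes) or an empty list at end of file.
sub next_record {
    my ($fh) = @_;
    my $line = <$fh>;
    return unless defined $line;
    chomp $line;
    return split /;/, $line, 2;
}

my @report;
my ($key_a, $attr_a) = next_record($fh_a);
my ($key_b, $attr_b) = next_record($fh_b);

# Advance whichever side holds the smaller key; on equal keys, compare
# the attributes and advance both sides.
while (defined $key_a or defined $key_b) {
    if (!defined $key_b or (defined $key_a and $key_a lt $key_b)) {
        push @report, "$key_a: missing from second file";
        ($key_a, $attr_a) = next_record($fh_a);
    }
    elsif (!defined $key_a or $key_b lt $key_a) {
        push @report, "$key_b: missing from first file";
        ($key_b, $attr_b) = next_record($fh_b);
    }
    else {
        push @report, "$key_a: attributes differ ($attr_a vs $attr_b)"
            if $attr_a ne $attr_b;
        ($key_a, $attr_a) = next_record($fh_a);
        ($key_b, $attr_b) = next_record($fh_b);
    }
}

print "$_\n" for @report;
```

Only one record per file is held in memory at any time, which is the whole point. One thing to watch: the comparison in the program (lt here) has to agree with the order produced by the external sort, for example by running sort with LC_ALL=C so that it sorts in byte order.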

Re^4: Duplicates in Directories by huck (Prior) on Oct 09, 2017 at 18:55 UTC
"glob returned the names of the files in the directory in alphabetical [order] for all three of them"
I may be a caveman, but I'd use readdir to get the file list, and I think that it returns files in the order they are stored in the directory.
Re^5: Duplicates in Directories by Laurent_R (Canon) on Oct 09, 2017 at 19:39 UTC
Hi huck, I'm not sure I understand what you mean, but take a look at this:

    $ perl -E '@c = glob "*.*"; say for @c;'
    1095341.pl
    172.20.98.3.txt
    8188eu-v7-20150914.tar.gz
    a.exe
    a.pl
    abdou.pl
    add2.pl
    add_2.pl
    age_switch.pl
    amazon.pl
    anagrams.pl
    any.pl
    any2.pl
    any3.pl
    aoh.pl
    [... many lines omitted for brevity ...]
    warnings.txt
    weight.pl
    while.pl
    wifi.pl
    words.txt
    xml.pl
    xml_mac.pl
    xxxA.txt
    xxxH.txt
    xxxL.txt
    $
Re^6: Duplicates in Directories by soonix (Chancellor) on Oct 09, 2017 at 20:01 UTC
glob is the function behind the <*.c> operator (see I/O Operators in perlop), which in turn uses File::Glob. There, under POSIX FLAGS / GLOB_NOSORT, it is stated: "By default, the pathnames are sorted in ascending ASCII order; …"
Re^7: Duplicates in Directories by Laurent_R (Canon) on Oct 09, 2017 at 21:24 UTC
Re^6: Duplicates in Directories by huck (Prior) on Oct 09, 2017 at 20:28 UTC
"I'm not sure I understand what you mean"
Nowhere did the OP mention glob. Given the task of populating an array with all the filenames in a directory, I would pick readdir over glob first. I'm pretty sure readdir returns the files in the order in which they are stored in the directory.
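
For reference, the readdir approach could look something like the sketch below. The scratch directory and the three file names are made up so that the example runs on its own. Since the order readdir gives depends on the file system, the list is sorted explicitly at the end:

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Scratch directory with a few hypothetical files, removed at exit.
my $dir = tempdir(CLEANUP => 1);
for my $name (qw(b.txt a.txt c.txt)) {
    open my $fh, '>', "$dir/$name" or die "Cannot create $name: $!";
    close $fh;
}

# readdir returns entries in whatever order the file system yields them,
# including the special entries "." and "..", which are filtered out here.
opendir my $dh, $dir or die "Cannot open $dir: $!";
my @files = grep { $_ ne '.' and $_ ne '..' } readdir $dh;
closedir $dh;

# Sorting explicitly makes the order independent of the file system.
my @sorted = sort @files;
print "$_\n" for @sorted;
```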
Re^7: Duplicates in Directories by Laurent_R (Canon) on Oct 09, 2017 at 21:21 UTC
Re^8: Duplicates in Directories by huck (Prior) on Oct 09, 2017 at 22:40 UTC