Hi all:
I am writing a new tool like rhash, only with the ability to update hashes. I just got tired of waiting for this bug to be fixed:
Update hash if file last modification date has changed
https://github.com/rhash/RHash/issues/107
I have looked around and, surprisingly, I could not find any hash/checksum tool that does that properly.
I think I will be using File::Find to scan files and directories. The first version of the tool will be in Perl, but it may need to be rewritten later in C for performance or other reasons.
Therefore, I want the "allfiles.checksums" file to list files and their checksums ordered in such a way that you can easily and consistently reimplement the filename sorting in any other language.
I have been reading the question "Sorting utf-8" here:
https://www.perlmonks.org/?node_id=252806
And I also looked at Unicode::Collate and other Perl Unicode documentation.
It is all pretty complicated. I have come to the conclusion that the only safe way to implement this is to do a plain UTF-8 lexicographic string sort on the filenames, that is, to compare the UTF-8 encoded names byte by byte. I know that humans will find the resulting order unnatural, but I think I can treat the "allfiles.checksums" file as an internal database. The script itself could offer options to list its contents in different locale collation orders, if anybody really cares.
How do I implement a pure UTF-8 lexicographic string sort in Perl?
I guess I first need to make sure that the filenames returned by File::Find are actually encoded in UTF-8, because Perl may choose some other internal string encoding. I hope that this is what utf8::upgrade is for.
And then I can use binary comparison operators '<' or 'cmp' on those UTF-8 strings. Is that correct?
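For reference, here is a minimal sketch of what such a sort could look like, under a few assumptions. It assumes the filenames arrive as raw byte strings (which is what File::Find normally returns on Unix-like systems) and that the filesystem stores names in UTF-8. The helper names sort_names_bytewise and is_valid_utf8 are made up for illustration. Note that '<' is Perl's numeric comparison, so the string operators 'lt' or 'cmp' are the ones to use, and that utf8::upgrade only changes Perl's internal representation of a string; turning a character string into UTF-8 bytes is what utf8::encode or Encode::encode does. Without 'use locale' in scope, 'cmp' compares ordinal values, and UTF-8 byte order matches Unicode code point order, so the result should be easy to reproduce in C with a plain byte comparison.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper: sort filenames by the byte order of their UTF-8 encoding.
# Assumption: each name is a raw byte string, not a decoded character string.
sub sort_names_bytewise {
    my @names = @_;
    # No "use locale" in scope, so cmp compares ordinal values (bytes here)
    # and ignores the current locale's collation rules.
    return sort { $a cmp $b } @names;
}

# Hypothetical helper: true if the byte string is well-formed UTF-8.
sub is_valid_utf8 {
    my ($bytes) = @_;
    my $copy = $bytes;    # decode() may modify its argument when a check is given
    return defined eval {
        Encode::decode('UTF-8', $copy, Encode::FB_CROAK());
        1;
    };
}

# Sample names written as explicit bytes: "\xC3\xA9" is U+00E9, "\xE2\x82\xAC" is U+20AC.
my @names = ("b.txt", "\xC3\xA9.txt", "a.txt", "Z.txt", "\xE2\x82\xAC.txt");

for my $name (sort_names_bytewise(@names)) {
    warn "not valid UTF-8: $name\n" unless is_valid_utf8($name);
    print "$name\n";
}
```

With these sample names, "Z.txt" sorts before "a.txt" and "b.txt" (uppercase ASCII comes first), followed by the two non-ASCII names in code point order. Names that are not valid UTF-8 still sort deterministically by bytes; the check only reports them.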
Thanks in advance,
rdiez