Correct me if I am wrong, but if you have filenames of the form user_nnnnn.jpg, and you rename them to the form user_md5.jpg, the only "loss of data" that could occur is if a user has uploaded the same file under two different names?
Yes, that's exactly the point morgon is making when warning against the suggestion to hash on file content instead of file name in one of the ancestors of this post.
Okay. Then no data is lost, except one of the original names. But they'd both change anyway.
So, whatever mechanism the user has to access those files--presumably (renamed) links--will now point to the same file. But that's okay because they were duplicates anyway.
So where is the problem?
So where is the problem?
Let's see: you have a back-end storage. It stores images. The user has stored an image twice (a copy, perhaps). For whatever reason, filenames are remapped. (You aren't assuming the actual user is going to remember the random file names, are you?) The OP implements the suggestion to rehash based on content, making the backend merge the two files. The user, whatever front end he's using, still sees two files. He then decides to delete one of his copies. Or modify one (keeping the other copy as an original).
Oops.
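To make that concrete, here is a minimal sketch of the failure mode. Everything in it is assumed for illustration: the `Backend` class, the `frontend` mapping, the visible file names and the `user_<md5>.jpg` key scheme are all hypothetical, since we don't know how the OP's system is actually built.

```python
import hashlib


def content_key(user, data):
    # Hypothetical naming scheme: user_<md5 of the file content>.jpg
    return f"{user}_{hashlib.md5(data).hexdigest()}.jpg"


class Backend:
    """Toy stand-in for the back-end image storage (an assumption)."""

    def __init__(self):
        self.blobs = {}

    def put(self, key, data):
        self.blobs[key] = data

    def delete(self, key):
        self.blobs.pop(key, None)


backend = Backend()
frontend = {}  # what the user sees: visible name -> backend key

# The user stores the same image twice under two visible names.
photo = b"\xff\xd8 identical image bytes"
for visible_name in ("holiday.jpg", "holiday_copy.jpg"):
    key = content_key("user", photo)
    backend.put(key, photo)
    frontend[visible_name] = key

# Hashing on content has merged both uploads into one stored file.
merged = len(backend.blobs) == 1

# The user deletes what he believes is only the copy.
backend.delete(frontend.pop("holiday_copy.jpg"))

# The entry he kept now points at a file that no longer exists.
original_survives = frontend["holiday.jpg"] in backend.blobs
```

After the delete, `merged` is true and `original_survives` is false: the front end still lists `holiday.jpg`, but the backend has nothing behind it. A backend that counts references per stored file would avoid this, but that is extra machinery the suggestion did not mention.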
Of course, maybe the OP's system doesn't work that way. We do not know. But I think it's really, really bad to give the OP advice that may cause data loss, and then, when that is pointed out, try to wiggle out of it by making assumptions about the OP's system.