in reply to sortmd5 script

I'm not sure whether to do a code critique or just ask what problem this script is trying to solve, so I'll offer both.

Code critique:

Object critique: MD5 sums are a great way to get a 'fingerprint' for a bunch of files -- it's very handy to confirm whether two files on disparate systems are the same. It's not clear whether this utility is working in that direction or not; there's no clear explanation of what the inputs and outputs are.

Alex / talexb / Toronto

"Groklaw is the open-source mentality applied to legal research" ~ Linus Torvalds

Re^2: sortmd5 script
by zentara (Cardinal) on Aug 13, 2008 at 15:57 UTC
    not clear whether this utility is working in that direction or not

    I was going to post a similar response, but figured the writer knows what his scripts are for. But since you brought it up, my point was going to be, he never checks the veracity of the existing md5sums as he passes thru them. He checks if there is an existing md5sum for a file, then moves on, happily thinking his data has not been tampered with. What if the md5sum for that file has changed, or the file has changed?

    I was giving him the benefit of the doubt thinking, "if he keeps daily backups of everything", he can always go back and find a day where the md5sum and data matched; but now I'm not so sure.

    At least he should keep an md5sum of the file containing the md5sums, to be certain his script (or other force) has not screwed up his file.

    The checking and double-checking never ends with "spy vs. spy", especially if your job is on the line if the data fails. Of course blaming it on cosmic rays usually fools the dumbos in management. :-)


    I'm not really a human, but I play one on earth. Remember How Lucky You Are
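
The check zentara describes, re-verifying each stored sum instead of only noting that one exists, could be sketched roughly as follows. This is not taken from the sortmd5 script: the function names md5_of_file and verify_md5sums are made up for illustration, and the sketch assumes lines of the form "MD5 filename" (32 hex digits, whitespace, an optional "*" binary marker, then a file name relative to the current directory). Digest::MD5 ships with Perl.

```perl
use strict;
use warnings;
use Digest::MD5;

# Hypothetical helper: hex MD5 of a file, or undef if it cannot be opened.
sub md5_of_file {
    my ($path) = @_;
    open my $fh, '<', $path or return;
    binmode $fh;
    my $digest = Digest::MD5->new->addfile($fh)->hexdigest;
    close $fh;
    return $digest;
}

# Hypothetical helper: re-check every "MD5 filename" line of an md5sums file.
# Returns a hash ref mapping file name => 'OK', 'CHANGED' or 'MISSING'.
# Lines that do not look like "32 hex digits, whitespace, name" are skipped.
sub verify_md5sums {
    my ($sums_file) = @_;
    my %status;
    open my $in, '<', $sums_file or die "Cannot open $sums_file: $!";
    while (my $line = <$in>) {
        chomp $line;
        my ($md5, $file) = $line =~ /^([0-9a-fA-F]{32})\s+\*?(.+)$/ or next;
        my $actual = md5_of_file($file);
        $status{$file} = !defined $actual            ? 'MISSING'
                       : lc($actual) eq lc($md5)     ? 'OK'
                       :                               'CHANGED';
    }
    close $in;
    return \%status;
}
```

Calling verify_md5sums('md5sums') and printing every entry whose status is not 'OK' would flag files whose content or stored sum has drifted. Running md5_of_file on the md5sums file itself, and keeping that value somewhere else, covers the "md5sum of the file containing the md5sums" suggestion.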
Re^2: sortmd5 script
by blazar (Canon) on Aug 14, 2008 at 16:53 UTC
    You've used good variable names, but there are no comments.

    I personally believe that this can be (i.e. please don't think I'm claiming it necessarily is) just as good as it gets: if comments add nothing and the code is pretty much self-explanatory, then one should plainly not add them.

    you don't need either the exit or the __END__ at the end of your script.

    While the unnecessary exit is very unperlish, I'm sorry to have to repeat myself (I don't feel like going off to do a Super Search, but I'm sure I'd find tons of previous remarks of the same kind...): __END__ is pretty much like a "bye" and pairs nicely with the "hello" given by the shebang line. This remark is not my own but was heard from $Larry himself, and I was pretty much fond of it as soon as I heard it for the very first time.

    --
    If you can't understand the incipit, then please check the IPB Campaign.
      While the unnecessary exit is very unperlish [...] but __END__ is pretty much like a "bye" and pairs nicely with the "hello" given by the shebang line

      In order to actually write a "bye", exit is just as good as __END__ (with the difference of having no option of (perl) code after __END__).

        That's exactly why __END__ is more perlish... you put it in, and any text after it won't even be seen by the parser; text after an exit;, OTOH, must still be syntax-clean perl :-)
        []s, HTH, Massa (κς,πμ,πλ)
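
The difference Massa points out can be checked with a small sketch. It uses string eval as a stand-in for running two separate script files, which is an assumption made to keep the example in one piece; the helper name compiles and both sample snippets are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: return 1 if the source compiles (and runs) under
# string eval, 0 if it fails.
sub compiles {
    my ($source) = @_;
    local $@;
    eval $source;
    return $@ eq '' ? 1 : 0;
}

# Free text after "exit;" is still handed to the parser, so this snippet
# fails to compile and the exit itself is never reached.
my $after_exit = "exit;\nthis is ( not perl\n";

# Parsing stops at __END__, so the same free text is never looked at.
my $after_end = "1;\n__END__\nthis is ( not perl\n";

print "text after exit;   : ", (compiles($after_exit) ? "ok" : "syntax error"), "\n";
print "text after __END__ : ", (compiles($after_end)  ? "ok" : "syntax error"), "\n";
```

In a real script file the text after __END__ is also readable through the DATA filehandle, which is one more reason people use it as a place for notes or POD.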
Re^2: sortmd5 script
by mscharrer (Hermit) on Aug 28, 2008 at 12:57 UTC
    Thanks talexb for the feedback. I was on holidays for the last two weeks so I'm just answering now.

    I think I should explain the task of the script again:
    I'm using a lot of MD5 checksum files (md5sums), which hold MD5/file name pairs, for comparison and also for my CD/DVD data backups, to verify that they were burned correctly. I also update md5sums files from time to time (see my script addmd5). For several reasons I like to have the content of these files sorted alphabetically by file name, not by MD5 sum, so a plain sort doesn't work.

    The sortmd5 script takes an md5sums file, splits each line into MD5 sum and file name, sorts the lines by file name, then joins them again and outputs the result.

    The default, if no output file name is given, is to work in-place (similar to the -i switch), i.e. write the result back to the original file. If the input file had unreadable lines, a backup is created before it is overwritten.

    Usage example:
    The following input MD5 file

    443..29D test.avi
    563..D93 xyz
    AB2..FFE a.file
    is rewritten to:
    AB2..FFE a.file
    443..29D test.avi
    563..D93 xyz
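
The split, sort by file name, and join steps described above might look something like this. It is only a sketch, not the actual sortmd5 code: the function name sort_md5_lines is invented, a line is assumed to be "sum, whitespace, file name" (so file names may contain spaces), and anything that does not fit is handed back separately so the caller can decide whether to make a backup.

```perl
use strict;
use warnings;

# Hypothetical helper: sort "MD5 filename" lines by file name.
# Returns two array refs: the sorted lines (original text preserved)
# and the lines that could not be parsed.
sub sort_md5_lines {
    my @lines = @_;
    my (@pairs, @unreadable);
    for my $line (@lines) {
        chomp(my $copy = $line);
        if ($copy =~ /^\S+\s+(.+)$/) {
            # Remember the file name as sort key next to the untouched line.
            push @pairs, [$1, "$copy\n"];
        } else {
            push @unreadable, $copy;
        }
    }
    my @sorted = map  { $_->[1] }
                 sort { $a->[0] cmp $b->[0] } @pairs;
    return (\@sorted, \@unreadable);
}
```

Fed the three example lines, the first returned list holds them in the order a.file, test.avi, xyz. Keeping the original line text rather than rebuilding it avoids changing the separator between sum and name.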

      I would just do a one-liner ..

      alex@foo:~/tmp$ cat >data.in
      443..29D test.avi
      563..D93 xyz
      AB2..FFE a.file
      alex@foo:~/tmp$ perl -e 'while(<>){chomp;($md5,$file)=split;$data{$file}=$md5;}foreach(sort keys %data){print"$data{$_} $_\n";}' <data.in
      AB2..FFE a.file
      443..29D test.avi
      563..D93 xyz
      .. and re-direct to an output file as necessary.

      Alex / talexb / Toronto