in reply to Sort Large Files

If it's a really large file, I'd let the shell sort it:
# This is *NOT* a useless use of cat.
cat $file |\
perl -ne 'printf "%04d%02d%02d %s", (split m{/})[2,0,1], $_' |\
sort |\
cut -d ' ' -f 2-
This also uses a GRT (Guttman-Rosler Transform) ;-)

Re^2: Sort Large Files
by merlyn (Sage) on Jan 06, 2005 at 13:13 UTC
    # This is *NOT* a useless use of cat.
    cat $file |\
    In spite of the tag, it is in fact a useless use of cat. So, I wonder why the original poster went out of their way to try to say it wasn't? {sigh}

    Also, those backslashes at the end of lines give me the willies. The shells that I use don't need them.
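For reference, the POSIX shell grammar allows a newline right after `|`, so in sh and shells that follow its grammar (bash and dash, for example) a line ending in `|` simply continues on the next line. Whether every other shell family behaves the same way is not covered here. A minimal check, with `first_sorted` as a hypothetical helper name:

```shell
#!/bin/sh
# first_sorted: hypothetical helper. Each line ends in "|", so the shell
# keeps reading the pipeline on the next line without any backslash.
first_sorted() {
    printf '%s\n' cherry banana damson |
    sort |
    head -n 1
}

first_sorted
```

Running it prints `banana`. A trailing backslash after the `|` is harmless, since backslash-newline is simply removed before parsing.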

    -- Randal L. Schwartz, Perl hacker
    Be sure to read my standard disclaimer if this is a reply.

      Of course it isn't.

      You cannot always replace

      cat $file | prog
      with
      prog < $file
      A few cases where it fails:
      file=""
      file="data1 data2"
      file="-s squeeze-my-blanks"
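To see the difference for the `data1 data2` case, the sketch below creates two throwaway files in a temporary directory and counts their lines both ways. It assumes a shell that field-splits unquoted parameters (POSIX sh, dash, bash; zsh does not by default). The `cat` form gets two arguments; the redirection form cannot open two files at once, so it fails (bash in its default mode reports an "ambiguous redirect", while a strictly POSIX shell looks for a single file literally named `data1 data2`). The file names are hypothetical.

```shell
#!/bin/sh
# Hypothetical demo files in a throwaway directory.
tmp=$(mktemp -d) || exit 1
cd "$tmp" || exit 1
printf 'one\n' > data1
printf 'two\nthree\n' > data2

file="data1 data2"

# Unquoted on purpose: the shell splits $file into two arguments, so cat
# concatenates both files (3 lines in total).
cat_count=$(cat $file | wc -l | tr -d ' ')

# A redirection needs exactly one file, so this form fails; running it in a
# subshell keeps the shell's error message off the terminal.
if (wc -l < $file) > /dev/null 2>&1; then
    redirect_ok=yes
else
    redirect_ok=no
fi

echo "cat: $cat_count lines; redirect worked: $redirect_ok"
cd / && rm -rf "$tmp"
```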

      In this particular pipeline, $file could have been placed after the perl command (if you can assume $file doesn't have a switch for cat). However, that would place the data to act on somewhere in the middle of the pipeline. Which I find harder to understand. Flow should go from right to left, left to right, top to bottom, or bottom to top. But not middle, left, right. cat is short, just three letters, which places the data nearly at the beginning. Placing the entire pipeline in parens and putting < $file at the end would place the data at the end, but you can't do that, for the reasons listed earlier.

      That's two reasons why the use of cat wasn't useless.

      So, I wonder why the original poster went out of their way to try to say it wasn't?

      Because this is Perlmonks, and this is where Dr. Pavlov would have a field day if he were still alive. The original poster had hoped that by saying the use of cat wasn't useless, people would stop and think before reacting reflexively, but I guess the cerebral cortex was once again victorious over the brains.

      Also, those backslashes at the end of lines give me the willies. The shells that I use don't need them.

      Good for you. My preferred shells don't need them either, but I wasn't going to spend time figuring out which shells need them and which ones don't (as I don't know which shells the readers are using), so I just used a syntax that should work regardless of whether the shell needs them or not. A bit of portability at the cost of three keystrokes, not bad, is it?

        file=""
        file="data1 data2"
        file="-s squeeze-my-blanks"
        Did you actually try those? I suspect not, because you'd need an "eval" in your script to get the shell to do another round of whitespace parsing after the variable is interpolated, and that's a Very Good Thing. As it is, you'd be trying to cat the current directory (garbage in garbage out), a file named "data1 data2" and probably get some switch violation because it'd be a single weird switch with a lot of odd chars in it.
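One way to try those values in a given shell is to let the shell report how many words it produces from each unquoted expansion. The sketch below uses `set --` and `$#`, with `count_words` as a hypothetical helper name. Under POSIX sh, dash, or bash it reports 0, 2 and 2 words; zsh, which does not split unquoted parameters unless the SH_WORD_SPLIT option is set, would report 0, 1 and 1.

```shell
#!/bin/sh
# count_words: hypothetical helper. Prints how many arguments the shell
# makes from an unquoted expansion of its first argument.
count_words() {
    value=$1
    set -- $value   # unquoted on purpose: field splitting applies here
    echo $#
}

for file in "" "data1 data2" "-s squeeze-my-blanks"; do
    printf '%-24s -> %s word(s)\n' "\"$file\"" "$(count_words "$file")"
done
```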
        In this particular pipeline, $file could have been placed after the perl command (if you can assume $file doesn't have a switch for cat). However, that would place the data to act on somewhere in the middle of the pipeline. Which I find harder to understand. Flow should go from right to left, left to right, top to bottom, or bottom to top. But not middle, left, right. cat is short, just three letters, which places the data nearly at the beginning.
        Nobody is saying violate the order. Write it like this if you want it left to right:
        < $file \
        perl ... | other ... | nextthing ... | and_so_on ...
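In POSIX shells a redirection may appear before the command name, so a leading `< file` keeps the data at the left of the pipeline. In this layout the backslash after the redirection is needed: unlike a trailing `|`, a newline right after `< file` ends the command, and the next line would then read from the terminal instead. A small check, with `smallest_line` and `words.txt` as hypothetical names:

```shell
#!/bin/sh
# smallest_line: hypothetical helper. Prints the first line of the file
# named by $1 in sorted order, with the input redirection written first;
# the backslash joins the redirection to the command on the next line.
smallest_line() {
    < "$1" \
    sort |
    head -n 1
}

tmp=$(mktemp -d) || exit 1
printf '%s\n' pear apple fig > "$tmp/words.txt"
smallest_line "$tmp/words.txt"
rm -rf "$tmp"
```

Running it prints `apple`.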
        I hand out the Useless Use of Cat Award precisely because of code like yours, where a cat is indeed completely useless.

        -- Randal L. Schwartz, Perl hacker