in reply to Re: Sort Large Files
in thread Sort Large Files

# This is *NOT* a useless use of cat.
cat $file |\
In spite of the tag, it is in fact a useless use of cat. So, I wonder why the original poster went out of their way to try to say it wasn't? {sigh}

Also, those backslashes at the end of lines give me the willies. The shells that I use don't need them.

-- Randal L. Schwartz, Perl hacker
Be sure to read my standard disclaimer if this is a reply.

Re^3: Sort Large Files
by Anonymous Monk on Jan 06, 2005 at 14:02 UTC
    It of course isn't.

    You cannot always replace

    cat $file | prog
    with
    prog < $file
    A few cases on which it fails:
    file=""
    file="data1 data2"
    file="-s squeeze-my-blanks"
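The second case can be checked with a short sketch. It assumes a Bourne-style shell such as bash or dash (zsh does not split unquoted expansions by default), and the scratch directory and the names data1 and data2 are only illustrative.

```shell
# Work in a scratch directory so no real files are touched.
dir=$(mktemp -d) && cd "$dir" || exit 1
echo "hello" > data1
echo "world" > data2

file="data1 data2"

# Unquoted $file is split into two arguments, so cat reads both files.
via_cat=$(cat $file | wc -l | tr -d ' ')

# In a redirection the same value is either rejected as ambiguous (bash)
# or taken as the single file name "data1 data2" (POSIX sh); both fail.
if (wc -l < $file) >/dev/null 2>&1; then
    via_redirect=ok
else
    via_redirect=failed
fi

echo "cat: $via_cat lines, redirect: $via_redirect"
```

The exact error message differs between shells, so the sketch only records whether the redirect succeeded.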

    In this particular pipeline, $file could have been placed after the perl command (if you can assume $file doesn't have a switch for cat). However, that would place the data to act on somewhere in the middle of the pipeline. Which I find harder to understand. Flow should go from right to left, left to right, top to bottom, or bottom to top. But not middle, left, right. cat is short, just three letters, which places the data nearly at the beginning. Placing the entire pipeline in parens, and putting < $file at the end places the data at the end, but you can't do that because of the reasons listed earlier.

    That's two reasons why the use of cat wasn't useless.

    So, I wonder why the original poster went out of their way to try to say it wasn't?

    Because this is PerlMonks, and this is where Dr. Pavlov would have a field day if he were still alive. The original poster had hoped that by saying the use of cat wasn't useless, people would stop and think before reacting reflexively - but I guess the cerebral cortex was once again victorious over the brains.

    Also, those backslashes at the end of lines give me the willies. The shells that I use don't need them.

    Good for you. My preferred shells don't use them either, but I wasn't going to spend time figuring out which shells need them and which ones don't (as I don't know which shells the readers are using), so I just used a syntax that should work regardless of whether the shell needs them or not. A bit of portability at the cost of three keystrokes, not bad, is it?

      file=""
      file="data1 data2"
      file="-s squeeze-my-blanks"
      Did you actually try those? I suspect not, because you'd need an "eval" in your script to get the shell to do another round of whitespace parsing after the variable is interpolated, and that's a Very Good Thing. As it is, you'd be trying to cat the current directory (garbage in garbage out), a file named "data1 data2" and probably get some switch violation because it'd be a single weird switch with a lot of odd chars in it.
      In this particular pipeline, $file could have been placed after the perl command (if you can assume $file doesn't have a switch for cat). However, that would place the data to act on somewhere in the middle of the pipeline. Which I find harder to understand. Flow should go from right to left, left to right, top to bottom, or bottom to top. But not middle, left, right. cat is short, just three letters, which places the data nearly at the beginning.
      Nobody is saying violate the order. Write it like this if you want it left to right:
      < $file \
      perl ... | other ... | nextthing ... | and_so_on ...
      I hand out the Useless Use of Cat Award precisely because of code like yours, where a cat is indeed completely useless.
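As a minimal sketch of the leading-redirect form, assuming $file holds exactly one ordinary file name (the name data1 and the scratch directory are illustrative, and the variable is quoted here, unlike in the line above):

```shell
# Work in a scratch directory so no real files are touched.
dir=$(mktemp -d) && cd "$dir" || exit 1
printf 'hello\nworld\n' > data1
file=data1

# Redirection written first: the data source sits on the left, no cat process.
leading=$(< "$file" wc -l | tr -d ' ')

# Conventional placement at the end of the command, for comparison.
trailing=$(wc -l < "$file" | tr -d ' ')

echo "leading: $leading, trailing: $trailing"
```

Both placements feed the same file to the command's standard input; only where the redirection appears on the line differs.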

      -- Randal L. Schwartz, Perl hacker
      Be sure to read my standard disclaimer if this is a reply.

        Did you actually try those?

        Yes I did. Did you?

        $ echo "hello" > data1
        $ echo "world" > data2
        $ file="data1 data2"
        $ cat $file | wc -l
        2
        $ < $file | wc -l
        bash: $file: ambiguous redirect
        0

        I suspect not, because you'd need an "eval" in your script to get the shell to do another round of whitespace parsing after the variable is interpolated, and that's a Very Good Thing.

        Really? I've been writing constructs of the form:

        FILES="file1 file2 file3 file4 file5"
        for file in $FILES
        do
            ... something with $file ...
        done
        for a couple of decades. And now you're telling me it never worked???? Now, you're free to believe me, but may I quote from the beginning of perl's Configure?
        paths='/bin /usr/bin /usr/local/bin /usr/ucb /usr/local /usr/lbin'
        paths="$paths /opt/bin /opt/local/bin /opt/local /opt/lbin"
        paths="$paths /usr/5bin /etc /usr/gnu/bin /usr/new /usr/new/bin /usr/nbin"
        paths="$paths /opt/gnu/bin /opt/new /opt/new/bin /opt/nbin"
        paths="$paths /sys5.3/bin /sys5.3/usr/bin /bsd4.3/bin /bsd4.3/usr/ucb"
        paths="$paths /bsd4.3/usr/bin /usr/bsd /bsd43/bin /usr/ccs/bin"
        paths="$paths /etc /usr/lib /usr/ucblib /lib /usr/ccs/lib"
        paths="$paths /sbin /usr/sbin /usr/libexec"
        paths="$paths /system/gnu_library/bin"

        for p in $paths
        do
            case "$p_$PATH$p_" in
            *$p_$p$p_*) ;;
            *) test -d $p && PATH=$PATH$p_$p ;;
            esac
        done
        No extra eval happening here.
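The splitting can be observed directly. This is a small sketch, assuming a Bourne-style shell with the default IFS (zsh behaves differently unless told otherwise); the variable names are illustrative.

```shell
FILES="file1 file2 file3"

# Unquoted: the shell splits the expanded value on whitespace (IFS).
unquoted=0
for f in $FILES; do
    unquoted=$((unquoted + 1))
done

# Quoted: the value stays a single word.
quoted=0
for f in "$FILES"; do
    quoted=$((quoted + 1))
done

echo "unquoted: $unquoted words, quoted: $quoted word"
```

The loop body runs three times in the unquoted case and once in the quoted case, with no eval involved.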

        I hand out the Useless Use of Cat Award precisely because of code like yours, where a cat is indeed completely useless.

        Well, my code works and your suggested alternative does not work. So I get two things: an award, and working code. Good for me.

        Did you actually try those?
        I thought the point was that some calls to cat can't be eliminated:
        cat data1 data2 | wc -l # How would you do this with redirects?
        cat -s sq.my.bl | wc -l # cat itself can do some processing.