in reply to •Re^4: Sort Large Files
in thread Sort Large Files

Did you actually try those?

Yes I did. Did you?

    $ echo "hello" > data1
    $ echo "world" > data2
    $ file="data1 data2"
    $ cat $file | wc -l
    2
    $ < $file | wc -l
    bash: $file: ambiguous redirect
    0

I suspect not, because you'd need an "eval" in your script to get the shell to do another round of whitespace parsing after the variable is interpolated, and that's a Very Good Thing.

Really? I've been writing constructs of the form:

    FILES="file1 file2 file3 file4 file5"
    for file in $FILES
    do
        ... something with $file ...
    done
for a couple of decades. And now you're telling me it never worked???? Now, you're free not to believe me, but may I quote from the beginning of perl's Configure?
    paths='/bin /usr/bin /usr/local/bin /usr/ucb /usr/local /usr/lbin'
    paths="$paths /opt/bin /opt/local/bin /opt/local /opt/lbin"
    paths="$paths /usr/5bin /etc /usr/gnu/bin /usr/new /usr/new/bin /usr/nbin"
    paths="$paths /opt/gnu/bin /opt/new /opt/new/bin /opt/nbin"
    paths="$paths /sys5.3/bin /sys5.3/usr/bin /bsd4.3/bin /bsd4.3/usr/ucb"
    paths="$paths /bsd4.3/usr/bin /usr/bsd /bsd43/bin /usr/ccs/bin"
    paths="$paths /etc /usr/lib /usr/ucblib /lib /usr/ccs/lib"
    paths="$paths /sbin /usr/sbin /usr/libexec"
    paths="$paths /system/gnu_library/bin"
    for p in $paths
    do
        case "$p_$PATH$p_" in
        *$p_$p$p_*) ;;
        *) test -d $p && PATH=$PATH$p_$p ;;
        esac
    done
No extra eval happening here.
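
For anyone who wants to check this without reading Configure, here is a minimal sketch, assuming a Bourne-style shell such as bash, dash or ksh (zsh in its native mode behaves differently). The helper `count_args` and the file names are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print how many arguments it received.
count_args() {
    echo "$#"
}

files="file1 file2 file3"

# Unquoted: the expanded value is field-split, no eval needed.
unquoted=$(count_args $files)

# Quoted: splitting is suppressed, so the value stays one argument.
quoted=$(count_args "$files")

echo "unquoted: $unquoted"
echo "quoted:   $quoted"
```

The unquoted call sees three arguments and the quoted call sees one, which is the same splitting the `for` loops above rely on.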

I hand out the Useless Use of Cat Award precisely because of code like yours, where a cat is indeed completely useless.

Well, my code works and your suggested alternative does not work. So I get two things: an award, and working code. Good for me.

•Re^6: Sort Large Files
by merlyn (Sage) on Jan 06, 2005 at 14:48 UTC
    If this makes two files named "x" and "y", and not one file named "x space y":
        f="x y"
        touch $f
    then your shell is not /bin/sh compatible. Whitespace parsing happens before variable parsing in every bourne-ish shell I've used since the late 70s.

    As for "my" syntax:

    < $file | wc -l
    you erroneously put an extra pipe in there. Remove it, try again, and give yourself minus 1 point for bad copying.

    -- Randal L. Schwartz, Perl hacker
    Be sure to read my standard disclaimer if this is a reply.


    update: Yeah, I apparently got the first part wrong. My memory of shell programming has really crept away over the years. However, I just noticed that zsh does it differently than bash, which is probably why I now misremember. zsh does work the way I stated.

    However, the second part does work on real shells, just not on bash or csh.

      Whitespace parsing happens before variable parsing in every bourne-ish shell I've used since the late 70s.

      I guess the intersection of the sets of shells we have used is empty then. Anyway, here's the relevant portion of IEEE Std 1003.1. From section 2.6:

      The order of word expansion shall be as follows:
      1. Tilde expansion (see Tilde Expansion), parameter expansion (see Parameter Expansion), command substitution (see Command Substitution), and arithmetic expansion (see Arithmetic Expansion) shall be performed, beginning to end. See item 5 in Token Recognition.
      2. Field splitting (see Field Splitting) shall be performed on the portions of the fields generated by step 1, unless IFS is null.
      3. Pathname expansion (see Pathname Expansion) shall be performed, unless set -f is in effect.
      4. Quote removal (see Quote Removal) shall always be performed last.
      As you see, parameter expansion happens before word splitting.
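
The ordering can be observed directly. The following is a rough sketch, assuming a POSIX-style shell with default options and a `mktemp -d` that creates a scratch directory (not strictly POSIX, but widely available). The file names and variable names are illustrative only.

```shell
#!/bin/sh
# Work in a scratch directory so the glob below matches known files only.
dir=$(mktemp -d) || exit 1
cd "$dir" || exit 1
touch a.txt b.txt

pattern="*.txt nosuchfile"

# All steps: expand $pattern, split it into two fields, then glob
# "*.txt" into a.txt and b.txt; "nosuchfile" is passed through.
set -- $pattern
all_steps=$#

# Null IFS: field splitting is skipped, leaving one field that
# matches no file and is therefore kept literally.
old_ifs=$IFS
IFS=
set -- $pattern
no_split=$#
IFS=$old_ifs

# set -f: pathname expansion is skipped, leaving two fields.
set -f
set -- $pattern
no_glob=$#
set +f

echo "$all_steps $no_split $no_glob"

cd / && rm -rf "$dir"
```

The three counts differ only because of which step was switched off, which matches the order quoted from the standard.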

      Here's the relevant section from the bash manual:

      The order of expansions is: brace expansion, tilde expansion, parameter, variable and arithmetic expansion and command substitution (done in a left-to-right fashion), word splitting, and pathname expansion.
      
      Of course you say "New fangled things! GNU, POSIX, who needs them! V7, that's what real men use." So be it. From the Unix V7 manual:
      Blank interpretation
      After parameter and command substitution, any results of substitution are scanned for internal field separator characters (those found in $IFS) and split into distinct arguments where such characters are found. Explicit null arguments ("" or '') are retained. Implicit null arguments (those resulting from parameters that have no values) are removed.
      Now, I don't want to claim you are wrong, but if you have never programmed in the Unix V7 shell, GNU bash, or a POSIX compliant shell, which shells have you used since the 70s?
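
The V7 wording about $IFS and null arguments still describes what current Bourne-style shells do. A small sketch, assuming bash, dash or ksh; `count_args` and the variable names are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print how many arguments it received.
count_args() {
    echo "$#"
}

empty=""
also_empty=""

# Explicit null arguments ("" or '') are retained.
explicit=$(count_args "" '')

# Implicit null arguments (unquoted parameters with no value) are removed.
implicit=$(count_args $empty $also_empty)

# Splitting uses the characters in $IFS, not just blanks.
list="/bin:/usr/bin:/usr/local/bin"
old_ifs=$IFS
IFS=:
set -- $list
IFS=$old_ifs
fields=$#

echo "explicit=$explicit implicit=$implicit fields=$fields"
```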

      As for "my" syntax:

      < $file | wc -l
      you erroneously put an extra pipe in there. Remove it, try again, and give yourself minus 1 point for bad copying.

      You're right. Think it will help, removing that pipe? Let's find out!

          $ echo "hello" > data1
          $ echo "world" > data2
          $ file="data1 data2"
          $ < $file wc -l
          bash: $file: ambiguous redirect
      Nope. Guess my "useless cat" is still very very useful.

      Forgetting to write "$x" instead of $x is a classic shell programming mistake which results in things breaking for strings that contain whitespace. And it has been a classic mistake since the '70s.

      If I were hiring for a job that required shell programming, that'd be one of the questions I'd ask.
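
A minimal sketch of that mistake, assuming a Bourne-style shell and `mktemp -d`; the file name "my file" is invented for the example, and `tr` only strips the padding some `wc` implementations add.

```shell
#!/bin/sh
dir=$(mktemp -d) || exit 1
cd "$dir" || exit 1

x="my file"
printf 'one\ntwo\n' > "$x"

# Quoted: cat receives the single name "my file".
quoted_lines=$(cat "$x" | wc -l | tr -d ' ')

# Unquoted: cat receives "my" and "file", neither of which exists,
# so nothing reaches wc.
unquoted_lines=$(cat $x 2>/dev/null | wc -l | tr -d ' ')

echo "quoted: $quoted_lines, unquoted: $unquoted_lines"

cd / && rm -rf "$dir"
```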

      - tye        

      However, the second part does work on real shells, just not on bash or csh.
      I presume that by "real shells" you mean your current favourite shell, "zsh". You are only partially right. You are right that the syntax works, but not the semantics. In
          file="data1 data2"
          <$file wc -l
      zsh does not give you the number of lines in the files "data1" and "data2". Instead, it gives you the number of lines of the file (singular) "data1 data2". The use of cat isn't going to save the day though,
          file="data1 data2"
          cat $file | wc -l
      also gives a count of the number of lines in the file "data1 data2".

      No doubt zsh has a way of getting the count of lines from both files, after all, zsh is supposed to have every feature under the sun and then some, but it's not <$file.
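
One way that should work, sketched here under the assumption that the script may run under either zsh or a Bourne-style shell: turn on zsh's SH_WORD_SPLIT option when zsh is detected, so that unquoted expansions are split the way sh splits them. zsh also has a `${=file}` expansion form that requests splitting for a single expansion, but that form is zsh-only and is not used below. The scratch directory via `mktemp -d` is another assumption.

```shell
#!/bin/sh
# zsh does not split unquoted expansions by default; opt in when
# running under it. Other shells skip this branch entirely.
if [ -n "${ZSH_VERSION:-}" ]; then
    setopt SH_WORD_SPLIT
fi

dir=$(mktemp -d) || exit 1
cd "$dir" || exit 1
echo "hello" > data1
echo "world" > data2

file="data1 data2"

# With splitting in effect, cat receives data1 and data2 separately.
lines=$(cat $file | wc -l | tr -d ' ')
echo "$lines"

cd / && rm -rf "$dir"
```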

        What shell were you using that didn't give you a count of the lines in both data1 and data2?

        I tested the last snippet under Linux-x86 (Slackware 10), and NetBSD-sparc 1.6.2, and using bash and ksh93. In both cases I got a count of the lines in both files data1 and data2.

        Those of you seeing a count of the file "data1 data2" instead need to document what shells and systems you are seeing this on.