•Re^6: Sort Large Files

If this makes two files named "x" and "y", and not one file named "x space y":

f="x y"
touch $f

then your shell is not /bin/sh compatible. Whitespace parsing happens before variable parsing in every bourne-ish shell I've used since the late 70s.

As for "my" syntax:

< $file | wc -l

you erroneously put an extra pipe in there. Remove it, try again, and give yourself minus 1 point for bad copying.

-- Randal L. Schwartz, Perl hacker
Be sure to read my standard disclaimer if this is a reply.

update: Yeah, I apparently got the first part wrong. My memory of shell programming has really crept away over the years. However, I just noticed that zsh does it differently than bash, which is probably why I now misremember. zsh does work the way I stated.

However, the second part does work on real shells, just not on bash or csh.
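
For readers who want to try the redirection-first form, here is a minimal sketch, assuming a POSIX-style shell and a scratch directory. The file names `lines.txt` and `two words.txt` are hypothetical. A redirection may appear before the command name; the sketch quotes the target so that it is always a single word, even when the name contains a space.

```shell
#!/bin/sh
# Scratch directory so the sketch does not touch real files (names are hypothetical).
tmpdir=$(mktemp -d)
printf 'a\nb\nc\n' > "$tmpdir/lines.txt"
printf 'one\ntwo\n' > "$tmpdir/two words.txt"

# Redirection placed before the command: wc reads the file on stdin.
file="$tmpdir/lines.txt"
count_plain=$(( $(< "$file" wc -l) ))

# Same form with a space in the name; the quotes keep the target one word.
file="$tmpdir/two words.txt"
count_space=$(( $(< "$file" wc -l) ))

echo "plain: $count_plain, with space: $count_space"
rm -rf "$tmpdir"
```

The arithmetic expansion around `wc -l` is only there to drop the leading padding that some `wc` implementations print.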


Re^7: Sort Large Files by Anonymous Monk on Jan 06, 2005 at 15:38 UTC
    Whitespace parsing happens before variable parsing in every bourne-ish shell I've used since the late 70s.

I guess the intersection of the sets of shells we have used is empty then. Anyway, here's the relevant portion of IEEE Std 1003.1. From section 2.6:

    The order of word expansion shall be as follows:

    1. Tilde expansion (see Tilde Expansion), parameter expansion (see Parameter Expansion), command substitution (see Command Substitution), and arithmetic expansion (see Arithmetic Expansion) shall be performed, beginning to end. See item 5 in Token Recognition.
    2. Field splitting (see Field Splitting) shall be performed on the portions of the fields generated by step 1, unless IFS is null.
    3. Pathname expansion (see Pathname Expansion) shall be performed, unless set -f is in effect.
    4. Quote removal (see Quote Removal) shall always be performed last.

As you see, parameter expansion happens before word splitting. Here's the relevant section from the `bash` manual:

    The order of expansions is: brace expansion, tilde expansion, parameter, variable and arithmetic expansion and command substitution (done in a left-to-right fashion), word splitting, and pathname expansion.

Of course you say "New fangled things! GNU, POSIX, who needs them! V7, that's what real men use." So be it. From the Unix V7 manual:

    Blank interpretation
    After parameter and command substitution, any results of substitution are scanned for internal field separator characters (those found in $IFS) and split into distinct arguments where such characters are found. Explicit null arguments (`""` or `''`) are retained. Implicit null arguments (those resulting from parameters that have no values) are removed.

Now, I don't want to claim you are wrong, but if you have never programmed in the Unix V7 shell, GNU bash, or a POSIX compliant shell, which shells have you used since the 70s?

    As for "my" syntax:

    < $file | wc -l

    you erroneously put an extra pipe in there. Remove it, try again, and give yourself minus 1 point for bad copying.

You're right. Think it will help, removing that pipe? Let's find out!

    $ echo "hello" > data1
    $ echo "world" > data2
    $ file="data1 data2"
    $ < $file wc -l
    bash: $file: ambiguous redirect

Nope. Guess my "useless cat" is still very very useful.
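
The ordering quoted above (parameter expansion first, field splitting second, and no splitting when IFS is null) can be checked with a small sketch. It assumes a Bourne-compatible shell such as sh, bash, or ksh; zsh in its native mode does not split unquoted expansions and would report different numbers. The helper name `count_fields` is made up for illustration.

```shell
#!/bin/sh
# count_fields (hypothetical helper) prints how many arguments it received.
count_fields() { echo "$#"; }

f="x y"

# Default IFS: the parameter is expanded first, then the result is field-split.
default_ifs=$(count_fields $f)

# Null IFS: field splitting is skipped, so the expansion stays one field.
null_ifs=$(IFS=''; count_fields $f)

# Quoting protects the expansion from field splitting altogether.
quoted=$(count_fields "$f")

echo "default IFS: $default_ifs, null IFS: $null_ifs, quoted: $quoted"
```

If whitespace parsing really happened before variable expansion, the first count would be 1 rather than 2.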
Re^7: Sort Large Files (wrong) by tye (Sage) on Jan 06, 2005 at 16:22 UTC
Forgetting to write `"$x"` instead of `$x` is a classic shell programming mistake which results in things breaking for strings that contain whitespace. And it has been a classic mistake since the '70s. If I were hiring for a job that required shell programming, that'd be one of the questions I'd ask.

- tye
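
A minimal sketch of that mistake, assuming a Bourne-compatible shell and a throwaway directory from `mktemp -d`. The subdirectory names `unquoted` and `quoted` are chosen only for this illustration. It runs the `touch` test from the top of the thread both ways and counts the files that result.

```shell
#!/bin/sh
# Throwaway directory so the sketch leaves nothing behind.
tmpdir=$(mktemp -d)
mkdir "$tmpdir/unquoted" "$tmpdir/quoted"

f="x y"

# Unquoted: the shell splits the value, so touch receives two arguments.
( cd "$tmpdir/unquoted" && touch $f )

# Quoted: touch receives one argument that contains a space.
( cd "$tmpdir/quoted" && touch "$f" )

# Count directory entries by letting the glob fill the positional parameters.
set -- "$tmpdir/unquoted"/*
unquoted_files=$#
set -- "$tmpdir/quoted"/*
quoted_files=$#

echo "unquoted created $unquoted_files file(s), quoted created $quoted_files file(s)"
rm -rf "$tmpdir"
```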
Re^7: Sort Large Files by Anonymous Monk on Jan 07, 2005 at 09:32 UTC
    However, the second part does work on real shells, just not on bash or csh.

I presume you mean with "real shells", your current favourite shell, "zsh". You are only partially right. You are right that the syntax works, but not the semantics. In

    file="data1 data2"
    <$file wc -l

`zsh` does not give you the number of lines in the files `"data1"` and `"data2"`. Instead, it gives you the number of lines of the file (singular) `"data1 data2"`. The use of `cat` isn't going to save the day though,

    file="data1 data2"
    cat $file | wc -l

also gives a count of the number of lines in the file `"data1 data2"`. No doubt `zsh` has a way of getting the count of lines from both files, after all, `zsh` is supposed to have every feature under the sun and then some, but it's not `<$file`.
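
One way to sidestep the difference between shells is to avoid packing two names into one string in the first place. The sketch below is an assumption about how one might do it, not something proposed in the thread: it keeps the names as separate positional parameters, and `"$@"` then expands to one argument per file in sh, bash, ksh, and zsh alike. (zsh also offers `${=file}` to request sh-style splitting of a single string, which is not used here so that the sketch stays portable.)

```shell
#!/bin/sh
# Throwaway directory holding the two sample files from the thread.
tmpdir=$(mktemp -d)
echo "hello" > "$tmpdir/data1"
echo "world" > "$tmpdir/data2"

# Keep the names as separate words instead of one space-joined string.
set -- "$tmpdir/data1" "$tmpdir/data2"

# "$@" passes each file as its own argument, so cat concatenates both.
total=$(( $(cat "$@" | wc -l) ))

echo "total lines: $total"
rm -rf "$tmpdir"
```

With more than one input file the `cat` is doing real work, since a single stdin redirection can name only one file.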
Re^8: Sort Large Files by csh (Novice) on Jan 14, 2005 at 21:49 UTC
What shell were you using that didn't give you a count of the lines in both data1 and data2? I tested the last snippet under Linux-x86 (Slackware 10) and NetBSD-sparc 1.6.2, using both bash and ksh93. In every case I got a count of the lines in both files, data1 and data2. Those of you seeing a count for the single file "data1 data2" instead need to document which shells and systems you are seeing this on.