•Re^4: Sort Large Files

Re^5: Sort Large Files by Anonymous Monk on Jan 06, 2005 at 14:40 UTC
> Did you actually try those?

Yes I did. Did you?

    $ echo "hello" > data1
    $ echo "world" > data2
    $ file="data1 data2"
    $ cat $file | wc -l
    2
    $ < $file | wc -l
    bash: $file: ambiguous redirect
    0

> I suspect not, because you'd need an "eval" in your script to get the shell to do another round of whitespace parsing after the variable is interpolated, and that's a Very Good Thing.

Really? I've been writing constructs of the form:

    FILES="file1 file2 file3 file4 file5"
    for file in $FILES
    do
        ... something with $file ...
    done

for a couple of decades. And now you're telling me it never worked???? Now, you're free to believe me, but may I quote from the beginning of perl's Configure?

    paths='/bin /usr/bin /usr/local/bin /usr/ucb /usr/local /usr/lbin'
    paths="$paths /opt/bin /opt/local/bin /opt/local /opt/lbin"
    paths="$paths /usr/5bin /etc /usr/gnu/bin /usr/new /usr/new/bin /usr/nbin"
    paths="$paths /opt/gnu/bin /opt/new /opt/new/bin /opt/nbin"
    paths="$paths /sys5.3/bin /sys5.3/usr/bin /bsd4.3/bin /bsd4.3/usr/ucb"
    paths="$paths /bsd4.3/usr/bin /usr/bsd /bsd43/bin /usr/ccs/bin"
    paths="$paths /etc /usr/lib /usr/ucblib /lib /usr/ccs/lib"
    paths="$paths /sbin /usr/sbin /usr/libexec"
    paths="$paths /system/gnu_library/bin"
    for p in $paths
    do
        case "$p_$PATH$p_" in
        *$p_$p$p_*) ;;
        *) test -d $p && PATH=$PATH$p_$p ;;
        esac
    done

No extra eval happening here.

> I hand out the Useless Use of Cat Award precisely because of code like yours, where a cat is indeed completely useless.

Well, my code works and your suggested alternative does not work. So I get two things: an award, and working code. Good for me.
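The session above can be reproduced with a short script. This is only a sketch, assuming bash or a POSIX `sh` (zsh does not split unquoted variables by default, so it behaves differently); the scratch directory and the variable names `loops`, `lines` and `redirect` are introduced here for illustration.

```shell
# Work in a scratch directory so nothing in the current directory is touched.
dir=$(mktemp -d) && cd "$dir" || exit 1

echo "hello" > data1
echo "world" > data2
file="data1 data2"

# Unquoted $file is field-split into two words, so the loop body runs twice.
loops=0
for f in $file
do
    loops=$((loops + 1))
done
echo "loop iterations: $loops"

# cat receives two arguments and concatenates both files.
lines=$(cat $file | wc -l | tr -d ' ')
echo "lines via cat: $lines"

# A redirection wants exactly one filename. bash reports "ambiguous redirect";
# a strict POSIX sh does not split here and looks for a single file named
# "data1 data2", which does not exist. Either way the redirection fails.
if { wc -l < $file; } >/dev/null 2>&1
then redirect=ok
else redirect=failed
fi
echo "redirect: $redirect"
```

The braces around the `wc` command are there so that the shell's own error message about the failed redirection is also silenced.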
•Re^6: Sort Large Files by merlyn (Sage) on Jan 06, 2005 at 14:48 UTC
If this makes two files named "x" and "y", and not one file named "x space y":

    f="x y"
    touch $f

then your shell is not /bin/sh compatible. Whitespace parsing happens before variable parsing in every Bourne-ish shell I've used since the late 70s.

As for "my" syntax:

    < $file | wc -l

you erroneously put an extra pipe in there. Remove it, try again, and give yourself minus 1 point for bad copying.

-- Randal L. Schwartz, Perl hacker
Be sure to read my standard disclaimer if this is a reply.

update: Yeah, I apparently got the first part wrong. My memory of shell programming has really crept away over the years. However, I just noticed that zsh does it differently than bash, which is probably why I now misremember. zsh does work the way I stated. However, the second part does work on real shells, just not on bash or csh.
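The `touch` test is easy to run for yourself. A minimal sketch, assuming bash or a POSIX `sh`; the scratch directory and the variables `created` and `single` are made up for this example. Under zsh with default options the result would be the other way round.

```shell
# Run in an empty scratch directory so the file count is meaningful.
dir=$(mktemp -d) && cd "$dir" || exit 1

f="x y"
touch $f                      # unquoted: split into the two words "x" and "y"

# Count the directory entries that touch created.
created=$(ls | wc -l | tr -d ' ')

# Check whether a single file named "x y" (with a space) exists.
if [ -e "x y" ]; then single=yes; else single=no; fi

echo "files created: $created, single file with a space: $single"
```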
Re^7: Sort Large Files by Anonymous Monk on Jan 06, 2005 at 15:38 UTC
> Whitespace parsing happens before variable parsing in every bourne-ish shell I've used since the late 70s.

I guess the intersection of the sets of shells we have used is empty then. Anyway, here's the relevant portion of IEEE Std 1003.1. From section 2.6:

> The order of word expansion shall be as follows:
>
> 1. Tilde expansion (see Tilde Expansion), parameter expansion (see Parameter Expansion), command substitution (see Command Substitution), and arithmetic expansion (see Arithmetic Expansion) shall be performed, beginning to end. See item 5 in Token Recognition.
> 2. Field splitting (see Field Splitting) shall be performed on the portions of the fields generated by step 1, unless IFS is null.
> 3. Pathname expansion (see Pathname Expansion) shall be performed, unless set -f is in effect.
> 4. Quote removal (see Quote Removal) shall always be performed last.

As you see, parameter expansion happens before word splitting. Here's the relevant section from the `bash` manual:

> The order of expansions is: brace expansion, tilde expansion, parameter, variable and arithmetic expansion and command substitution (done in a left-to-right fashion), word splitting, and pathname expansion.

Of course you say "New fangled things! GNU, POSIX, who needs them! V7, that's what real men use." So be it. From the Unix V7 manual:

> Blank interpretation
>
> After parameter and command substitution, any results of substitution are scanned for internal field separator characters (those found in $IFS) and split into distinct arguments where such characters are found. Explicit null arguments (`""` or `''`) are retained. Implicit null arguments (those resulting from parameters that have no values) are removed.

Now, I don't want to claim you are wrong, but if you have never programmed in the Unix V7 shell, GNU bash, or a POSIX compliant shell, which shells have you used since the 70s?

> As for "my" syntax: `< $file | wc -l` you erroneously put an extra pipe in there. Remove it, try again, and give yourself minus 1 point for bad copying.

You're right. Think it will help, removing that pipe? Let's find out!

    $ echo "hello" > data1
    $ echo "world" > data2
    $ file="data1 data2"
    $ < $file wc -l
    bash: $file: ambiguous redirect

Nope. Guess my "useless cat" is still very very useful.
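The order quoted from the standards can be observed directly by counting how many arguments a command receives. This is a sketch, assuming bash or a POSIX `sh`; `count_args` is a hypothetical helper written for this example, not something from any of the manuals quoted above.

```shell
# Hypothetical helper: print the number of arguments it was called with.
count_args() { echo $#; }

paths="/bin /usr/bin /usr/local/bin"
empty=""

# Parameter expansion happens first, then the result is field-split.
unquoted=$(count_args $paths)          # three fields
quoted=$(count_args "$paths")          # quoting suppresses splitting: one field

# V7's "implicit null arguments are removed, explicit ones are retained".
empty_unquoted=$(count_args $empty)    # the empty expansion disappears
empty_quoted=$(count_args "$empty")    # the explicit "" survives

# POSIX step 2: field splitting is skipped when IFS is null.
old_ifs=$IFS
IFS=
null_ifs=$(count_args $paths)
IFS=$old_ifs

echo "unquoted=$unquoted quoted=$quoted"
echo "empty_unquoted=$empty_unquoted empty_quoted=$empty_quoted"
echo "null_ifs=$null_ifs"
```

No `eval` appears anywhere: the splitting is part of ordinary word expansion.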
Re^7: Sort Large Files (wrong) by tye (Sage) on Jan 06, 2005 at 16:22 UTC
Forgetting to write `"$x"` instead of `$x` is a classic shell programming mistake that breaks things for strings containing whitespace. And it has been a classic mistake since the '70s. If I were hiring for a job that required shell programming, that'd be one of the questions I'd ask.

- tye
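A small illustration of that mistake, assuming bash or a POSIX `sh`. The filename `my file.txt` and the variables `unquoted` and `quoted` are invented for the example.

```shell
# Scratch directory, so the stray words "my" and "file.txt" cannot match real files.
dir=$(mktemp -d) && cd "$dir" || exit 1

x="my file.txt"
printf 'one\ntwo\n' > "$x"

# Unquoted: cat is handed two arguments, "my" and "file.txt"; neither exists.
if cat $x >/dev/null 2>&1; then unquoted=ok; else unquoted=failed; fi

# Quoted: cat is handed the single argument "my file.txt".
if cat "$x" >/dev/null 2>&1; then quoted=ok; else quoted=failed; fi

echo "unquoted: $unquoted"
echo "quoted: $quoted"
```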
Re^7: Sort Large Files by Anonymous Monk on Jan 07, 2005 at 09:32 UTC
> However, the second part does work on real shells, just not on bash or csh.

I presume you mean with "real shells", your current favourite shell, "zsh". You are only partially right. You are right that the syntax works, but not the semantics. In

    file="data1 data2"
    <$file wc -l

`zsh` does not give you the number of lines in the files `"data1"` and `"data2"`. Instead, it gives you the number of lines of the file (singular) `"data1 data2"`. The use of `cat` isn't going to save the day though,

    file="data1 data2"
    cat $file | wc -l

also gives a count of the number of lines in the file `"data1 data2"`. No doubt `zsh` has a way of getting the count of lines from both files, after all, `zsh` is supposed to have every feature under the sun and then some, but it's not `<$file`.
Re^8: Sort Large Files by csh (Novice) on Jan 14, 2005 at 21:49 UTC
Re^5: Sort Large Files by jdporter (Paladin) on Mar 21, 2005 at 04:19 UTC
> Did you actually try those?

I thought the point was that some calls to `cat` can't be eliminated:

    cat data1 data2 | wc -l    # How would you do this with redirects?
    cat -s sq.my.bl | wc -l    # cat itself can do some processing.
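To make the distinction concrete, here is a sketch, assuming bash or a POSIX `sh`, with file contents and variable names chosen for the example. With one input file a plain redirect does the job; with two, something has to join the streams, and that is what `cat` is for.

```shell
dir=$(mktemp -d) && cd "$dir" || exit 1

printf 'a\nb\n' > data1
printf 'c\n' > data2

# One input file: the redirect replaces cat entirely.
single=$(wc -l < data1 | tr -d ' ')

# Two input files: cat concatenates them into a single stream for wc.
both=$(cat data1 data2 | wc -l | tr -d ' ')

echo "data1 alone: $single"
echo "data1 + data2: $both"
```

The `tr -d ' '` only strips the padding that some `wc` implementations put in front of the count.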