Re: Remove Duplicate Lines

Replies are listed 'Best First'.


Re^2: Remove Duplicate Lines
by afoken (Chancellor) on Aug 01, 2019 at 19:47 UTC

Let's see:

use strict missing
use warnings missing
Missing my for $ifile, $ofile, $header, $data.
no check that the program is called with the correct number of arguments
Forking a shell (1) via qx (``) begs for trouble - see Improve pipe open?
... to run sed, just to read the first line of a file
... while making sed read the entire file
... and ignoring all quoting issues by simply not quoting at all - see The problem of "the" default shell
... and ignoring the fact that sed is not available by default on Windows and other operating systems
Forking another shell via qx to pipe sed output to sort -u input
... again without any quoting
... again assuming sed is available everywhere
... assuming a POSIX sort is available everywhere. The DOS/Windows sort does not understand -u, so it cannot filter out duplicates while sorting
... reading the entire output of sort -u into memory
... just to write it out again three lines later
And finally, exit 0 is redundant

This is highly inefficient and has several issues with "interesting" filenames.
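To make the "interesting" filename problem concrete, here is a minimal shell-only sketch. It is not the script under review; the filename and the helper count_args are hypothetical, and it only shows what a POSIX shell with default IFS does to an unquoted name that contains spaces.

```shell
#!/bin/sh
# Illustration only: how an unquoted filename is split by the shell.
# The filename below is hypothetical.
file='my input file.txt'

# count_args prints the number of arguments it was called with.
count_args() { echo "$#"; }

unquoted=$(count_args $file)    # word splitting turns the name into three arguments
quoted=$(count_args "$file")    # quoting keeps the name in one piece

echo "unquoted: $unquoted, quoted: $quoted"
```

The same thing happens when a filename is pasted into a command string without quoting: sed or sort receives several arguments that do not exist as files, or worse, shell metacharacters in the name get interpreted.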

In Re: Remove Duplicate Lines, BrowserUk explains how to use perl properly.

Another option - if running on a POSIX-compatible system - is to use sort properly. Without headers, it is trivial:

sort -u < inputfile > outputfile
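A small sketch of what that does, using made-up sample lines: sort -u keeps one copy of each distinct line, and the result comes out in sorted order, so the original line order is not preserved.

```shell
#!/bin/sh
# Sample data is invented for this demo.
# sort -u sorts the input and drops repeated lines.
result=$(printf 'pear\napple\npear\nbanana\napple\n' | sort -u)
echo "$result"
```

If the original order matters, sort -u alone is not the right tool.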

With headers, this will do:

head -n 1 inputfile > outputfile
sed '1d' inputfile | sort -u >> outputfile

This way, head can stop processing the input file after the first line, unlike sed -n '1p', which keeps reading to the end of the file. Writing directly to the output file avoids all the remaining overhead of your script.
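Wrapped up as a reusable function, the two commands might look like the sketch below. The function name dedupe_keep_header and the sample filenames are my own choices for illustration. It assumes a POSIX shell with head, sed and sort available, and that input and output are different files.

```shell
#!/bin/sh
# dedupe_keep_header INPUT OUTPUT   (hypothetical helper name)
# Copies the first line of INPUT to OUTPUT unchanged, then appends the
# remaining lines, sorted and with duplicates removed.
dedupe_keep_header() {
    if [ "$#" -ne 2 ]; then
        echo "usage: dedupe_keep_header INPUT OUTPUT" >&2
        return 2
    fi
    # Every expansion is quoted, and the files are passed by redirection,
    # so names with spaces or a leading dash cause no trouble.
    head -n 1 < "$1" > "$2" || return 1
    sed '1d' < "$1" | sort -u >> "$2"
}

# Demo with invented data and a filename containing a space.
tmpdir=$(mktemp -d)
printf 'name\nb\na\nb\na\n' > "$tmpdir/in put.txt"
dedupe_keep_header "$tmpdir/in put.txt" "$tmpdir/out.txt"
cat "$tmpdir/out.txt"
rm -rf "$tmpdir"
```

The argument check and the quoting address two of the points from the list above. Where head is not wanted, sed 1q also prints the first line and then stops reading.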

Alexander

(1) Yes, given a sane filename, perl may start the first sed without the help of the default shell. Change the filename to something interesting and perl will start sed via the default shell.

--
Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)
