Re: One Liner to strip crlf
by roboticus (Chancellor) on Sep 04, 2014 at 15:55 UTC
dirtdog:
Hmmm ... it feels wrong to me, and here's why:
- If you can't figure out a one-liner for that, it may be that it'll be too hard to remember and/or alter. One-liners are great for trivial transformations, but if they're complex enough to be worth asking about, it's probably better to write a script.
- If you're using a one-liner as a filter, then you still need to process the entire file. If you're trying to avoid the unneeded search and replace to save time, then you'd probably want to save all the I/O time too, since that's likely to be significant.
- If speed is *that* important, you should probably write a quick C program specially built for the job.
Just for grins, I built a reasonably large file (465MB) and tried several filters on it:
$ # Do nothing but count lines
$ time perl -i -pe '++$cnt; END {print STDERR $cnt}' floop.cr
10000000
real 0m2.641s
user 0m2.218s
sys 0m0.375s
$ # Your original filter
$ time perl -i -pe 's/\r//g; ++$cnt; END {print STDERR $cnt}' floop.cr
10000000
real 0m6.298s
user 0m5.703s
sys 0m0.421s
$ # Don't do it globally, end at the first one
$ time perl -i -pe 's/\r//; ++$cnt; END {print STDERR $cnt}' floop.cr
10000000
real 0m3.439s
user 0m2.937s
sys 0m0.390s
$ # Do it only at the end of the line
$ time perl -i -pe 's/\r$//; ++$cnt; END {print STDERR $cnt}' floop.cr
10000000
real 0m3.188s
user 0m2.781s
sys 0m0.359s
So you can gain some performance by tweaking your regular expression. After I did so, the search and replace overhead was roughly 20% of the entire runtime, so you can't really get a big win here. Or, if 20% is enough time to be significant, then I'd suggest changing your processing so that, rather than using a filter, you write a small perl script that simply checks the first line of the file. If it has "\r", then filter it; otherwise process the original file. That way you could save nearly all of the I/O time when you don't have a "\r" in the file.
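A minimal sketch of the "check the first line, then decide" idea described above, not a definitive implementation. The helper names `first_line_has_crlf` and `strip_crlf_if_needed` and the `.tmp` suffix are assumptions made for illustration; the sketch also assumes the line ending of the first line is representative of the whole file.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: true if the first line of $path ends in CRLF.
# Only one line is read, so a file without "\r" costs almost no I/O.
sub first_line_has_crlf {
    my ($path) = @_;
    open my $in, '<:raw', $path or die "Cannot open $path: $!";
    my $first = <$in>;
    close $in;
    return (defined $first && $first =~ /\r\n\z/) ? 1 : 0;
}

# Hypothetical helper: rewrite $path with CRLF converted to LF, but only
# when the first line shows that the file needs it. Returns 1 if the file
# was rewritten, 0 if it was left untouched.
sub strip_crlf_if_needed {
    my ($path) = @_;
    return 0 unless first_line_has_crlf($path);

    my $tmp = "$path.tmp";    # assumed scratch name next to the original
    open my $in,  '<:raw', $path or die "Cannot open $path: $!";
    open my $out, '>:raw', $tmp  or die "Cannot open $tmp: $!";
    while (my $line = <$in>) {
        $line =~ s/\r\n\z/\n/;    # touch only the CR that precedes the LF
        print {$out} $line;
    }
    close $in;
    close $out or die "Cannot close $tmp: $!";
    rename $tmp, $path or die "Cannot rename $tmp to $path: $!";
    return 1;
}

strip_crlf_if_needed($_) for @ARGV;
```

Saved under a name of your choosing, it would be run as `perl script.pl file1 file2`; files whose first line has no CRLF are opened, read for one line, and closed again.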
...roboticus
When your only tool is a hammer, all problems look like your thumb.
|
|
If you can't figure out a one-liner for that, it may be that it'll be too hard to remember and/or alter. One-liners are great for trivial transformations, but if they're complex enough to be worth asking about, it's probably better to write a script.
++
My exceptions to this rule are:
- Golf fun
- If the slightly complex one-liner is captured in a file, like Makefiles, unix aliases, etc.
That being said, if I do want a one-liner and I'm struggling with it, I usually resort to creating a throw-away script first, then converting it to a one-liner.
|
|
perl -i -pe 's/\r// if /\r$/' <file>
The time savings are probably negligible, as you demonstrated.
Thanks for the help.
Re: One Liner to strip crlf
by kennethk (Abbot) on Sep 04, 2014 at 16:27 UTC
perl -ne 's/\r$// or last; $t.=$_; if (eof) {close ARGV; open $fh, ">", $ARGV or die; print $fh $t}' files
#11929 First ask yourself `How would I do this without a computer?' Then have the computer do it the same way.
Re: One Liner to strip crlf
by Tux (Canon) on Sep 04, 2014 at 15:52 UTC
Is the requirement that the \r shall be in front of a \n, or should all \r's be stripped, even if the first line (my interpretation of "header") contains a \r anywhere, like in
123,"abc\rdef",4,\n
Besides that, we would probably like to know the *reason* for needing the strip, to be able to come up with better answers.
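To make the distinction concrete, here is a small sketch of how the two interpretations differ on a record like the one above. It assumes, for illustration only, that the record ends in CRLF; the variable names are made up for this example.

```perl
use strict;
use warnings;

# Assumed sample: one CR embedded in a quoted field, plus a CRLF line ending.
my $record = qq{123,"abc\rdef",4,\r\n};

# Interpretation 1: strip every CR, including the one inside the field.
(my $all_stripped = $record) =~ s/\r//g;

# Interpretation 2: strip only the CR that is part of the line ending.
(my $eol_stripped = $record) =~ s/\r\n\z/\n/;

# Count the CRs that survive each approach.
printf "strip all: %d CR left, strip at EOL only: %d CR left\n",
    ($all_stripped =~ tr/\r//), ($eol_stripped =~ tr/\r//);
```

With the first form the embedded CR in the data is lost; with the second it is preserved and only the line terminator changes.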
Enjoy, Have FUN! H.Merijn
|
|
Hi. To clarify: I'm trying to create a one-liner that looks at the first line only to see if it has a CRLF and, if it does, executes the search and replace on all lines. The search and replace command works. The piece I can't figure out is how to check whether a condition exists on the 1st line only and, if it does, then execute the command on the entire file in a one-liner.
|
|
perl -pi -0e'm/\A[^\n]*\r\n/ and s/\r\n/\n/g' file
Enjoy, Have FUN! H.Merijn
Re: One Liner to strip crlf
by Anonymous Monk on Sep 04, 2014 at 20:09 UTC
Monks,
Doesn't this paragraph from perlop apply, or am I missing something?
All systems use the virtual "\n" to represent a line terminator, called a "newline". There is no such thing as an unvarying, physical newline character. It is only an illusion that the operating system, device drivers, C libraries, and Perl all conspire to preserve. Not all systems read "\r" as ASCII CR and "\n" as ASCII LF. For example, on the ancient Macs (pre-MacOS X) of yesteryear, these used to be reversed, and on systems without line terminator, printing "\n" might emit no actual data. In general, use "\n" when you mean a "newline" for your system, but use the literal ASCII when you need an exact character. For example, most networking protocols expect and prefer a CR+LF ("\015\012" or "\cM\cJ") for line terminators, and although they often accept just "\012", they seldom tolerate just "\015". If you get in the habit of using "\n" for networking, you may be burned some day.
Also, don't the files need to be opened with binmode / :raw to make sure that any possible :crlf layer is off?
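A short sketch of why the layer matters, assuming a throw-away temp file and a hypothetical helper named `slurp_with_layer`: the same bytes are read once through `:raw` and once through `:crlf`, and only the `:raw` read still sees the "\r" characters a regex would need to match.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: read a whole file through the given PerlIO layer.
sub slurp_with_layer {
    my ($path, $layer) = @_;
    open my $fh, "<$layer", $path or die "Cannot open $path: $!";
    local $/;                 # slurp mode
    my $data = <$fh>;
    close $fh;
    return $data;
}

# Write two CRLF-terminated lines byte-for-byte.
my ($tmp_fh, $tmp_path) = tempfile(UNLINK => 1);
binmode $tmp_fh;              # no translation on output
print {$tmp_fh} "header\r\nrow\r\n";
close $tmp_fh;

my $raw        = slurp_with_layer($tmp_path, ':raw');   # bytes as stored
my $translated = slurp_with_layer($tmp_path, ':crlf');  # CRLF folded to "\n"

printf "CRs seen via :raw = %d, via :crlf = %d\n",
    ($raw =~ tr/\r//), ($translated =~ tr/\r//);
```

If a `:crlf` layer is active on input, a substitution such as `s/\r$//` has nothing left to match, which is the situation the question above is asking about.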
Re: One Liner to strip crlf
by CountZero (Bishop) on Sep 04, 2014 at 15:32 UTC
How do your lines end? With a single "CR" or with "CRLF"?
CountZero "A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little or too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James My blog: Imperial Deltronics
Re: One Liner to strip crlf
by NetWallah (Canon) on Sep 04, 2014 at 15:42 UTC
Try this (untested, but you should get the idea):
perl -i -pe 'last unless s/\r//g' <filename>
Not exactly your spec, but its moral equivalent.
"You're only given one little spark of madness. You mustn't lose it." - Robin Williams