dirtdog has asked for the wisdom of the Perl Monks concerning the following question:

Hi Monks, I have a perl one-liner to strip the carriage return, but I only want to run the search and replace on every line if the header contains one. That way I don't unnecessarily go through every line doing a search and replace. If the header has it, I can assume the rest of the file will have it, and then I'll run it on every line in the file.

This is what I have, but it's only doing it on the header record. Does anyone know how to use a one-liner to accomplish my intended goal?

perl -i -pe 's/\r//g if $. ==1 && /\r/' <filename>

Thanks a lot.

Replies are listed 'Best First'.
Re: One Liner to strip crlf
by roboticus (Chancellor) on Sep 04, 2014 at 15:55 UTC

    dirtdog:

    Hmmm ... it feels wrong to me, and here's why:

    • If you can't figure out a one-liner for that, it may be that it'll be too hard to remember and/or alter. One-liners are great for trivial transformations, but if they're complex enough to be worth asking about, it's probably better to write a script.
    • If you're using a one-liner as a filter, then you still need to process the entire file. If you're trying to avoid the unneeded search and replace to save time, then you'd probably want to save all the I/O time too, as that's likely to be significant.
    • If speed is *that* important, you should probably write a quick C program specially built for the job.

    Just for grins, I built a reasonably large file (465MB) and tried several filters on it:

    $ # Do nothing but count lines
    $ time perl -i -pe '++$cnt; END {print STDERR $cnt}' floop.cr
    10000000
    real 0m2.641s
    user 0m2.218s
    sys 0m0.375s

    $ # Your original filter
    $ time perl -i -pe 's/\r//g; ++$cnt; END {print STDERR $cnt}' floop.cr
    10000000
    real 0m6.298s
    user 0m5.703s
    sys 0m0.421s

    $ # Don't do it globally, end at the first one
    $ time perl -i -pe 's/\r//; ++$cnt; END {print STDERR $cnt}' floop.cr
    10000000
    real 0m3.439s
    user 0m2.937s
    sys 0m0.390s

    $ # Do it only at the end of the line
    $ time perl -i -pe 's/\r$//; ++$cnt; END {print STDERR $cnt}' floop.cr
    10000000
    real 0m3.188s
    user 0m2.781s
    sys 0m0.359s

    So you can gain a bit of performance by tweaking your regular expression. After I did so, the search and replace overhead was roughly 20% of the entire runtime, so you can't really get a big win here. Or, if 20% is enough time to be significant, then I'd suggest changing your processing: rather than using a filter, write a small perl script that simply checks the first line of the file. If it has "\r" then filter it, otherwise process using the original file. That way you could save nearly all of the I/O time when you don't have a "\r" in the file.
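A minimal sketch of that idea, written here as a shell function around two perl calls rather than as a standalone perl script. The function name is made up for illustration, and it assumes a Unix-like system and a single file argument. The first perl call reads only the first line, so a file without a trailing "\r" in its header is never rewritten:

```shell
# Hypothetical helper: strip trailing CRs in place, but only when the
# first line of the file ends in CR (immediately before the newline).
strip_cr_if_header_has_it() {
    file=$1
    # Read just the first line; exit status 0 means "header ends in CR".
    if perl -e 'my $first = <>; exit(defined $first && $first =~ /\r$/ ? 0 : 1)' "$file"; then
        # Only now pay for reading and rewriting the whole file.
        perl -i -pe 's/\r$//' "$file"
    fi
}
```

Usage would be something like `strip_cr_if_header_has_it floop.cr`; when the header has no "\r" the only I/O is reading the first line.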

    ...roboticus

    When your only tool is a hammer, all problems look like your thumb.

      If you can't figure out a one-liner for that, it may be that it'll be too hard to remember and/or alter. One-liners are great for trivial transformations, but if they're complex enough to be worth asking about, it's probably better to write a script.
      ++

      My exceptions to this rule are:

      • Golf fun
      • If the slightly complex one-liner is captured in a file, like Makefiles, unix aliases, etc.

      That being said, if I do want a one-liner and I'm struggling with it, I usually resort to creating a throw-away script first, then converting it to a one-liner.

      Good point Roboticus...I'll just run the one liner on the entire file as follows:

      perl -i -pe 's/\r// if /\r$/' <file>

      The time savings are probably negligible, as you demonstrated.

      thanks for the help

Re: One Liner to strip crlf
by kennethk (Abbot) on Sep 04, 2014 at 16:27 UTC
    So, as GotToBTru pointed out basic loop control won't work, and as roboticus points out, most benefit would be from avoiding unnecessary I/O. This means you can't use the -i flag, and would need to handle the file replace yourself. Something like (untested):
    perl -ne 's/\r$// or last; $t.=$_; if (eof) {close ARGV; open $fh, ">", $ARGV or die; print $fh $t}' files

    #11929 First ask yourself `How would I do this without a computer?' Then have the computer do it the same way.

Re: One Liner to strip crlf
by Tux (Canon) on Sep 04, 2014 at 15:52 UTC

    Is the requirement that the \r shall be in front of a \n, or should all \r's be stripped even if the first line (my interpretation of "header") contains a \r anywhere, like in

    123,"abc\rdef",4,\n

    Besides that, we would probably like to know the *reason* the strip is needed, to be able to come up with better answers.


    Enjoy, Have FUN! H.Merijn

      Hi, to clarify: I'm trying to create a one-liner that looks at the first line only to see if it has a CRLF, and if it does, then goes ahead and executes the search and replace on all lines. The search and replace command works. The piece I can't figure out is how to check whether a condition exists on the first line only and, if it does, then execute the command on the entire file in a one-liner.

        Something like this?

        perl -pi -0e'm/\A[^\n]*\r\n/ and s/\r\n/\n/g' file

        Enjoy, Have FUN! H.Merijn
Re: One Liner to strip crlf
by Anonymous Monk on Sep 04, 2014 at 20:09 UTC

    Monks,

    Doesn't this paragraph from perlop apply, or am I missing something?

    All systems use the virtual "\n" to represent a line terminator, called a "newline". There is no such thing as an unvarying, physical newline character. It is only an illusion that the operating system, device drivers, C libraries, and Perl all conspire to preserve. Not all systems read "\r" as ASCII CR and "\n" as ASCII LF. For example, on the ancient Macs (pre-MacOS X) of yesteryear, these used to be reversed, and on systems without line terminator, printing "\n" might emit no actual data. In general, use "\n" when you mean a "newline" for your system, but use the literal ASCII when you need an exact character. For example, most networking protocols expect and prefer a CR+LF ("\015\012" or "\cM\cJ") for line terminators, and although they often accept just "\012", they seldom tolerate just "\015". If you get in the habit of using "\n" for networking, you may be burned some day.

    Also don't the files need to be opened with binmode / :raw to make sure that any possible :crlf is off?

Re: One Liner to strip crlf
by CountZero (Bishop) on Sep 04, 2014 at 15:32 UTC
    How do your lines end? With a single "CR" or with "CRLF"?

    CountZero

    "A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little or too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James

    My blog: Imperial Deltronics
Re: One Liner to strip crlf
by NetWallah (Canon) on Sep 04, 2014 at 15:42 UTC
    Try this (untested, but you should get the idea):
    perl -i -pe 'last unless s/\r//g' <filename>
    Not exactly your spec, but its moral equivalent.

    "You're only given one little spark of madness. You mustn't lose it." - Robin Williams

      That empties the file if the first line does not have a carriage return.

      1 Peter 4:10
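The file is emptied because `last` leaves the implicit `-p` loop before its print runs, so nothing is written back. A sketch that keeps the header check but avoids `last` by remembering the result in a flag; the function name and the `$crlf` variable are made up for illustration, and it assumes a single file, since `$.` is not reset between files:

```shell
# Hypothetical helper: decide once, on line 1, whether the file uses CRLF,
# then strip the trailing CR from every line only if it does.
strip_cr_by_header() {
    perl -i -pe '$crlf = /\r$/ if $. == 1; s/\r$// if $crlf' "$1"
}
```

This still reads and rewrites the whole file, so it saves only the substitution work, not the I/O.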