in reply to Why does my Perl regex substitution for linebreak fail?

... OK when I print the result to the console but not when I redirect the output into a file ...

Could your file actually have \r\n (Windows-style) line endings? Or might you be seeing different terminal behavior because you're running Cygwin with Unix line endings?

If you have a Unix-like system available, you might try pushing a small bit of your processed and unprocessed file through "od". I sometimes use something like "tail -3 foo | od -bc" to keep from getting fooled by "friendly" systems that hide non-printing characters.
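For example, here is a minimal sketch that builds a small test file whose middle line ends in \r\n and then runs the same kind of pipeline on it. The file name foo.txt is just a placeholder. The exact column layout of od differs between GNU and BSD versions, but either way a CR shows up as \r in the character rows and as 015 in the octal rows.

```shell
#!/bin/sh
# Build a hypothetical three-line sample file; only the second line
# has a Windows-style CR LF ending.
printf 'alpha\nbeta\r\ngamma\n' > foo.txt

# Dump the last three lines (tail -n 3 is the POSIX spelling of tail -3):
# -b prints each byte in octal, -c prints the same byte as a character.
tail -n 3 foo.txt | od -bc
```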

Re^2: Why does my Perl regex substitution for linebreak fail?
by quester (Vicar) on Mar 06, 2008 at 07:34 UTC
    For mostly-printable files the output of "tail -3 foo | cat -A" is less cluttered.
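    A sketch of the cat variant on the same kind of hypothetical sample file. The -A option is from GNU coreutils (equivalent to -vET): it marks every line end with $ and shows a carriage return as ^M. On BSD or macOS, cat -e (non-printing characters plus $ at line end) gives a similar view.

```shell
#!/bin/sh
# Hypothetical sample file: the second line ends in CR LF.
printf 'alpha\nbeta\r\ngamma\n' > foo.txt

# GNU cat -A: $ marks each line end, a CR is displayed as ^M.
tail -n 3 foo.txt | cat -A
# Expected output (GNU cat):
# alpha$
# beta^M$
# gamma$
```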
Re^2: Why does my Perl regex substitution for linebreak fail?
by pat_mc (Pilgrim) on Mar 06, 2008 at 08:58 UTC
    Thanks, igelkott, for addressing the console part of my post. Can you please explain in more basic terms what you are suggesting? I am fairly new to Linux and don't quite understand the issue you are pointing at. The file I intend to operate on, however, was generated with the 'cat' command in the shell by concatenating other files created under Linux, so I'm not sure the cross-platform issue applies here. Thanks again - Pat
      "tail -3 foo | od -bc" means: take the last three lines of "foo" and feed them to the "od" command with the "b" and "c" options.

      I'll presume that the first part is either clear or reasonably easy to look up; "od" is the weird part. Its name comes from "octal dump" (option b), but I'm using it here to also get the character names (option c), in particular to reveal the non-printing characters.
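      To see what each option contributes, here is a tiny sketch that pipes a three-byte string through each one separately. In octal, CR is 015 and LF is 012, so a Windows line ending appears as 015 012 with -b and as \r \n with -c.

```shell
#!/bin/sh
# od -b: each byte as a three-digit octal number.
printf 'x\r\n' | od -b    # bytes 170 015 012  (x, CR, LF)

# od -c: the same bytes as characters, with C-style escapes
# such as \r and \n for the non-printing ones.
printf 'x\r\n' | od -c    # chars x  \r  \n
```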

        Got it, igelkott.

        Thanks for the Perl and Bash wisdom!
Re^2: Why does my Perl regex substitution for linebreak fail?
by pat_mc (Pilgrim) on Mar 06, 2008 at 15:48 UTC
    igelkott -

    Your answer got right to the core of the issue. I searched for \r and got matches in exactly those lines which resisted the replacement. What exactly is this \r character, anyway? I have no idea how that \r entered my fully Linux-based and Linux-generated file.

    Any thoughts on this?

    Thanks again for shedding some light on this.

    Cheers -

    Pat
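    One way to run the kind of search Pat describes from the shell, sketched against a hypothetical file foo.txt: grep -n lists the lines that contain a CR (with their line numbers), and grep -c counts them. The CR is produced with printf because a plain \r inside a grep pattern is not portable.

```shell
#!/bin/sh
# Hypothetical sample file: only line 2 contains a CR.
printf 'alpha\nbeta\r\ngamma\n' > foo.txt

cr=$(printf '\r')        # a literal carriage-return character
grep -n "$cr" foo.txt    # show each line containing a CR, with its line number
grep -c "$cr" foo.txt    # count the lines containing a CR
```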
      Line endings: \r (CR) and \n (LF)
      • \n -> Unix (Mac OS X uses this too)
      • \r -> classic Mac OS
      • \r\n -> Windows/DOS (PC)
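      A quick way to see the three conventions side by side: the same two short lines written with each ending and dumped as characters.

```shell
#!/bin/sh
# Two lines, "a" and "b", terminated three different ways.
printf 'a\nb\n'     | od -c   # Unix:        a \n b \n
printf 'a\rb\r'     | od -c   # classic Mac: a \r b \r
printf 'a\r\nb\r\n' | od -c   # Windows/DOS: a \r \n b \r \n
```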

      Exactly how the PC line endings got into your file I couldn't say, but I would guess that the data passed through a Windows machine at some point. Some file transfer methods take care of line endings and others don't.
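      If the goal is just to get rid of the CRs before running the substitution, one option is a sketch like the following, using POSIX tr on a hypothetical input foo.txt. Note that tr -d deletes every CR byte, including any stray CR in the middle of a line; if only CRs at line ends should go, GNU sed's 's/\r$//' is a narrower alternative (the \r escape there is a GNU extension).

```shell
#!/bin/sh
# Hypothetical input with mixed line endings.
printf 'alpha\nbeta\r\ngamma\n' > foo.txt

# Delete every CR byte, writing the result to a new file.
tr -d '\r' < foo.txt > foo.unix.txt

# Inspect the result: no \r should remain.
od -c foo.unix.txt
```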

        Good stuff, igelkott.

        Thanks a million for your help! I'll spend some more time wondering just how the \r got in there.

        Just out of curiosity - could it have crept in due to any of my Linux or text editor settings - or maybe even character encodings? I checked for the obvious (to me) editor settings in KWrite but could not spot any pointers to Windows formats there.
        Also, the file did not get routed through the network at any point in time ... oh, wait a minute: I did send a copy of it via uuencode. Would that have done anything to the original file? Probably not ...

        Cheers -

        Pat